tokyotech-llm

university

tokyotech-llm's collections (16)

GPT-OSS-Swallow-v0.1

tokyotech-llm/GPT-OSS-Swallow-20B-RL-v0.1

Text Generation • 21B • Updated Feb 20 • 3.09k • 20
tokyotech-llm/GPT-OSS-Swallow-120B-RL-v0.1

Text Generation • 117B • Updated Feb 20 • 12.2k • 15
tokyotech-llm/GPT-OSS-Swallow-20B-SFT-v0.1

Text Generation • 21B • Updated Feb 20 • 437 • 5
tokyotech-llm/GPT-OSS-Swallow-120B-SFT-v0.1

Text Generation • 117B • Updated Feb 20 • 191 • 2

Apache-2.0 Open High Quality Math Corpus

tokyotech-llm/swallow-math-v2

Viewer • Updated Nov 6, 2025 • 17.4M • 21k • 31

Llama-3.1-Swallow-v0.5

tokyotech-llm/Llama-3.1-Swallow-8B-v0.5

8B • Updated Jul 1, 2025 • 402 • 9
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5

Text Generation • 8B • Updated Jun 25, 2025 • 2.17k • 19

Rewriting Pre-Training Data Boosts LLM Performance in Math and Code

tokyotech-llm/swallow-code

Viewer • Updated Mar 1 • 129M • 1.36k • 65
tokyotech-llm/Llama-3.1-8B-code-ablation-exp1-LR2.5e-5-MINLR2.5E-6-WD0.1-iter0002500

Updated Jul 4, 2025 • 5
tokyotech-llm/Llama-3.1-8B-code-ablation-exp1-LR2.5e-5-MINLR2.5E-6-WD0.1-iter0005000

8B • Updated Jul 4, 2025 • 2
tokyotech-llm/Llama-3.1-8B-code-ablation-exp1-LR2.5e-5-MINLR2.5E-6-WD0.1-iter0007500

8B • Updated Jul 4, 2025 • 3

Llama-3.3-Swallow

tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4

Text Generation • 71B • Updated Jul 1, 2025 • 243 • 13
tokyotech-llm/Llama-3.3-Swallow-70B-v0.4

Text Generation • 71B • Updated May 31, 2025 • 47 • 4
tokyotech-llm/edu-classifier

Text Classification • Updated Jan 30, 2025 • 274 • 13

Llama-3-Swallow

tokyotech-llm/Llama-3-Swallow-8B-v0.1

Text Generation • Updated Oct 8, 2024 • 1.07k • 12
tokyotech-llm/Llama-3-Swallow-70B-v0.1

Text Generation • Updated Oct 8, 2024 • 16 • 6
tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1

Text Generation • 8B • Updated Oct 8, 2024 • 9.27k • 21
tokyotech-llm/Llama-3-Swallow-70B-Instruct-v0.1

Text Generation • 71B • Updated Oct 8, 2024 • 100 • 7

Swallow-instruct

Swallow instruction tuning models

tokyotech-llm/Swallow-7b-instruct-hf

Text Generation • 7B • Updated Oct 8, 2024 • 392 • 44
tokyotech-llm/Swallow-13b-instruct-v0.1

Text Generation • 13B • Updated Oct 8, 2024 • 54 • 1
tokyotech-llm/Swallow-70b-instruct-v0.1

Text Generation • 69B • Updated Oct 8, 2024 • 10
tokyotech-llm/Swallow-7b-instruct-v0.1

Text Generation • 7B • Updated Oct 8, 2024 • 103 • 4

Swallow MX(Mixtral) models

tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1

Text Generation • 47B • Updated Aug 17, 2024 • 282 • 29

Qwen3-Swallow-v0.2

tokyotech-llm/Qwen3-Swallow-8B-RL-v0.2

Text Generation • 8B • Updated Feb 23 • 2.3k • 9
tokyotech-llm/Qwen3-Swallow-30B-A3B-RL-v0.2

Text Generation • 31B • Updated Feb 23 • 250 • 7
tokyotech-llm/Qwen3-Swallow-32B-RL-v0.2

Text Generation • 33B • Updated Feb 23 • 2.12k • 1
tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2

Text Generation • 8B • Updated Feb 23 • 1.44k • 5

Apache-2.0 Open High Quality Code Corpus

tokyotech-llm/swallow-code-v2

Viewer • Updated Nov 8, 2025 • 147M • 76.9k • 37

Rewriting Pre-Training Data Boosts LLM Performance in Math and Code

tokyotech-llm/swallow-math

Viewer • Updated Mar 1 • 4.33M • 1.24k • 47
tokyotech-llm/Llama-3.1-8B-math-ablation-exp2-LR2.5e-5-WD0.1-iter0002500

8B • Updated May 7, 2025 • 1
tokyotech-llm/Llama-3.1-8B-math-ablation-exp2-LR2.5e-5-WD0.1-iter0005000

8B • Updated May 7, 2025 • 6
tokyotech-llm/Llama-3.1-8B-math-ablation-exp2-LR2.5e-5-WD0.1-iter0007500

8B • Updated May 7, 2025 • 5

Gemma-2-Swallow

tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1

Text Generation • 27B • Updated May 18, 2025 • 71 • 1
tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1

Text Generation • Updated May 18, 2025 • 6.5k • 1
tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1

Text Generation • Updated May 18, 2025 • 20
tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1

Text Generation • Updated May 18, 2025 • 128 • 4

Llama-3.1-Swallow

tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5

Text Generation • 8B • Updated Jun 25, 2025 • 2.17k • 19
tokyotech-llm/Llama-3.1-Swallow-8B-v0.5

8B • Updated Jul 1, 2025 • 402 • 9
tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3

Text Generation • 71B • Updated Apr 2, 2025 • 839 • 13
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3

Text Generation • 8B • Updated Apr 2, 2025 • 6.72k • 24

Continual Pre-Training from Llama 2

tokyotech-llm/Swallow-7b-hf

Text Generation • 7B • Updated Oct 8, 2024 • 2.26k • 17
tokyotech-llm/Swallow-7b-instruct-hf

Text Generation • 7B • Updated Oct 8, 2024 • 392 • 44
tokyotech-llm/Swallow-7b-instruct-v0.1

Text Generation • 7B • Updated Oct 8, 2024 • 103 • 4
tokyotech-llm/Swallow-7b-plus-hf

Text Generation • Updated Oct 8, 2024 • 8 • 8

Swallow MS/MX (Mistral/Mixtral) models

tokyotech-llm/Swallow-MS-7b-v0.1

Text Generation • 7B • Updated Aug 17, 2024 • 34 • 29
tokyotech-llm/Swallow-MS-7b-instruct-v0.1

Text Generation • 7B • Updated Aug 17, 2024 • 311 • 14
tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1

Text Generation • 47B • Updated Aug 17, 2024 • 282 • 29

Swallow-MS-instruct

tokyotech-llm/Swallow-MS-7b-instruct-v0.1

Text Generation • 7B • Updated Aug 17, 2024 • 311 • 14
