- tokyotech-llm/GPT-OSS-Swallow-20B-RL-v0.1
  Text Generation • 21B • Updated • 3.09k • 20
- tokyotech-llm/GPT-OSS-Swallow-120B-RL-v0.1
  Text Generation • 117B • Updated • 12.2k • 15
- tokyotech-llm/GPT-OSS-Swallow-20B-SFT-v0.1
  Text Generation • 21B • Updated • 437 • 5
- tokyotech-llm/GPT-OSS-Swallow-120B-SFT-v0.1
  Text Generation • 117B • Updated • 191 • 2
Apache-2.0 Open High Quality Math Corpus
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
- tokyotech-llm/swallow-code
  Viewer • Updated • 129M • 1.36k • 65
- tokyotech-llm/Llama-3.1-8B-code-ablation-exp1-LR2.5e-5-MINLR2.5E-6-WD0.1-iter0002500
  Updated • 5
- tokyotech-llm/Llama-3.1-8B-code-ablation-exp1-LR2.5e-5-MINLR2.5E-6-WD0.1-iter0005000
  8B • Updated • 2
- tokyotech-llm/Llama-3.1-8B-code-ablation-exp1-LR2.5e-5-MINLR2.5E-6-WD0.1-iter0007500
  8B • Updated • 3
- tokyotech-llm/Llama-3-Swallow-8B-v0.1
  Text Generation • Updated • 1.07k • 12
- tokyotech-llm/Llama-3-Swallow-70B-v0.1
  Text Generation • Updated • 16 • 6
- tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
  Text Generation • 8B • Updated • 9.27k • 21
- tokyotech-llm/Llama-3-Swallow-70B-Instruct-v0.1
  Text Generation • 71B • Updated • 100 • 7
Swallow instruction tuning models
- tokyotech-llm/Swallow-7b-instruct-hf
  Text Generation • 7B • Updated • 392 • 44
- tokyotech-llm/Swallow-13b-instruct-v0.1
  Text Generation • 13B • Updated • 54 • 1
- tokyotech-llm/Swallow-70b-instruct-v0.1
  Text Generation • 69B • Updated • 10
- tokyotech-llm/Swallow-7b-instruct-v0.1
  Text Generation • 7B • Updated • 103 • 4
Swallow MX(Mixtral) models
- tokyotech-llm/Qwen3-Swallow-8B-RL-v0.2
  Text Generation • 8B • Updated • 2.3k • 9
- tokyotech-llm/Qwen3-Swallow-30B-A3B-RL-v0.2
  Text Generation • 31B • Updated • 250 • 7
- tokyotech-llm/Qwen3-Swallow-32B-RL-v0.2
  Text Generation • 33B • Updated • 2.12k • 1
- tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2
  Text Generation • 8B • Updated • 1.44k • 5
Apache-2.0 Open High Quality Code Corpus
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
- tokyotech-llm/swallow-math
  Viewer • Updated • 4.33M • 1.24k • 47
- tokyotech-llm/Llama-3.1-8B-math-ablation-exp2-LR2.5e-5-WD0.1-iter0002500
  8B • Updated • 1
- tokyotech-llm/Llama-3.1-8B-math-ablation-exp2-LR2.5e-5-WD0.1-iter0005000
  8B • Updated • 6
- tokyotech-llm/Llama-3.1-8B-math-ablation-exp2-LR2.5e-5-WD0.1-iter0007500
  8B • Updated • 5
- tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1
  Text Generation • 27B • Updated • 71 • 1
- tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1
  Text Generation • Updated • 6.5k • 1
- tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1
  Text Generation • Updated • 20
- tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1
  Text Generation • Updated • 128 • 4
- tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5
  Text Generation • 8B • Updated • 2.17k • 19
- tokyotech-llm/Llama-3.1-Swallow-8B-v0.5
  8B • Updated • 402 • 9
- tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3
  Text Generation • 71B • Updated • 839 • 13
- tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3
  Text Generation • 8B • Updated • 6.72k • 24
Continual Pre-Training from Llama 2
Swallow MS/MX (Mistral/Mixtral) models