Active filters: grpo
webxos/microclaw-for-openclaw-version-2026.2.17
Text Generation
• Updated
• 240
• 3
Text Generation
• 4B • Updated
• 222
• 2
snap-stanford/humanlm-opinion
Text Generation
• 8B • Updated
• 120
• 9
LightningRodLabs/Trump-Forecaster
Text Generation
• Updated
• 105
• 4
lokahq/Trinity-Mini-DrugProt-Think
Text Generation
• Updated
• 2
trentmkelly/Qwen3-14B-ZeroGPT-beta
trentmkelly/Llama-3.1-8b-Instruct-Pangram
Text Generation
• Updated
• 4
Image-Text-to-Text
• 4B • Updated
• 21
• 2
mradermacher/MetaphorStar-3B-GGUF
Reinforcement Learning
• 3B • Updated
• 702
• 1
mradermacher/MetaphorStar-3B-i1-GGUF
Reinforcement Learning
• 3B • Updated
• 2.6k
• 1
LightningRodLabs/Golf-Forecaster
Text Generation
• Updated
• 29
• 4
mradermacher/StarPO-4B-GGUF
Reinforcement Learning
• 4B • Updated
• 879
• 1
mradermacher/StarPO-4B-i1-GGUF
Reinforcement Learning
• 4B • Updated
• 3.92k
• 1
Chun121/Qwen3-4B-RPG-Roleplay-V2
Text Generation
• 4B • Updated
• 7.51k
• 38
Text Generation
• 0.1B • Updated
• 3
8B • Updated
sergiopaniego/Qwen2-0.5B-GRPO-test
Updated
Novaciano/ESP-NSFW-GRPO-1B-Sin_Censura-GGUF
1B • Updated
• 75
• 4
nbd22/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora
Updated
sergiopaniego/Qwen2-0.5B-GRPO
Updated
philschmid/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated
• 9
• 8
spinech/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated
• 3
Dongwei/Qwen2.5-1.5B-Open-R1-GRPO
Text Generation
• 2B • Updated
• 2
• 1
spinech/qwen2.5-3b-r1-rearc-stage1
Text Generation
• 3B • Updated
• 10
Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO
Text Generation
• 8B • Updated
• 10
• 1
MasterControlAIML/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured
Text Generation
• 2B • Updated
• 11
• 5
mradermacher/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured-GGUF
2B • Updated
• 162
• 2