Augusteinia's Collections RL thinking
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper
• 2505.10320
• Published
• 24
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper
• 2505.09343
• Published
• 76
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper
• 2505.10554
• Published
• 120
Scaling Reasoning can Improve Factuality in Large Language Models
Paper
• 2505.11140
• Published
• 7
Chain-of-Model Learning for Language Model
Paper
• 2505.11820
• Published
• 121
AdaptThink: Reasoning Models Can Learn When to Think
Paper
• 2505.13417
• Published
• 83
Paper
• 2505.14674
• Published
• 37
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
Paper
• 2505.14810
• Published
• 62
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
Paper
• 2505.15966
• Published
• 53
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
Paper
• 2505.16400
• Published
• 36
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning
Paper
• 2505.17022
• Published
• 27
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper
• 2505.17667
• Published
• 88
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Paper
• 2505.17612
• Published
• 81
Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models
Paper
• 2505.17225
• Published
• 64
Reinforcement Pre-Training
Paper
• 2506.08007
• Published
• 263
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
Paper
• 2506.15211
• Published
• 39