RL thinking - a Augusteinia Collection

Augusteinia 's Collections

RL thinking

updated Jun 26, 2025

J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning

Paper • 2505.10320 • Published May 15, 2025 • 24
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Paper • 2505.09343 • Published May 14, 2025 • 76
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models

Paper • 2505.10554 • Published May 15, 2025 • 120
Scaling Reasoning can Improve Factuality in Large Language Models

Paper • 2505.11140 • Published May 16, 2025 • 7
Chain-of-Model Learning for Language Model

Paper • 2505.11820 • Published May 17, 2025 • 121
AdaptThink: Reasoning Models Can Learn When to Think

Paper • 2505.13417 • Published May 19, 2025 • 83
Reward Reasoning Model

Paper • 2505.14674 • Published May 20, 2025 • 37
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models

Paper • 2505.14810 • Published May 20, 2025 • 62
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

Paper • 2505.15966 • Published May 21, 2025 • 53
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

Paper • 2505.16400 • Published May 22, 2025 • 36
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning

Paper • 2505.17022 • Published May 22, 2025 • 27
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23, 2025 • 88
Distilling LLM Agent into Small Models with Retrieval and Code Tools

Paper • 2505.17612 • Published May 23, 2025 • 81
Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models

Paper • 2505.17225 • Published May 22, 2025 • 64
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 263
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs

Paper • 2506.15211 • Published Jun 18, 2025 • 39