Reasoning
Contrastive Decoding Improves Reasoning in Large Language Models
Paper
• 2309.09117
• Published
• 40
Prometheus: Inducing Fine-grained Evaluation Capability in Language
Models
Paper
• 2310.08491
• Published
• 57
Language Models are Hidden Reasoners: Unlocking Latent Reasoning
Capabilities via Self-Rewarding
Paper
• 2411.04282
• Published
• 37
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large
Language Models
Paper
• 2411.14432
• Published
• 25
Ensembling Large Language Models with Process Reward-Guided Tree Search
for Better Complex Reasoning
Paper
• 2412.15797
• Published
• 18
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
• 2501.05366
• Published
• 102
Evolving Deeper LLM Thinking
Paper
• 2501.09891
• Published
• 115
Token Assorted: Mixing Latent and Text Tokens for Improved Language
Model Reasoning
Paper
• 2502.03275
• Published
• 18
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth
Approach
Paper
• 2502.05171
• Published
• 152
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual
Reasoning in Mathematical LLMs
Paper
• 2502.10454
• Published
• 7
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning
in Diffusion Models
Paper
• 2502.10458
• Published
• 38
Entropy-Regularized Process Reward Model
Paper
• 2412.11006
• Published
What Are Step-Level Reward Models Rewarding? Counterintuitive Findings
from MCTS-Boosted Mathematical Reasoning
Paper
• 2412.15904
• Published
R1-Searcher: Incentivizing the Search Capability in LLMs via
Reinforcement Learning
Paper
• 2503.05592
• Published
• 27
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model
for Visual Generation and Editing
Paper
• 2503.10639
• Published
• 53
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based
VLM Agent Training
Paper
• 2503.08525
• Published
• 17
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning
via Iterative Self-Improvement
Paper
• 2503.17352
• Published
• 24
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large
Language Models
Paper
• 2503.24235
• Published
• 54
Exploring Data Scaling Trends and Effects in Reinforcement Learning from
Human Feedback
Paper
• 2503.22230
• Published
• 45
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement
Learning on the Base Model
Paper
• 2503.24290
• Published
• 62
VAPO: Efficient and Reliable Reinforcement Learning for Advanced
Reasoning Tasks
Paper
• 2504.05118
• Published
• 26
FlowReasoner: Reinforcing Query-Level Meta-Agents
Paper
• 2504.15257
• Published
• 47
Paper
• 2505.14674
• Published
• 37
Think Only When You Need with Large Hybrid-Reasoning Models
Paper
• 2505.14631
• Published
• 20
Enigmata: Scaling Logical Reasoning in Large Language Models with
Synthetic Verifiable Puzzles
Paper
• 2505.19914
• Published
• 46
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied
Iterative Policy Optimization
Paper
• 2505.19000
• Published
• 42
The Entropy Mechanism of Reinforcement Learning for Reasoning Language
Models
Paper
• 2505.22617
• Published
• 131
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper
• 2505.24863
• Published
• 97
From Token to Action: State Machine Reasoning to Mitigate Overthinking
in Information Retrieval
Paper
• 2505.23059
• Published
• 13
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for
Over-Reasoning Mitigation
Paper
• 2506.02397
• Published
• 36
Comment on The Illusion of Thinking: Understanding the Strengths and
Limitations of Reasoning Models via the Lens of Problem Complexity
Paper
• 2506.09250
• Published
• 27
Reasoning with Exploration: An Entropy Perspective
Paper
• 2506.14758
• Published
• 31
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought
Reasoning in LLMs
Paper
• 2506.18896
• Published
• 29
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via
Multi-Agent Multi-Turn Reinforcement Learning
Paper
• 2506.24119
• Published
• 50
KV Cache Steering for Inducing Reasoning in Small Language Models
Paper
• 2507.08799
• Published
• 40
Reasoning or Memorization? Unreliable Results of Reinforcement Learning
Due to Data Contamination
Paper
• 2507.10532
• Published
• 90
MUR: Momentum Uncertainty guided Reasoning for Large Language Models
Paper
• 2507.14958
• Published
• 47
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid
Mamba-Transformer Reasoning Model
Paper
• 2508.14444
• Published
• 43
Intern-S1: A Scientific Multimodal Foundation Model
Paper
• 2508.15763
• Published
• 269
InMind: Evaluating LLMs in Capturing and Applying Individual Human
Reasoning Styles
Paper
• 2508.16072
• Published
• 4
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large
Language Models
Paper
• 2508.18773
• Published
• 16
Think in Games: Learning to Reason in Games via Reinforcement Learning
with Large Language Models
Paper
• 2508.21365
• Published
• 29
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task
Arithmetic
Paper
• 2509.01363
• Published
• 59
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and
Open Resources
Paper
• 2509.21268
• Published
• 104
Understanding the Thinking Process of Reasoning Models: A Perspective
from Schoenfeld's Episode Theory
Paper
• 2509.14662
• Published
• 13
Variational Reasoning for Language Models
Paper
• 2509.22637
• Published
• 69
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During
Post Training
Paper
• 2509.25758
• Published
• 23
Large Reasoning Models Learn Better Alignment from Flawed Thinking
Paper
• 2510.00938
• Published
• 59
MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual
Information
Paper
• 2510.03632
• Published
• 42
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior
Reasoning LLMs
Paper
• 2510.05069
• Published
• 13
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement
Learning
Paper
• 2510.03259
• Published
• 57
Reasoning with Sampling: Your Base Model is Smarter Than You Think
Paper
• 2510.14901
• Published
• 48
Scaling Latent Reasoning via Looped Language Models
Paper
• 2510.25741
• Published
• 228
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for
Visual Chain-of-Thought
Paper
• 2511.02779
• Published
• 59
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model
Reasoning Ability in VibeThinker-1.5B
Paper
• 2511.06221
• Published
• 132
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
Paper
• 2511.16664
• Published
• 28
SO-Bench: A Structural Output Evaluation of Multimodal LLMs
Paper
• 2511.21750
• Published
• 6
DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action
Paper
• 2511.22134
• Published
• 22
SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs
Paper
• 2512.00722
• Published
• 16
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning
Paper
• 2512.07461
• Published
• 78
Universal Reasoning Model
Paper
• 2512.14693
• Published
• 43
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
Paper
• 2512.15687
• Published
• 21
LongVideoAgent: Multi-Agent Reasoning with Long Videos
Paper
• 2512.20618
• Published
• 55
Multi-hop Reasoning via Early Knowledge Alignment
Paper
• 2512.20144
• Published
• 7
Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs
Paper
• 2512.17206
• Published
• 20
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Paper
• 2512.19995
• Published
• 16
RelayLLM: Efficient Reasoning via Collaborative Decoding
Paper
• 2601.05167
• Published
• 31
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs
Paper
• 2601.03559
• Published
• 14
Evolving Programmatic Skill Networks
Paper
• 2601.03509
• Published
• 87
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
Paper
• 2601.05593
• Published
• 84
MAXS: Meta-Adaptive Exploration with LLM Agents
Paper
• 2601.09259
• Published
• 95
Language of Thought Shapes Output Diversity in Large Language Models
Paper
• 2601.11227
• Published
• 9
Agentic Reasoning for Large Language Models
Paper
• 2601.12538
• Published
• 197
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
Paper
• 2601.15165
• Published
• 72
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper
• 2601.18778
• Published
• 40
MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
Paper
• 2601.21821
• Published
• 59
ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought
Paper
• 2601.23184
• Published
• 36
Beyond Pixels: Visual Metaphor Transfer via Schema-Driven Agentic Reasoning
Paper
• 2602.01335
• Published
• 16
Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing
Paper
• 2602.03845
• Published
• 26
Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities
Paper
• 2601.21937
• Published
• 19
Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models
Paper
• 2602.04649
• Published
• 12
Free(): Learning to Forget in Malloc-Only Reasoning Models
Paper
• 2602.08030
• Published
• 5