STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens • Paper 2602.15620 • Published Feb 17