δ-mem: Efficient Online Memory for Large Language Models Paper • 2605.12357 • Published 5 days ago • 109
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation Paper • 2605.10912 • Published 6 days ago • 37
HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution Paper • 2605.09942 • Published 6 days ago • 14
Continual Harness: Online Adaptation for Self-Improving Foundation Agents Paper • 2605.09998 • Published 6 days ago • 16
MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents Paper • 2605.09530 • Published 7 days ago • 140
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs Paper • 2605.09063 • Published 8 days ago • 76
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex Paper • 2605.06139 • Published 10 days ago • 65
Mean Mode Screaming: Mean-Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 10 days ago • 182
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning Paper • 2605.06130 • Published 10 days ago • 106
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction Paper • 2605.05242 • Published 14 days ago • 105
SkillOS: Learning Skill Curation for Self-Evolving Agents Paper • 2605.06614 • Published 10 days ago • 42
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents Paper • 2605.05185 • Published 11 days ago • 97
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration Paper • 2605.03042 • Published 13 days ago • 114