OpenWorldLib: A Unified Codebase and Definition of Advanced World Models • Paper • 2604.04707 • Published 1 day ago • 113
FileGram: Grounding Agent Personalization in File-System Behavioral Traces • Paper • 2604.04901 • Published 1 day ago • 17
Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence? • Paper • 2604.03016 • Published 4 days ago • 25
AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents • Paper • 2604.02947 • Published 4 days ago • 12
Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation • Paper • 2604.02368 • Published 11 days ago • 3
CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning • Paper • 2604.03231 • Published 4 days ago • 1
InCoder-32B-Thinking: Industrial Code World Model for Thinking • Paper • 2604.03144 • Published 4 days ago • 42
Apriel-Reasoner: RL Post-Training for General-Purpose and Efficient Reasoning • Paper • 2604.02007 • Published 5 days ago • 8
Paper Reconstruction Evaluation: Evaluating Presentation and Hallucination in AI-written Papers • Paper • 2604.01128 • Published 6 days ago • 12
Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants • Paper • 2604.00842 • Published 6 days ago • 10
Embarrassingly Simple Self-Distillation Improves Code Generation • Paper • 2604.01193 • Published 6 days ago • 27
HippoCamp: Benchmarking Contextual Agents on Personal Computers • Paper • 2604.01221 • Published 6 days ago • 26
GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation • Paper • 2603.26661 • Published 11 days ago • 22
Meta-Harness: End-to-End Optimization of Model Harnesses • Paper • 2603.28052 • Published 8 days ago • 14
VectorGym: A Multitask Benchmark for SVG Code Generation, Sketching, and Editing • Paper • 2603.29852 • Published Feb 22 • 6
Learn2Fold: Structured Origami Generation with World Model Planning • Paper • 2603.29585 • Published Feb 2 • 16