WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation
Papers
Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions
Training and evaluation datasets for the paper "How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?"
- meituan-longcat/R-HORIZON-Websearch
  Viewer • Updated • 505 • 190 • 1
- meituan-longcat/R-HORIZON-AMC23
  Viewer • Updated • 200 • 3.71k • 1
- meituan-longcat/R-HORIZON-Math500
  Viewer • Updated • 2.5k • 68 • 1
- meituan-longcat/R-HORIZON-AIME25
  Viewer • Updated • 150 • 163 • 1