LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
Paper
• 2508.15760
• Published
• 47
LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?
Paper
• 2508.01780
• Published
• 21
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs
Paper
• 2304.08244
• Published
• 1
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper
• 2508.16153
• Published
• 160
Memp: Exploring Agent Procedural Memory
Paper
• 2508.06433
• Published
• 36
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models
Paper
• 2507.12806
• Published
• 21
Survey on Evaluation of LLM-based Agents
Paper
• 2503.16416
• Published
• 96
AgentBench: Evaluating LLMs as Agents
Paper
• 2308.03688
• Published
• 26
PlanGenLLMs: A Modern Survey of LLM Planning Capabilities
Paper
• 2502.11221
• Published
• 1
AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes
Paper
• 2506.14728
• Published
Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First
Paper
• 2509.00997
• Published
• 2
Small Language Models are the Future of Agentic AI
Paper
• 2506.02153
• Published
• 24
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools
Paper
• 2509.09734
• Published
• 16
Paper
• 2509.10147
• Published
• 27
ReAct: Synergizing Reasoning and Acting in Language Models
Paper
• 2210.03629
• Published
• 33
ARE: Scaling Up Agent Environments and Evaluations
Paper
• 2509.17158
• Published
• 36
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
Paper
• 2509.24002
• Published
• 176