Task Design
- Easy: basic classification across 5 emails.
- Medium: prioritization context across 10 emails.
- Hard: full inbox workflow across 15 emails.
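
The three tiers above can be captured in a small lookup table. This is a minimal sketch, not the project's actual configuration: the names `TaskTier`, `TASK_TIERS`, and `emails_for` are hypothetical and only encode the email counts listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskTier:
    """One difficulty tier (hypothetical structure, for illustration only)."""
    name: str
    focus: str
    num_emails: int


# Email counts and focus areas are copied from the task list above.
TASK_TIERS = {
    "easy": TaskTier("easy", "basic classification", 5),
    "medium": TaskTier("medium", "prioritization context", 10),
    "hard": TaskTier("hard", "full inbox workflow", 15),
}


def emails_for(tier: str) -> int:
    """Return how many emails an episode of the given tier contains."""
    return TASK_TIERS[tier.lower()].num_emails
```
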
This is a production-style inbox triage environment in which an agent must
classify each email as spam, normal, or urgent using the standard OpenEnv
reset, step, and state flow.
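
To illustrate the reset, step, and state flow, here is a self-contained toy stand-in written in plain Python. It does not use the OpenEnv client API: the class `ToyInboxEnv`, the reward of 1.0 per correct label, the keyword policy, and the sample emails are all assumptions made for the sketch. Only the three labels and the three method names come from the description above.

```python
from dataclasses import dataclass

LABELS = ("spam", "normal", "urgent")


@dataclass
class ToyInboxEnv:
    """Toy stand-in for the triage environment (not the real OpenEnv API)."""
    emails: list  # list of (text, true_label) pairs
    cursor: int = 0
    correct: int = 0

    def reset(self):
        """Start a new episode and return the first email text."""
        self.cursor = 0
        self.correct = 0
        return self._observation()

    def step(self, label):
        """Submit a label for the current email; return (obs, reward, done)."""
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        if self.cursor >= len(self.emails):
            raise RuntimeError("episode finished; call reset()")
        _, truth = self.emails[self.cursor]
        reward = 1.0 if label == truth else 0.0  # assumed reward scheme
        self.correct += int(reward)
        self.cursor += 1
        done = self.cursor >= len(self.emails)
        return self._observation(), reward, done

    def state(self):
        """Return episode bookkeeping without advancing the episode."""
        return {"step_count": self.cursor, "correct": self.correct,
                "total": len(self.emails)}

    def _observation(self):
        if self.cursor >= len(self.emails):
            return None  # no emails left
        return self.emails[self.cursor][0]


def keyword_policy(text):
    """Deliberately naive baseline agent based on keyword matching."""
    lowered = text.lower()
    if "prize" in lowered:
        return "spam"
    if "outage" in lowered:
        return "urgent"
    return "normal"


def run_episode(env, policy):
    """Drive one episode: reset, step until done, then read the final state."""
    obs = env.reset()
    done = obs is None  # an empty inbox is finished immediately
    while not done:
        obs, _reward, done = env.step(policy(obs))
    return env.state()
```

A real agent would replace `keyword_policy` with a model call, while the loop in `run_episode` would keep the same shape.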
- /web/ for the interactive playground.
- /docs for the API surface.
- /health for the deployment heartbeat.
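
As a rough sketch of how a client might address these three routes, the helper below builds full URLs from a base address and probes the heartbeat. The base URL in the usage note is hypothetical, and treating an HTTP 200 from /health as "healthy" is an assumption, since the response format is not documented here.

```python
import urllib.error
import urllib.request

# Paths are taken from the list above; the dictionary keys are hypothetical names.
ENDPOINTS = {
    "playground": "/web/",
    "docs": "/docs",
    "health": "/health",
}


def endpoint_url(base_url: str, name: str) -> str:
    """Join a deployment base URL with one of the known endpoint paths."""
    return base_url.rstrip("/") + ENDPOINTS[name]


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if /health answers with HTTP 200 (assumed success signal)."""
    try:
        with urllib.request.urlopen(endpoint_url(base_url, "health"),
                                    timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

For example, with a local deployment assumed to listen on `http://localhost:8000`, `endpoint_url("http://localhost:8000", "docs")` gives the address of the API surface, and `is_healthy("http://localhost:8000")` reports whether the heartbeat responds.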