AIBA

Agentic Interface Benchmark with Audits
Selection syntax
  • (empty) — run all available tasks
  • aiba-foo-s1 — alphanumeric invite code (reproducible selection)
  • 15 — single task (only T15)
  • 5-8 — range (T5, T6, T7, T8)
  • 1,3,15 — comma list
  • 1-15,44-50,127 — mixed range + list
  • 31+ — resume from T31 to the end (e.g. you stopped at T30 and want to continue)
  • ?tasks=1-38&seed=aiba-s1 — URL params can combine a task range with a reproducible seed
Note: valid task IDs are T1–T180; out-of-range IDs trigger a warning and are skipped.
Loads all tasks by default.
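The selection grammar above is small enough to parse in a few lines. The Python sketch below is a hypothetical illustration, not AIBA's actual implementation: the names `parse_selection` and `parse_query`, the rule that any input containing a letter is an invite code, and the treatment of a literal `+` in a URL query are all assumptions. It covers the empty default, single IDs, `A-B` ranges, comma lists, the `N+` resume form, and the T1–T180 bound with warn-and-skip for out-of-range IDs.

```python
import re
import warnings
from urllib.parse import parse_qs, urlsplit

MAX_TASK_ID = 180  # valid task IDs are T1..T180

# One comma-separated item: "N", "A-B", or "N+".
_ITEM = re.compile(r"^(\d+)(?:-(\d+)|(\+))?$")


def parse_selection(spec: str, max_id: int = MAX_TASK_ID):
    """Return ("seed", code) for an invite code, else ("tasks", sorted IDs).

    Assumption (hypothetical): any spec containing a letter is an invite code.
    """
    spec = spec.strip()
    if not spec:
        # Empty selection: run every available task.
        return ("tasks", list(range(1, max_id + 1)))
    if re.search(r"[A-Za-z]", spec):
        # Letters present: pass the whole string through as an invite code.
        return ("seed", spec)

    selected = set()
    for raw in spec.split(","):
        item = raw.strip()
        m = _ITEM.match(item)
        if m is None:
            raise ValueError(f"bad selection item: {item!r}")
        start = int(m.group(1))
        if m.group(2) is not None:
            end = int(m.group(2))  # "A-B": inclusive range
            if end < start:
                raise ValueError(f"reversed range: {item!r}")
        elif m.group(3):
            end = max_id  # "N+": from N through the last task
        else:
            end = start  # "N": a single task
        if start < 1 or start > max_id or end > max_id:
            # Out-of-range IDs produce a warning and are skipped, not an error.
            warnings.warn(f"{item!r}: IDs outside T1-T{max_id} skipped")
        selected.update(range(max(start, 1), min(end, max_id) + 1))
    return ("tasks", sorted(selected))


def parse_query(url: str, max_id: int = MAX_TASK_ID):
    """Split a URL or query such as '?tasks=1-38&seed=aiba-s1' into (IDs, seed)."""
    query = urlsplit(url).query or url.lstrip("?")
    # parse_qs decodes '+' as a space; assume a literal '+' is the "N+" suffix.
    params = parse_qs(query.replace("+", "%2B"))
    kind, tasks = parse_selection(params.get("tasks", [""])[0], max_id)
    if kind != "tasks":
        raise ValueError("tasks= expects IDs or ranges, not an invite code")
    seed = params.get("seed", [None])[0]
    return tasks, seed
```

With this sketch, `parse_selection("1-15,44-50,127")` selects T1–T15, T44–T50 and T127, and `parse_query("?tasks=31+")` resumes from T31. Returning a sorted set also merges overlapping ranges; whether AIBA itself deduplicates or preserves input order is not stated here.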