CAIS has updated the Remote Labor Index (RLI), a benchmark where AI agents attempt real client projects: design, video, CAD, architecture, analytics, and web apps.

The work is reviewed by humans and compared against a professional deliverable that a paying client would accept.

New results:

  • Claude Fable 5 — 16.1%
  • Claude Opus 4.8 — 8.3%
  • GPT-5.5 — 6.3%

Fable 5 is now the top model on RLI, scoring almost twice as high as Opus 4.8 (16.1% vs. 8.3%).

When the benchmark launched, the best result was 2.5%.
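
For anyone who wants to check the ratios, here is a quick sketch using only the numbers quoted above. The variable and function names are illustrative, not part of RLI itself, and the launch figure is the best score reported at launch, not a Fable 5 result.

```python
# Scores in percent, copied from the results listed in this post.
scores = {
    "Claude Fable 5": 16.1,
    "Claude Opus 4.8": 8.3,
    "GPT-5.5": 6.3,
}
launch_best = 2.5  # best score when the benchmark launched, per the post


def ratio(a: float, b: float) -> float:
    """Return a / b rounded to two decimals."""
    return round(a / b, 2)


top_model = max(scores, key=scores.get)
fable_vs_opus = ratio(scores["Claude Fable 5"], scores["Claude Opus 4.8"])
fable_vs_launch = ratio(scores["Claude Fable 5"], launch_best)

print(f"Top model: {top_model}")
print(f"Fable 5 vs Opus 4.8: {fable_vs_opus}x")
print(f"Fable 5 vs launch best: {fable_vs_launch}x")
```

So "almost 2x" works out to roughly 1.94x, and the current top score is about 6.4x the best result at launch.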