Digits releases latest edition of their Beyond the AI Hype benchmark

 

Digits just released the latest edition of their "Beyond the AI Hype" benchmark, with a notable first: "frontier AI models have now surpassed the outsourced human accountant baseline on transaction categorization."

The benchmark compared Digits Agentic General Ledger™, 13 frontier reasoning models, and outsourced human accountants on the same 2,000 transactions across four businesses, scored against a U.S. GAAP ground-truth set reviewed by professional accountants.

Digits AGL® led the benchmark again, achieving 97.8% accuracy with sub-second turnaround and zero hallucinations. It outperformed the best general-purpose model by 17 percentage points in one-shot categorization, and by 11 points even when top models were given an agent harness with added business context.

But the bigger industry signal is that five general-purpose models beat the human baseline for the first time.

According to Jeff Seibert, CEO of Digits: “Frontier models are becoming faster, more accurate, and less prone to hallucination. Work that firms have historically delegated to junior staff, offshore teams, or layers of expensive software can now be performed at human-level accuracy by AI available to anyone.”

That raises the next question for accounting firms: if general-purpose AI can now classify transactions at human-level accuracy, what separates out-of-the-box models from AI-native accounting platforms?

Seibert adds: “In accounting, intelligence becomes useful when it operates within the rules, context, and history of the books it works on. The advantage of purpose-built systems comes from grounding intelligence in opinionated workflows, deterministic guardrails, and a complete audit trail, while preserving human judgment as the final authority.”

You can learn more about what these latest findings mean for AI adoption in accounting by reading the Digits blog post, which also includes a link to the related white paper.
