AI Coding Assistant Token-Efficiency Benchmark

A paired-run benchmark that measures whether XFlowIQ reduces total AI token consumption while maintaining or improving accepted engineering outcomes.

Evidence Contract

Every comparison must carry the same proof fields.

Exact product and model versions (required)

XFlowIQ build number and commit SHA
Codex or model version identifiers
Judge or verifier model identifiers
Repository commit, Node version, package lock hash, and OS build
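
The version fields above could be captured as one record per run so that a missing identifier is caught before a comparison is published. This is a minimal sketch: the class name, field names, and the `missing_fields` helper are illustrative assumptions, not a published XFlowIQ schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VersionEvidence:
    # Field names are hypothetical; they mirror the list above one-to-one.
    xflowiq_build: str
    xflowiq_commit_sha: str
    model_version: str
    judge_model: str
    repo_commit: str
    node_version: str
    package_lock_hash: str
    os_build: str


def missing_fields(evidence: VersionEvidence) -> list[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [
        f.name
        for f in fields(evidence)
        if not getattr(evidence, f.name).strip()
    ]
```

A run whose `missing_fields` result is non-empty would be held back rather than compared.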

Test environment (required)

Same repository state for the baseline and XFlow-assisted runs
Same task prompt, acceptance threshold, and available tools
Same environment settings, with no hidden manual fixes
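
One way to enforce these "same" conditions is to compare the two run descriptions key by key and reject the pair on any difference. The sketch below assumes each run is described by a plain dictionary; the key names in `PAIRED_KEYS` are hypothetical and chosen only to mirror the list above.

```python
# Hypothetical keys that must match between the two runs of a pair.
PAIRED_KEYS = (
    "repo_commit",
    "task_prompt",
    "acceptance_threshold",
    "tools",
    "env_settings",
)


def environment_mismatches(baseline: dict, assisted: dict) -> list[str]:
    """Return the keys whose values differ between the two runs.

    A key missing from one run and present in the other counts as a mismatch.
    """
    return [
        key
        for key in PAIRED_KEYS
        if baseline.get(key) != assisted.get(key)
    ]
```

An empty result means the pair is comparable on these keys. Manual fixes cannot be detected this way and still need to be disclosed in the run log.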

Baseline configuration (required)

Single-assistant baseline using the same model family where practical
XFlow four-AI lane with the same acceptance threshold
Both lanes count input tokens, output tokens, system prompts, compression, tool calls, retries, failed runs, inter-AI messages, and judge calls
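
The counting rule above can be sketched as a single total that refuses to run when any bucket is absent, so neither lane can look cheaper by omitting one. The bucket names and the assumption that every bucket is reported as a token count are illustrative, not taken from the benchmark's actual JSON.

```python
# Hypothetical bucket names, one per item in the counting rule above.
BUCKETS = (
    "input",
    "output",
    "system_prompts",
    "compression",
    "tool_calls",
    "retries",
    "failed_runs",
    "inter_ai_messages",
    "judge_calls",
)


def total_tokens(usage: dict[str, int]) -> int:
    """Sum token counts across every bucket, failing loudly on omissions."""
    missing = [bucket for bucket in BUCKETS if bucket not in usage]
    if missing:
        raise ValueError(f"uncounted buckets: {', '.join(missing)}")
    return sum(usage[bucket] for bucket in BUCKETS)
```

A bucket that was genuinely unused should be reported as 0 rather than left out, which keeps "not used" distinct from "not measured".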

Complete results (required)

Report median savings and p10/p25/p75/p90 distribution
Report pass rate, total tokens, cost, latency, human corrections, and time to accepted output
Report failed or rejected paired runs
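
The savings summary could be computed per pair and then aggregated, as in the sketch below. It assumes savings is defined as the fraction of baseline tokens saved, `(baseline - assisted) / baseline`, which is one reasonable reading and not a definition stated by the benchmark; the function names are hypothetical.

```python
from statistics import median, quantiles


def savings_fraction(baseline_tokens: int, assisted_tokens: int) -> float:
    """Fraction of baseline tokens saved; negative when the assisted run used more."""
    return (baseline_tokens - assisted_tokens) / baseline_tokens


def summarize_savings(pairs: list[tuple[int, int]]) -> dict[str, float]:
    """Summarize per-pair savings for (baseline_tokens, assisted_tokens) pairs.

    Requires at least two pairs, since percentiles of one point are undefined.
    """
    savings = [savings_fraction(b, a) for b, a in pairs]
    # n=100 yields 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = quantiles(savings, n=100, method="inclusive")
    return {
        "median": median(savings),
        "p10": cuts[9],
        "p25": cuts[24],
        "p75": cuts[74],
        "p90": cuts[89],
    }
```

Pairs where the assisted run used more tokens stay in the input as negative savings, so the distribution reflects regressions instead of hiding them.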

Cases where the competitor performed better (required)

A single assistant may be faster or cheaper on small tasks
XFlowIQ must prove value on complex, multi-step, evidence-heavy work where coordination matters

Limitations (required)

Current fixture is sample-only until real paired logs are attached
Savings claims are locked until multiple task families are covered
A token win without equal or better pass rate is not a product win
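
The last two limitations can be read as gates on what may be claimed. The sketch below is one possible encoding: the function names are hypothetical, and treating "multiple task families" as at least two is an assumption made here for illustration.

```python
def is_product_win(
    baseline_tokens: int,
    assisted_tokens: int,
    baseline_pass_rate: float,
    assisted_pass_rate: float,
) -> bool:
    """True only when tokens drop AND the pass rate is equal or better."""
    token_win = assisted_tokens < baseline_tokens
    quality_held = assisted_pass_rate >= baseline_pass_rate
    return token_win and quality_held


def savings_claim_unlocked(task_families: set[str], min_families: int = 2) -> bool:
    """True once enough distinct task families are covered.

    min_families=2 is an assumed reading of "multiple", not a stated threshold.
    """
    return len(task_families) >= min_families
```

For example, a run that halves token use but lowers the pass rate from 0.90 to 0.85 is not counted as a win under this rule.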

Reproduction scripts or evidence (required)

Benchmark JSON from /api/xflowiq/engineering-token-benchmark
Accepted diffs, tests, receipts, and verifier results
Public methodology explaining all counted buckets

Independence

XFlowIQ benchmark pages are independent product pages and are not affiliated with OpenAI, Google, Anthropic, Microsoft, or other model providers.

Public pages should keep this line visible so comparison SEO stays clean, honest, and reviewable.