Benchmarks

Comparison SEO, built around proof instead of thin pages.

XFlowIQ now has its own first-party benchmark. We publish the XFlowIQ Origin Benchmark as an estimated local benchmark because its method, rows, formula, distribution, and limitations are attached.

For token efficiency, XFlow chooses a strategy before spending tokens: estimate the route, decide whether a local readback, cache, compact support packet, narrow retrieval, critic check, or model call is needed, then score the result against the same accepted-output benchmark.
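The strategy-first ordering can be pictured as a cheapest-first route picker. The sketch below is illustrative only and is not XFlowIQ's implementation: the route names follow the list above, while the token costs, the `Task` fields, and `choose_route` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical per-route token costs, ordered cheapest first.
# Route names come from the text; the numbers are invented for illustration.
ROUTE_COSTS = [
    ("local_readback", 0),
    ("cache", 0),
    ("compact_support_packet", 300),
    ("narrow_retrieval", 800),
    ("critic_check", 1200),
    ("model_call", 4000),
]


@dataclass
class Task:
    name: str
    viable_routes: set  # routes judged able to produce an accepted output


def choose_route(task):
    """Return the cheapest viable (route, estimated_tokens) pair."""
    for route, cost in ROUTE_COSTS:
        if route in task.viable_routes:
            return route, cost
    # Nothing cheaper qualifies, so fall back to a full model call.
    return ROUTE_COSTS[-1]


task = Task("rename a config key", {"narrow_retrieval", "model_call"})
print(choose_route(task))
```

Ordering the table by cost keeps the decision auditable: the chosen route is always the first entry the task qualifies for.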

In XFlowIQ Origin Benchmark v0.2, a first-party estimated benchmark across 19 fixed local rows and 12 task families, XFlowIQ measured 33.33% median gross estimated token savings and 33.33% median net estimated savings after correction and blocked-task penalties. This is a local estimated benchmark, not provider-metered billing data; the method, distribution, and limitations are published with the result.
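As a rough illustration of how a median gross and net figure could be derived, the sketch below assumes savings are measured against a baseline token estimate and that correction and blocked-task penalties are charged as extra tokens. That formula is an assumption, not the published one, and the three rows are invented sample data that do not reproduce the benchmark result.

```python
from statistics import median


def gross_savings(baseline_tokens, xflow_tokens):
    """Percent saved before any penalties (assumed formula)."""
    return 100.0 * (baseline_tokens - xflow_tokens) / baseline_tokens


def net_savings(baseline_tokens, xflow_tokens, penalty_tokens):
    """Percent saved after charging penalty tokens (assumed formula)."""
    return 100.0 * (baseline_tokens - xflow_tokens - penalty_tokens) / baseline_tokens


# Invented rows: (baseline estimate, XFlow estimate, penalty tokens).
rows = [
    (1000, 700, 0),
    (800, 600, 40),
    (1200, 600, 120),
]

gross = median(gross_savings(b, x) for b, x, _ in rows)
net = median(net_savings(b, x, p) for b, x, p in rows)
print(f"median gross: {gross:.2f}%  median net: {net:.2f}%")
```

Reporting the median rather than the mean keeps one unusually compressible row from dominating the headline number.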

Benchmark Origin

Every benchmark starts somewhere. This one starts here.

This benchmark uses fixed local task fixtures, accepted-output gates, ReceiptFit compact packet plans, and a deterministic token estimate. It is the first XFlowIQ-origin benchmark, so it publishes the method and limitations with the result.
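A deterministic token estimate means the same text always yields the same count, with no provider call involved. The sketch below assumes a four-characters-per-token heuristic, which is a common rough approximation for English text and not necessarily the estimator this benchmark uses; `estimate_tokens` is a hypothetical name.

```python
import math

CHARS_PER_TOKEN = 4  # assumed heuristic, not the benchmark's published formula


def estimate_tokens(text):
    """Deterministic estimate: identical input always gives an identical count."""
    # Collapse whitespace so layout differences do not change the estimate.
    normalized = " ".join(text.split())
    return math.ceil(len(normalized) / CHARS_PER_TOKEN)


print(estimate_tokens("Run the unit tests and report failures."))
```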

allowed

XFlowIQ Origin Estimated Benchmark

Use this as first-party benchmark language when the disclosure stays attached.

blocked

Provider-metered billing claim

This remains separate and blocked unless a future API/provider harness exists.

blocked

Official external score

HumanEval/SWE-bench style work stays clearly labeled as unofficial until the official harness is run.

Origin benchmark v0.2

19 rows, 12 task families, 33.33% median estimated savings.

Token Savings Progress

The path so far: first benchmark, then training lift, with stronger proof classes to follow.

first fixture

First local benchmark

32.76% median estimated token savings

XFlow support packets reduced repeated setup, route explanation, and correction text while keeping accepted-output proof fields.

Claim boundary: This is the first local estimate shown as method evidence, not a public provider-backed savings claim.

training run

Compact packet training

70.11% median drill savings

XFlow learned to compress worker packets into risk, intent, files, plan, tests, proof, boundaries, and next action.

Claim boundary: This is a training-gym savings signal. It proves the behavior is learnable, not that public workloads save this exact amount.
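One way to picture a compact worker packet is a fixed record with the eight fields named above. The sketch below is a hypothetical shape, not the ReceiptFit format: the `WorkerPacket` class, the field types, and the sample values are all assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class WorkerPacket:
    # Field names follow the list in the text; types are assumed.
    risk: str
    intent: str
    files: list = field(default_factory=list)
    plan: list = field(default_factory=list)
    tests: list = field(default_factory=list)
    proof: str = ""
    boundaries: str = ""
    next_action: str = ""

    def to_compact_json(self):
        # Compact separators drop the spaces json.dumps adds by default.
        return json.dumps(asdict(self), separators=(",", ":"))


packet = WorkerPacket(
    risk="low",
    intent="fix failing date parser test",
    files=["parser.py"],
    plan=["reproduce", "patch", "rerun tests"],
    tests=["test_parser.py"],
    proof="test log attached",
    boundaries="no schema changes",
    next_action="open review",
)
print(packet.to_compact_json())
```

A fixed field set is what makes the packet comparable across runs: every worker receives the same slots, so repeated setup text has nowhere to accumulate.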

benchmark gym

Benchmark gym learning lift

36.05 points average score lift

Repeated repair loops improved XFlow's ability to pick the right packet shape and preserve safety/proof while staying compact.

Claim boundary: This is a local quality/readiness lift, not a token or cost savings claim.

provider log gate

Provider-log unlock

10 required real paired receipts

The benchmark becomes public-claim ready only after same-task manual-vs-XFlow runs include provider/API usage logs, costs, latency, corrections, and accepted-output proof.

Claim boundary: Until then, the website can show potential, method, local estimates, and progress, but not an unlocked public percentage.
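The unlock rule can be read as a simple gate: a public percentage stays locked until enough complete paired receipts exist. The sketch below assumes a receipt counts only when every listed evidence item is present; the `PairedReceipt` class and its field names are hypothetical, and only the threshold of 10 comes from the text.

```python
from dataclasses import dataclass

REQUIRED_PAIRED_RECEIPTS = 10  # threshold stated in the text


@dataclass
class PairedReceipt:
    task_id: str
    has_usage_logs: bool
    has_costs: bool
    has_latency: bool
    has_corrections: bool
    has_accepted_output_proof: bool

    def is_complete(self):
        # Assumption: a receipt counts only if every evidence item is present.
        return all([
            self.has_usage_logs,
            self.has_costs,
            self.has_latency,
            self.has_corrections,
            self.has_accepted_output_proof,
        ])


def public_claim_unlocked(receipts):
    """True once enough complete manual-vs-XFlow receipts exist."""
    complete = [r for r in receipts if r.is_complete()]
    return len(complete) >= REQUIRED_PAIRED_RECEIPTS


receipts = [PairedReceipt(f"task-{i}", True, True, True, True, True) for i in range(9)]
print(public_claim_unlocked(receipts))
```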

Comparison Pages

Small number, high substance.

evidence pending

XGuardIQ vs Norton: Reproducible Windows Security Benchmark

A future reproducible comparison page for XGuardIQ's supervised security layer beside Norton-style endpoint protection. This page does not claim XGuardIQ beats Norton until exact versions, environment, scripts, and complete results exist.

evidence pending

How XGuardIQ Augments Microsoft Defender

XGuardIQ is positioned as a supervised layer around security decisions: pre-action risk classification, quarantine-style evidence holding, approvals, receipts, and local health reporting. It does not replace Microsoft Defender.

sample harness ready

AI Coding Assistant Token-Efficiency Benchmark

A paired-run benchmark for measuring whether XFlowIQ reduces total AI consumption while keeping or improving accepted engineering outcomes.

methodology ready

Multi-Agent Engineering: Four Models, One Auditable Workspace

A proof path for showing how XFlowIQ coordinates multiple AI lanes with receipts, role boundaries, risk gates, and evidence instead of uncontrolled chat sprawl.

methodology ready

Benchmark Methodology and Reproduction Instructions

The benchmark contract that keeps XFlowIQ honest: exact versions, same environment, complete results, reported competitor wins, limitations, scripts, and evidence.
