XFlowIQ Origin Estimated Benchmark
Use this as first-party benchmark language when the disclosure stays attached.
XFlowIQ now has its own first-party benchmark. The XFlowIQ Origin Benchmark can be published as an estimated local benchmark because the method, rows, formula, distribution, and limitations are attached.
For token efficiency, XFlow chooses a strategy before spending tokens. It estimates the route, decides whether a local readback, cache, compact support packet, narrow retrieval, critic check, or model call is needed, then scores the result against the same accepted-output benchmark.
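As a rough illustration of choosing a strategy before spending tokens, the sketch below maps a few task signals to one of the six options named above. The `Task` fields, the `choose_strategy` function, and the order of the checks are all assumptions made for this example; the text does not describe how XFlow actually ranks its options.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    # The six options named in the text.
    LOCAL_READBACK = "local_readback"
    CACHE = "cache"
    COMPACT_SUPPORT_PACKET = "compact_support_packet"
    NARROW_RETRIEVAL = "narrow_retrieval"
    CRITIC_CHECK = "critic_check"
    MODEL_CALL = "model_call"


@dataclass
class Task:
    # Hypothetical signals, invented for this sketch.
    answer_on_disk: bool = False
    cached: bool = False
    high_risk: bool = False
    needs_context: bool = False
    context_is_large: bool = False


def choose_strategy(task: Task) -> Strategy:
    """Return the first matching option; the ordering is an assumption."""
    if task.answer_on_disk:
        return Strategy.LOCAL_READBACK
    if task.cached:
        return Strategy.CACHE
    if task.high_risk:
        return Strategy.CRITIC_CHECK
    if task.needs_context:
        if task.context_is_large:
            return Strategy.NARROW_RETRIEVAL
        return Strategy.COMPACT_SUPPORT_PACKET
    return Strategy.MODEL_CALL
```

In this sketch a model call is the fallback, reached only when no cheaper option applies. The chosen strategy would still need to be scored against the accepted-output benchmark afterwards.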
In XFlowIQ Origin Benchmark v0.2, a first-party estimated benchmark across 19 fixed local rows and 12 task families, XFlowIQ measured 33.33% median gross estimated token savings and 33.33% median net estimated savings after correction and blocked-task penalties. This is a local estimated benchmark, not provider-metered billing data; the method, distribution, and limitations are published with the result.
This benchmark uses fixed local task fixtures, accepted-output gates, ReceiptFit compact packet plans, and a deterministic token estimate. It is the first XFlowIQ-origin benchmark, so it publishes the method and limitations with the result.
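To make the terms "deterministic token estimate", "median gross savings", and "median net savings" concrete, here is a minimal sketch. The four-characters-per-token estimate, the `savings_pct` formula, and the sample rows are assumptions for illustration only; they are not the published XFlowIQ formula or benchmark rows.

```python
from statistics import median


def estimate_tokens(text: str) -> int:
    """Stand-in deterministic estimate: about 4 characters per token, rounded up."""
    return -(-len(text) // 4)


def savings_pct(baseline_tokens: int, xflow_tokens: int, penalty_tokens: int = 0) -> float:
    """Percent saved versus the baseline; penalties are charged to the XFlow side."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - xflow_tokens - penalty_tokens) / baseline_tokens


# Made-up rows: (baseline tokens, XFlow tokens, correction/blocked-task penalty tokens).
rows = [
    (400, 300, 0),
    (600, 300, 60),
    (1000, 900, 0),
]

gross = median(savings_pct(b, x) for b, x, _ in rows)
net = median(savings_pct(b, x, p) for b, x, p in rows)
```

With these invented rows the gross and net medians come out equal, because the only penalized row is not the median row. A median can hide penalties in this way, which is one reason to publish the full distribution alongside it.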
Provider-metered benchmarking remains separate and blocked until an API/provider harness exists.
HumanEval/SWE-bench style work stays clearly labeled unless the official harness is run.
19 rows, 12 task families, 33.33% median estimated savings.
32.76% median estimated token savings
XFlow support packets reduced repeated setup, route explanation, and correction text while keeping accepted-output proof fields.
Claim boundary: This is the first local estimate shown as method evidence, not a public provider-backed savings claim.
70.11% median drill savings
XFlow learned to compress worker packets into risk, intent, files, plan, tests, proof, boundaries, and next action.
Claim boundary: This is a training-gym savings signal. It proves the behavior is learnable, not that public workloads save this exact amount.
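The eight packet fields listed above could be modeled as a small record. This is a hypothetical sketch: the `WorkerPacket` class, its field types, and the compact JSON encoding are assumptions, not XFlow's actual packet format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class WorkerPacket:
    # Field names follow the list in the text; the types are assumed.
    risk: str
    intent: str
    files: list[str]
    plan: list[str]
    tests: list[str]
    proof: str
    boundaries: list[str]
    next_action: str

    def to_compact_json(self) -> str:
        """Serialize without extra whitespace to keep the packet small."""
        return json.dumps(asdict(self), separators=(",", ":"))


packet = WorkerPacket(
    risk="low",
    intent="fix failing unit test",
    files=["app/parser.py"],
    plan=["reproduce", "patch", "rerun tests"],
    tests=["tests/test_parser.py"],
    proof="test run output attached",
    boundaries=["no schema changes"],
    next_action="open review",
)
compact = packet.to_compact_json()
```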
36.05 points average score lift
Repeated repair loops improved XFlow's ability to pick the right packet shape and preserve safety/proof while staying compact.
Claim boundary: This is a local quality/readiness lift, not a token or cost savings claim.
10 required real paired receipts
The benchmark becomes public-claim ready only after same-task manual-vs-XFlow runs include provider/API usage logs, costs, latency, corrections, and accepted output proof.
Claim boundary: Until then, the website can show potential, method, local estimates, and progress, but not an unlocked public percentage.
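The readiness rule above can be expressed as a simple gate. The sketch below assumes a hypothetical `PairedReceipt` record with one flag per required piece of evidence; only the threshold of 10 receipts and the list of evidence types come from the text.

```python
from dataclasses import dataclass
from typing import Iterable

REQUIRED_RECEIPTS = 10  # "10 required real paired receipts"


@dataclass
class PairedReceipt:
    # One same-task manual-vs-XFlow run; field names are assumptions.
    task_id: str
    has_usage_logs: bool
    has_costs: bool
    has_latency: bool
    has_corrections: bool
    has_accepted_output_proof: bool

    def is_complete(self) -> bool:
        """A receipt counts only if every required piece of evidence is present."""
        return all((
            self.has_usage_logs,
            self.has_costs,
            self.has_latency,
            self.has_corrections,
            self.has_accepted_output_proof,
        ))


def public_claim_ready(receipts: Iterable[PairedReceipt]) -> bool:
    """True once enough distinct tasks have complete paired receipts."""
    complete_tasks = {r.task_id for r in receipts if r.is_complete()}
    return len(complete_tasks) >= REQUIRED_RECEIPTS
```

Counting distinct task IDs is a design choice in this sketch: it prevents repeated runs of one task from satisfying the threshold on their own.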
A future reproducible comparison page for XGuardIQ's supervised security layer beside Norton-style endpoint protection. This page does not claim XGuardIQ beats Norton until exact versions, environment, scripts, and complete results exist.
XGuardIQ is positioned as a supervised layer around security decisions: pre-action risk classification, quarantine-style evidence holding, approvals, receipts, and local health reporting. It does not replace Microsoft Defender.
A paired-run benchmark for measuring whether XFlowIQ reduces total AI consumption while keeping or improving accepted engineering outcomes.
A proof path for showing how XFlowIQ coordinates multiple AI lanes with receipts, role boundaries, risk gates, and evidence instead of uncontrolled chat sprawl.
The benchmark contract that keeps XFlowIQ honest: exact versions, same environment, complete results, competitor wins, limitations, scripts, and evidence.