BlackjackPilot Blog

How BlackjackPilot Validates Its Accuracy

How we verify BlackjackPilot against an exact combinatorial analyzer and multiple independent references (Wizard of Odds, QFIT/CVCX-style SCORE, and a non-QFIT double-deck benchmark).

Published June 3, 2026

Topic: Card Counting

A blackjack simulator is only as useful as it is correct. If the numbers are off, every downstream decision — bet ramp, index set, game selection — inherits the error. So instead of asking "do our numbers look reasonable?", we hold the engine to a harder standard:

Can a Monte Carlo engine reproduce an exact mathematical reference, layer by layer, and also match independent published references and simulators?

This post documents how we validate BlackjackPilot, what we found, and where the honest limits of these comparisons are.

The foundation: our own combinatorial analyzer

Most validation chases another simulator. We start one level deeper. BlackjackPilot ships with a combinatorial analyzer (CA): an exact calculator that enumerates every possible card sequence, weighted by its probability, to produce probabilities and expected values with no Monte Carlo noise at all.
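
Here is a minimal sketch of what "exact, no sampling" means for one CA-style layer. It computes the dealer's final-result distribution for a single upcard by recursively enumerating every card the dealer could draw and carrying exact fractions throughout. This is an illustration, not BlackjackPilot's analyzer, and the function names are hypothetical. The sketch also assumes that only the upcard has been removed from the shoe (player cards are ignored) and that no peek conditioning is applied.

```python
from fractions import Fraction

ACE, TEN = 1, 10  # rank 1 is an ace; rank 10 stands for any ten-value card


def fresh_shoe(decks):
    """Card counts for ranks 1..10: 4 per deck for ace..9, 16 per deck for tens."""
    return tuple([4 * decks] * 9 + [16 * decks])


def dealer_distribution(upcard, decks=1, hit_soft_17=False):
    """Exact dealer final-result probabilities for one upcard (hypothetical helper).

    Sketch assumptions: only the upcard is removed from the shoe, and no peek
    conditioning is applied, so a dealer natural shows up as its own 'BJ' outcome.
    """
    start = list(fresh_shoe(decks))
    start[upcard - 1] -= 1
    memo = {}

    def settle(shoe, hard_total, has_ace, n_cards):
        key = (shoe, hard_total, has_ace, n_cards)
        if key in memo:
            return memo[key]
        soft = has_ace and hard_total + 10 <= 21  # one ace can count as 11
        total = hard_total + 10 if soft else hard_total
        if n_cards == 2 and total == 21:
            result = {"BJ": Fraction(1)}
        elif total > 21:
            result = {"bust": Fraction(1)}
        elif total >= 17 and not (total == 17 and soft and hit_soft_17):
            result = {total: Fraction(1)}  # dealer stands on this total
        else:
            result = {}
            remaining = sum(shoe)
            for rank in range(ACE, TEN + 1):  # dealer must draw: enumerate every rank
                n = shoe[rank - 1]
                if n == 0:
                    continue
                after = list(shoe)
                after[rank - 1] -= 1
                sub = settle(tuple(after), hard_total + rank,
                             has_ace or rank == ACE, n_cards + 1)
                for outcome, q in sub.items():
                    result[outcome] = result.get(outcome, 0) + Fraction(n, remaining) * q
        memo[key] = result
        return result

    return settle(tuple(start), upcard, upcard == ACE, 1)
```

As a sanity check, `dealer_distribution(1, decks=6)["BJ"]` comes out to exactly 96/311. That is the same 16·d / (52·d − 1) that the insurance check relies on.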

The CA is the ground truth. The Monte Carlo engine (the thing that powers the simulator) is then checked against it, one layer at a time:

Dealer outcome distribution — probability the dealer finishes 17/18/19/20/21/BJ/bust for every upcard and rule (S17/H17). Engine vs CA across 1D/2D/6D: every bucket inside the 95% confidence interval, worst deviation ≤ 0.224 percentage points over 4M shoes.
Flat-bet hand EV — stand / hit / double for every starting hand and upcard, with the dealer hole card integrated exactly and US peek rules honored. Engine matches the CA.
Splits and settlement — including a fix we shipped: a two-card 21 made after a split is an ordinary 21, not a blackjack, so it must lose to a dealer natural rather than push.
Insurance — the empirical insurance win rate matches the exact probability 16·d / (52·d − 1), where d is the number of decks and only the dealer's ace has been removed, to within ~0.13σ.
EV by true count — per-count flat-bet EV matches a count-conditioned CA, so the engine is unbiased not just on average but at every true count.
Index-deviation EV — the EV gained by deviating (e.g. standing 16 vs 10 at a high count) matches the CA at every count, so deviations are credited correctly rather than over-counted.
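
Each comparison in this list comes down to one statistical question: is the simulated frequency close enough to the exact value? Below is a minimal sketch of that check. It uses a normal-approximation (Wald) 95% interval and measures the distance in binomial standard errors (σ). The real harness may use a different interval. The function names are hypothetical, and the insurance "simulation" is only a toy stand-in for the engine.

```python
import math
import random


def compare_to_exact(hits, trials, p_exact):
    """Compare a simulated frequency with an exact probability.

    Returns the deviation in percentage points, the deviation in binomial
    standard errors (sigma), and whether p_exact falls inside the simulated
    95% normal-approximation interval. Assumes 0 < hits < trials.
    """
    p_hat = hits / trials
    se = math.sqrt(p_hat * (1 - p_hat) / trials)
    z = (p_hat - p_exact) / se
    return 100 * (p_hat - p_exact), z, abs(z) <= 1.96


def simulate_insurance(decks, trials, seed=2026):
    """Toy engine stand-in: with an ace up, draw the hole card from the rest of a fresh shoe."""
    rng = random.Random(seed)
    unseen = 52 * decks - 1  # the dealer's ace is already out
    tens = 16 * decks
    return sum(rng.randrange(unseen) < tens for _ in range(trials))


decks, trials = 6, 400_000
p_exact = 16 * decks / (52 * decks - 1)
hits = simulate_insurance(decks, trials)
dev_pp, z, inside = compare_to_exact(hits, trials, p_exact)
print(f"deviation {dev_pp:+.3f} pp, {z:+.2f} sigma, inside 95% CI: {inside}")
```

The same `compare_to_exact` call applies to any bucket that has an exact reference, such as one dealer final total for one upcard, or a per-count win rate against a count-conditioned CA.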

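Passing an exact analyzer at every layer is a stronger statement than matching any single simulator. The reference carries no sampling noise of its own, so any disagreement larger than the engine's confidence interval points to a real bug, not bad luck on either side.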

Independent cross-checks

An in-house reference is necessary but not sufficient — it could share an assumption with the engine. So we also compare against references we did not build:

Wizard of Odds — total-dependent basic-strategy house edge. The CA reproduces the published structure (small, well-understood offsets come from fresh-shoe vs cut-card and resplit conventions).
QFIT reKO / KO and HILOData — SCORE tables from Modern Blackjack / CVCX. Six-deck rows land inside the engine's 95% SCORE confidence interval.
A non-QFIT double-deck benchmark — see below.

A fully independent double-deck benchmark

The cleanest external check uses a completely different source and a different simulator: the "GameMaster's Blackjack School" double-deck lesson, simulated on SBA (Statistical Blackjack Analyzer) rather than CVCX. The game is fully specified: 2D, H17, DAS, no surrender, Hi-Lo, 60/104 penetration (60 of 104 cards dealt, about 58%), play-all, and an exact true-count bet ramp. The lesson also publishes multiple numbers, including the low-variance ones that pin down betting behavior.
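
For readers reproducing this game, here is a sketch of how a Hi-Lo true count maps to a bet under these rules. The lesson's actual ramp is not reproduced. `HYPOTHETICAL_RAMP`, the floor-rounding convention, the shuffle rule and all function names are illustrative assumptions.

```python
import math

# Hi-Lo tags by rank (1 = ace, 10 = any ten-value card).
HI_LO = {1: -1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 0, 8: 0, 9: 0, 10: -1}

DECKS = 2
CUT_CARD = 60  # 60/104 penetration: reshuffle once 60 of 104 cards have been dealt

# Hypothetical 1-8 ramp keyed by floored true count (contiguous keys);
# NOT the lesson's published ramp. Play-all: counts below 1 still bet 1 unit.
HYPOTHETICAL_RAMP = {1: 1, 2: 2, 3: 4, 4: 6, 5: 8}


def running_count(cards_seen):
    """Sum of Hi-Lo tags for every card seen since the last shuffle."""
    return sum(HI_LO[card] for card in cards_seen)


def true_count(rc, cards_dealt, decks=DECKS):
    """Running count divided by the exact number of decks still unseen."""
    decks_remaining = (52 * decks - cards_dealt) / 52
    return rc / decks_remaining


def bet_units(tc, ramp=HYPOTHETICAL_RAMP):
    """Floor the true count (an assumed convention) and clamp it to the ramp's range."""
    lo, hi = min(ramp), max(ramp)
    return ramp[max(lo, min(math.floor(tc), hi))]


def needs_shuffle(cards_dealt):
    """Assumed rule: shuffle before the next round once the cut card has come out."""
    return cards_dealt >= CUT_CARD
```

For example, a running count of +4 after 52 cards leaves exactly one deck unseen, so the true count is 4.0, and this made-up ramp would bet 6 units.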

Reproducing its basic-strategy run (no index ambiguity):

Metric	BlackjackPilot (95% CI)	Reference (SBA)	Verdict
Initial-bet edge	0.535% [0.392%, 0.678%]	0.59%	inside CI
SCORE	10.79 [5.8, 17.3]	13.31	inside CI
Avg initial bet	1.742 units	1.742 units	exact match
SD per round	2.836 units	2.83 units	match (within 0.2%)

The two lowest-variance quantities — average bet and standard deviation per round — match almost exactly. That is the strong result: it means the true-count ramp, the count frequencies, and the round-by-round variance all line up with an independent engine. The edge and SCORE sit comfortably inside the confidence interval.

The honest part: SCORE depends on a ramp nobody publishes

When you see a published SCORE labeled "1-8 spread" or "1-16 spread," that number was almost always computed with an optimal betting ramp calculated for that exact penetration, and the precise ramp is usually not printed. SCORE is proportional to (expected win per round / standard deviation per round)², so it is extremely sensitive to the exact bet at each count.
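
Written out, SCORE = 10⁶ · (EV / SD)², where EV and SD are the expected win and standard deviation per round in the same units. This equals the expected win per 100 rounds for a $10,000 bankroll bet at full Kelly. The sketch below plugs BlackjackPilot's basic-strategy row from the table into that formula. The helper name is hypothetical, and EV per round is taken as the initial-bet edge times the average initial bet.

```python
def score(edge_on_initial_bet, avg_initial_bet, sd_per_round):
    """SCORE = 1e6 * (EV per round / SD per round) ** 2.

    EV per round, in betting units, is the edge on the initial bet times the
    average initial bet; sd_per_round must be in the same units.
    """
    ev_per_round = edge_on_initial_bet * avg_initial_bet
    return 1_000_000 * (ev_per_round / sd_per_round) ** 2


ours = score(0.00535, 1.742, 2.836)
print(f"recomputed SCORE: {ours:.1f}")  # 10.8, vs 10.79 in the table (inputs are rounded)

# Squaring is what makes SCORE ramp-sensitive: 10% more EV per round
# at the same SD raises SCORE by 21%.
print(f"ratio: {score(0.00535 * 1.1, 1.742, 2.836) / ours:.2f}")  # 1.21
```

The recomputed value agrees with the table to rounding. It also shows why a small change in the ramp, which moves EV and SD together, can move SCORE much more than it moves the edge.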

We saw this directly. A double-deck SCORE comparison looked biased until we traced it to the bet ramp: our benchmark configuration spread the bets to 15 units, while the reference capped them at 8 (a 1-8 spread). Capping our spread at 8 units moved SCORE most of the way back, and our layer-by-layer CA work had already shown that the core EV was unbiased. In other words, the residual SCORE gap lived in the betting profile, not the blackjack engine.
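
Here is a toy model of why the top of the ramp matters so much. Every input is made up for illustration and is not BlackjackPilot or SBA data. That includes the true-count frequencies, the assumed +0.5% edge per true count, and a per-hand variance of 1.3. The only difference between the two runs is the cap on the top bet. The function names are hypothetical.

```python
import math

# Toy inputs, made up for illustration; NOT BlackjackPilot or SBA data.
TC_FREQ = {-1: 0.35, 0: 0.30, 1: 0.15, 2: 0.10, 3: 0.06, 4: 0.04}
TC_EDGE = {tc: -0.005 + 0.005 * tc for tc in TC_FREQ}  # assumed +0.5% per true count
HAND_VAR = 1.3  # assumed variance of a one-unit hand (a commonly quoted ballpark)


def profile(ramp):
    """Average bet, EV per round and SD per round for a ramp (true count -> units).

    Models one hand per round: round result = bet * X, with
    E[X | tc] = edge and E[X^2 | tc] = HAND_VAR + edge^2.
    """
    avg_bet = sum(f * ramp[tc] for tc, f in TC_FREQ.items())
    ev = sum(f * ramp[tc] * TC_EDGE[tc] for tc, f in TC_FREQ.items())
    second_moment = sum(
        f * ramp[tc] ** 2 * (HAND_VAR + TC_EDGE[tc] ** 2) for tc, f in TC_FREQ.items()
    )
    sd = math.sqrt(second_moment - ev ** 2)
    return avg_bet, ev, sd


def score(ev, sd):
    """SCORE = 1e6 * (EV per round / SD per round) ** 2."""
    return 1_000_000 * (ev / sd) ** 2


over_spread = {-1: 1, 0: 1, 1: 1, 2: 4, 3: 8, 4: 15}
capped = {tc: min(units, 8) for tc, units in over_spread.items()}

for name, ramp in [("1-15", over_spread), ("1-8", capped)]:
    avg_bet, ev, sd = profile(ramp)
    print(f"{name}: avg bet {avg_bet:.3f}, EV/round {ev:.5f}, SD {sd:.3f}, SCORE {score(ev, sd):.2f}")
```

Both ramps share identical per-count edges, yet they report clearly different SCOREs. The top bet shifts EV per round and SD per round by different proportions, and SCORE squares their ratio. A ramp mismatch at the aggressive end therefore shows up in SCORE even when the outcome engine is unchanged.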

The takeaway is important for reading any simulator comparison:

Matching a published SCORE exactly requires the exact optimal ramp behind the label. Differences at the aggressive end usually reflect ramp/cover assumptions, not a broken outcome engine — especially when the edge, average bet, and standard deviation already match.

What this means for you

The outcome engine is validated against an exact analyzer at every layer, then cross-checked against multiple independent references and simulators.
Where small differences remain, they are concentrated in betting-profile assumptions (the unpublished optimal ramp), not in how hands are dealt, played, and settled.
The practical numbers you act on — house edge, EV by true count, the value of index deviations, average bet, and variance — hold up against references we did not build.

We keep the validation harnesses in the repository as scripts so the checks are reproducible rather than one-off screenshots. Accuracy is not a marketing claim here; it is a test suite.