BlackjackPilot Blog

How BlackjackPilot Validates Its Accuracy

How we verify BlackjackPilot against an exact combinatorial analyzer and multiple independent references (Wizard of Odds, QFIT/CVCX-style SCORE, and a non-QFIT double-deck benchmark).

Published June 3, 2026

Topic: Card Counting

A blackjack simulator is only as useful as it is correct. If the numbers are off, every downstream decision — bet ramp, index set, game selection — inherits the error. So instead of asking "do our numbers look reasonable?", we hold the engine to a harder standard:

Can a Monte Carlo engine reproduce an exact mathematical reference, layer by layer, and also match independent published references and simulators?

This post documents how we validate BlackjackPilot, what we found, and where the honest limits of these comparisons are.

The foundation: our own combinatorial analyzer

Most validation efforts compare one simulator against another. We start one level deeper. BlackjackPilot ships with a combinatorial analyzer (CA): an exact, no-sampling calculator that enumerates every possible card sequence, weighted by its probability, to produce probabilities and expected values with no Monte Carlo noise at all.

The CA is the ground truth. The Monte Carlo engine (the part that powers the simulator) is then checked against it, one layer at a time.

Passing an exact analyzer at every layer is a stronger statement than matching any single simulator, because there is no sampling noise to hide behind.
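To make the exact-versus-sampled idea concrete, here is a minimal sketch on a deliberately tiny problem: the chance that the first two cards from a fresh double-deck shoe form a natural. It is not BlackjackPilot's analyzer or engine; the function names, the 200,000-round sample size, and the seed are assumptions chosen for illustration.

```python
import math
import random
from fractions import Fraction

DECKS = 2
# Cards as blackjack values: ace = 1, and 10/J/Q/K all count as 10.
SHOE = [min(rank, 10) for rank in range(1, 14)] * 4 * DECKS

def exact_natural_probability(shoe):
    """Exact chance that the first two cards are an ace and a ten-value card."""
    n = len(shoe)
    aces = shoe.count(1)
    tens = shoe.count(10)
    # Ace then ten, or ten then ace: two orderings with equal probability.
    return 2 * Fraction(aces, n) * Fraction(tens, n - 1)

def sampled_natural_probability(shoe, rounds, seed=0):
    """Monte Carlo estimate of the same probability from random two-card deals."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(rounds):
        first, second = rng.sample(shoe, 2)  # two cards without replacement
        if {first, second} == {1, 10}:
            hits += 1
    return hits / rounds

exact = exact_natural_probability(SHOE)
rounds = 200_000
estimate = sampled_natural_probability(SHOE, rounds)

# Because the reference is exact, the only noise is in the sampled estimate.
std_error = math.sqrt(float(exact) * (1 - float(exact)) / rounds)
z = (estimate - float(exact)) / std_error
print(f"exact={float(exact):.6f} sampled={estimate:.6f} z={z:+.2f}")
```

The useful property is the last step: with an exact reference, any gap can be expressed in standard errors of the sampled side alone, so a large z-score points at the engine rather than at noise in the benchmark.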

Independent cross-checks

An in-house reference is necessary but not sufficient, since it could share an assumption with the engine. So we also compare against references we did not build: Wizard of Odds, QFIT/CVCX-style SCORE figures, and a non-QFIT double-deck benchmark.

A fully independent double-deck benchmark

The cleanest external check uses a completely different source and a different simulator: the "GameMaster's Blackjack School" double-deck lesson, simulated on SBA (Statistical Blackjack Analyzer), not CVCX. The game is fully specified — 2D, H17, DAS, no surrender, Hi-Lo, 60/104 penetration, play-all, with an exact true-count bet ramp — and it publishes multiple numbers, including the low-variance ones that pin down betting behavior.

Reproducing its basic-strategy run (no index ambiguity):

| Metric | BlackjackPilot (95% CI) | Reference (SBA) | Verdict |
| --- | --- | --- | --- |
| Initial-bet edge | 0.535% [0.392, 0.678] | 0.59% | inside CI |
| SCORE | 10.79 [5.8, 17.3] | 13.31 | inside CI |
| Avg initial bet | 1.742 units | 1.742 units | exact match |
| SD per round | 2.836 units | 2.83 units | exact match |

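The two lowest-variance quantities, average bet and standard deviation per round, match almost exactly: the average bet agrees to three decimals (1.742 units), and the standard deviation agrees to within 0.01 units (2.836 vs. 2.83). That is the strong result: it means the true-count ramp, the count frequencies, and the round-by-round variance all line up with an independent engine. The reference edge (0.59%) and SCORE (13.31) both fall inside our 95% confidence intervals, which is a weaker check because those intervals are much wider.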
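The width of an edge interval like the one above follows mostly from the standard deviation per round and the number of rounds simulated. The sketch below uses a plain normal approximation that treats rounds as independent; this post does not say how BlackjackPilot computes its intervals or how long the run was, so the function names and the method are assumptions, not a description of the tool.

```python
import math

def edge_half_width(sd_per_round, avg_initial_bet, rounds, z=1.96):
    """Half-width of a normal-approximation CI for initial-bet edge (a fraction)."""
    std_error_units = sd_per_round / math.sqrt(rounds)  # SE of win per round
    return z * std_error_units / avg_initial_bet        # rescale to initial bets

def rounds_for_half_width(sd_per_round, avg_initial_bet, half_width, z=1.96):
    """Rounds needed for the CI half-width to shrink to `half_width`."""
    return math.ceil((z * sd_per_round / (avg_initial_bet * half_width)) ** 2)

# Figures taken from the table above.
SD_PER_ROUND = 2.836
AVG_INITIAL_BET = 1.742
CI_LOW, CI_HIGH = 0.392 / 100, 0.678 / 100
REFERENCE_EDGE = 0.59 / 100

observed_half_width = (CI_HIGH - CI_LOW) / 2
implied_rounds = rounds_for_half_width(SD_PER_ROUND, AVG_INITIAL_BET, observed_half_width)

print(f"reference inside CI: {CI_LOW <= REFERENCE_EDGE <= CI_HIGH}")
print(f"half-width: {observed_half_width:.5f}")
print(f"rounds implied under these assumptions: {implied_rounds:,}")
```

Under these assumptions the published half-width corresponds to a run on the order of five million rounds. Because the half-width shrinks with the square root of the round count, halving it would take roughly four times as many rounds, which is why the low-variance quantities are the sharper test.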

The honest part: SCORE depends on a ramp nobody publishes

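When you see a published SCORE labeled "1-8 spread" or "1-16 spread," that number was almost always computed with an optimal betting ramp calculated for that exact penetration, and the precise ramp is usually not printed. SCORE is proportional to (win rate / standard deviation)², with both measured per round, where the win rate is the edge multiplied by the average bet. The ramp sets both the average bet and the standard deviation, and the ratio is squared, so SCORE is extremely sensitive to the exact bet at each count.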
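As a sanity check on that relationship, the SCORE figures in the double-deck table can be approximately rebuilt from the other three rows. The sketch assumes the common convention that SCORE is the square of 1000 × (win rate per round / SD per round), with win rate equal to initial-bet edge times average initial bet; the post does not spell out the scaling constant, so treat it as an assumption.

```python
def score(initial_bet_edge, avg_initial_bet, sd_per_round):
    """SCORE under the assumed convention: 1e6 * (win rate / SD)^2, per round."""
    win_rate = initial_bet_edge * avg_initial_bet  # expected units won per round
    return 1_000_000 * (win_rate / sd_per_round) ** 2

# Rows from the table above: edge as a fraction, bets and SD in units.
ours = score(0.00535, 1.742, 2.836)
reference = score(0.0059, 1.742, 2.83)

print(f"BlackjackPilot SCORE from rows: {ours:.2f}")
print(f"Reference SCORE from rows:      {reference:.2f}")
```

Both results land close to the tabulated 10.79 and 13.31, with the small differences attributable to the rounded inputs. Note how a gap of about 0.05 percentage points in edge becomes a gap of more than two SCORE points once the ratio is squared.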

We saw this directly. A double-deck SCORE comparison looked biased until we traced it to the bet ramp: our benchmark had over-spread the bets (to 15 units) versus the reference's 1-8 cap. Capping the spread correctly moved SCORE most of the way back, and our layer-by-layer CA work had already proven the core EV was unbiased. In other words, the residual SCORE gap lived in the betting profile, not the blackjack engine.
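The ramp sensitivity is easy to see in a toy model. Every number below is hypothetical: the true-count frequencies, per-count edges, and mean-square result per unit bet are made up for illustration and are not BlackjackPilot output or the reference's data. The only thing that differs between the two ramps is the top bet.

```python
# Hypothetical inputs, chosen only to illustrate the effect.
TC_FREQ = {0: 0.70, 1: 0.10, 2: 0.08, 3: 0.05, 4: 0.04, 5: 0.03}          # share of rounds
TC_EDGE = {0: -0.010, 1: 0.000, 2: 0.005, 3: 0.010, 4: 0.015, 5: 0.020}  # edge per unit bet
MEAN_SQUARE_PER_UNIT = 1.3  # assumed mean squared result per unit wagered

def score_for_ramp(ramp):
    """SCORE-style figure, 1e6 * win_rate^2 / variance, for a bet-per-count ramp."""
    win_rate = sum(TC_FREQ[tc] * ramp[tc] * TC_EDGE[tc] for tc in TC_FREQ)
    second_moment = sum(
        TC_FREQ[tc] * ramp[tc] ** 2 * MEAN_SQUARE_PER_UNIT for tc in TC_FREQ
    )
    variance = second_moment - win_rate ** 2
    return 1_000_000 * win_rate ** 2 / variance

capped_ramp = {0: 1, 1: 1, 2: 2, 3: 4, 4: 8, 5: 8}        # top bet capped at 8 units
over_spread_ramp = {0: 1, 1: 1, 2: 2, 3: 4, 4: 8, 5: 15}  # same ramp, top bet 15 units

capped_score = score_for_ramp(capped_ramp)
over_spread_score = score_for_ramp(over_spread_ramp)
print(f"capped: {capped_score:.2f}  over-spread: {over_spread_score:.2f}")
```

In this toy model the per-count edges are identical for both ramps, yet changing a single bet at the top count moves the result substantially. That is the pattern to look for when a SCORE comparison disagrees while edge, average bet, and standard deviation per unit are already in line.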

The takeaway is important for reading any simulator comparison:

Matching a published SCORE exactly requires the exact optimal ramp behind the label. Differences at the aggressive end usually reflect ramp/cover assumptions, not a broken outcome engine — especially when the edge, average bet, and standard deviation already match.

What this means for you

We keep the validation harnesses in the repository as scripts so the checks are reproducible rather than one-off screenshots. Accuracy is not a marketing claim here; it is a test suite.