A/B Testing for ML — Complete Guide
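A/B testing compares two versions of a system, such as the current model (A) and a candidate model (B), to determine which performs better on live traffic. It is an essential step in validating ML models in production.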
A/B Testing Framework
1. Hypothesis:
H₀: No difference between A and B
H₁: B is better than A
2. Randomization:
Randomly split users into control (A) and treatment (B)
3. Metrics:
Primary: Click-through rate, conversion
Secondary: Revenue, engagement
4. Sample size:
Power analysis determines the number of samples needed per group
5. Analysis:
Statistical test → p-value → Decision
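As a rough illustration of steps 2 and 5, the sketch below assigns users to variants with a deterministic hash and then runs a one-sided two-proportion z-test on conversion counts. It is a minimal sketch under stated assumptions, not a prescribed implementation: the function names, the experiment name, the 50/50 split, and all counts are hypothetical, and hash-based bucketing is only one common way to randomize.

```python
import hashlib
import math
from statistics import NormalDist


def assign_variant(user_id: str, experiment: str = "model_v2_test") -> str:
    """Deterministically map a user to 'A' (control) or 'B' (treatment).

    Hashing the experiment name together with the user ID keeps each user
    in the same group across visits (hypothetical 50/50 split).
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in 0..99
    return "B" if bucket < 50 else "A"


def one_sided_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-proportion z-test of H1: rate_B > rate_A, using a pooled standard error."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability
    return z, p_value


# Hypothetical conversion counts, for illustration only
z, p_value = one_sided_ztest(conv_a=500, n_a=5000, conv_b=560, n_b=5000)

alpha = 0.05  # significance level chosen before the test starts
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"variant for user-42: {assign_variant('user-42')}")
print(f"z = {z:.3f}, p = {p_value:.4f} -> {decision}")
```

The test is one-sided because H₁ states that B is better than A; a two-sided alternative would need the p-value doubled.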
Sample Size Calculation
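from statsmodels.stats.power import NormalIndPower

analysis = NormalIndPower()
# solve_power returns the number of observations needed in group 1;
# with the default ratio=1, group 2 needs the same number.
sample_size = analysis.solve_power(
    effect_size=0.05,      # Standardized effect size (e.g. Cohen's h), not a raw rate difference
    alpha=0.05,            # Significance level
    power=0.80,            # Statistical power
    alternative='larger',  # One-sided test, matching H₁: B is better than A
)
print(f"Required sample size per group: {sample_size:.0f}")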
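The effect_size passed to solve_power is a standardized effect size, so a baseline rate and a target rate need to be converted before they can be used. The sketch below shows one way to do that with Cohen's h and then applies the usual normal-approximation formula for a one-sided test with equal group sizes, using only the standard library. The rates and function names are hypothetical; statsmodels also provides proportion_effectsize in statsmodels.stats.proportion for the same conversion.

```python
import math
from statistics import NormalDist


def cohens_h(p_treatment: float, p_control: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * (math.asin(math.sqrt(p_treatment)) - math.asin(math.sqrt(p_control)))


def samples_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Samples needed in each group for a one-sided z-test with equal group sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)       # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Hypothetical rates: 10% baseline conversion, hoping to detect a lift to 11%
baseline_rate = 0.10
target_rate = 0.11

h = cohens_h(target_rate, baseline_rate)
n = samples_per_group(h)
print(f"Cohen's h = {h:.4f}, samples needed per group = {n}")
```

Because the required sample size scales with 1 / effect_size², halving the minimum detectable effect roughly quadruples the traffic the test needs.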
Key Takeaways
- A/B testing validates model improvements in production
- Random assignment guards against selection bias between the control and treatment groups
- Sample size calculation prevents underpowered tests
- Statistical significance ≠ practical significance
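- Multi-armed bandits adapt traffic allocation during the test, shifting users toward the better-performing variant
- Online ML continuously optimizes the model as new data arrives
- Guardrail metrics prevent harm by flagging regressions that the primary metric would miss
- Longer tests capture temporal effects such as day-of-week patterns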