Amazon & Microsoft Interview

Cross-Validation: K-Fold, Stratified & Time Series Split

The gold standard for model evaluation and selection

Interview Question

"Explain the different cross-validation strategies. When would you use stratified K-fold vs regular K-fold? How do you handle cross-validation for time series data?"

Difficulty: Medium | Frequently asked at Amazon, Microsoft, Google

Theoretical Foundation

Why Cross-Validation?

Cross-validation gives a more reliable estimate of generalization performance than a single split by:

Using all data for both training and validation
Reducing variance of the performance estimate
Detecting overfitting

The Problem with Holdout

A single train-test split has high variance:

Different splits give different performance estimates
Small datasets are especially problematic
No guarantee that the split is representative
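The spread of holdout estimates can be illustrated with a minimal sketch. It assumes a hypothetical dataset of 40 noisy targets and a deliberately trivial "model" (predict the training mean); `holdout_mse` is an illustrative helper, not a library function.

```python
import random
import statistics

def holdout_mse(y, test_fraction, seed):
    """MSE of a mean-only predictor on one random train/test split."""
    rng = random.Random(seed)
    indices = list(range(len(y)))
    rng.shuffle(indices)
    n_test = int(len(y) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # The "model" is just the mean of the training targets.
    prediction = statistics.fmean(y[i] for i in train_idx)
    return statistics.fmean((y[i] - prediction) ** 2 for i in test_idx)

# Hypothetical small dataset: 40 noisy targets.
data_rng = random.Random(0)
y = [data_rng.gauss(0.0, 1.0) for _ in range(40)]

# Same data, same model, ten different random splits.
estimates = [holdout_mse(y, test_fraction=0.25, seed=s) for s in range(10)]
print("min:", round(min(estimates), 3), "max:", round(max(estimates), 3))
```

Only the split changes between runs, so the gap between the smallest and largest estimate is due to the split alone.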

K-Fold Cross-Validation

Algorithm:

Split the data into $K$ folds of (roughly) equal size
For each fold $k = 1, \ldots, K$:
a. Train on all folds except fold $k$, i.e. folds $\{1, \ldots, K\} \setminus \{k\}$
b. Validate on fold $k$
Average the $K$ validation scores

Properties:

Each sample is used for validation exactly once
Each sample is used for training $K-1$ times
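Bias of the estimate decreases as $K$ increases (each training set is closer to the full dataset), but computation grows and the variance of the estimate can increase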
Typical values: $K = 5$ or $K = 10$
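The algorithm and its first two properties can be sketched in plain Python. This is a minimal illustration, assuming contiguous folds (real i.i.d. data would normally be shuffled first) and a mean-only predictor; `k_fold_indices` and `cross_val_mse` are hypothetical helper names.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for K contiguous folds."""
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def cross_val_mse(y, k):
    """Average validation MSE of a mean-only predictor over K folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train_idx) / len(train_idx)
        scores.append(sum((y[i] - prediction) ** 2 for i in val_idx) / len(val_idx))
    return sum(scores) / len(scores)

y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
k = 5

# Count how often each sample lands in validation and in training.
val_counts = [0] * len(y)
train_counts = [0] * len(y)
for train_idx, val_idx in k_fold_indices(len(y), k):
    for i in val_idx:
        val_counts[i] += 1
    for i in train_idx:
        train_counts[i] += 1

print(val_counts)    # every entry is 1
print(train_counts)  # every entry is K - 1 = 4
print(cross_val_mse(y, k))
```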

Leave-One-Out Cross-Validation (LOOCV)

Special case where $K = n$ :

Each fold has exactly one sample
Lowest bias: each model trains on $n - 1$ samples
Very high computational cost ( $n$ models)
High variance (the $n$ training sets are nearly identical, so the fold errors are highly correlated)

ℹ️

Key Insight: LOOCV has low bias but high variance because the $n$ training sets overlap significantly. K-fold with $K=5$ or $K=10$ often performs better due to lower variance.
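The overlap argument can be made concrete with a small sketch that measures how much two training sets share under 5-fold CV versus LOOCV. It assumes $K$ divides $n$ for simplicity; `fold_splits` and `training_overlap` are illustrative names.

```python
def fold_splits(n_samples, k):
    """Return (train_idx, val_idx) pairs; assumes k divides n_samples."""
    size = n_samples // k
    splits = []
    for fold in range(k):
        held_out = set(range(fold * size, (fold + 1) * size))
        train_idx = [i for i in range(n_samples) if i not in held_out]
        splits.append((train_idx, sorted(held_out)))
    return splits

def training_overlap(n_samples, k):
    """Fraction of the first training set that also appears in the second."""
    splits = fold_splits(n_samples, k)
    first, second = set(splits[0][0]), set(splits[1][0])
    return len(first & second) / len(first)

n = 20
overlap_5_fold = training_overlap(n, 5)   # 12 shared out of 16
overlap_loocv = training_overlap(n, n)    # 18 shared out of 19 (K = n)
print(overlap_5_fold, overlap_loocv)
```

With $n = 20$, two LOOCV training sets share 18 of their 19 samples, while two 5-fold training sets share 12 of 16, which is why the LOOCV models (and their errors) move together.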

Stratified K-Fold

Ensures each fold has approximately the same class distribution as the full dataset:

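$$\frac{n_{k,c}}{n_k} \approx \frac{n_c}{n}$$

where $n_{k,c}$ is the count of class $c$ in fold $k$, $n_k$ is the size of fold $k$, $n_c$ is the count of class $c$ in the full dataset, and $n$ is the total number of samples.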

When to use:

Imbalanced datasets
Classification with rare classes
When class distribution matters
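One simple way to obtain such folds is to deal the samples of each class round-robin across the folds. The sketch below assumes a hypothetical dataset with 10% positives and does no shuffling (a real implementation would typically shuffle within each class first); `stratified_fold_assignment` is an illustrative helper.

```python
from collections import Counter, defaultdict

def stratified_fold_assignment(labels, k):
    """Return a fold id per sample, dealing each class round-robin across folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [None] * len(labels)
    offset = 0
    for label in sorted(by_class):
        for position, idx in enumerate(by_class[label]):
            fold_of[idx] = (offset + position) % k
        # Continue dealing where the previous class stopped.
        offset += len(by_class[label])
    return fold_of

labels = [1] * 10 + [0] * 90   # hypothetical data: 10% positives
k = 5
fold_of = stratified_fold_assignment(labels, k)

fold_class_counts = []
for fold in range(k):
    members = [labels[i] for i in range(len(labels)) if fold_of[i] == fold]
    fold_class_counts.append(Counter(members))
    print(fold, dict(fold_class_counts[-1]))
```

Each of the five folds ends up with 2 positives and 18 negatives, matching the 10% positive rate of the full dataset.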

Time Series Cross-Validation

Standard K-fold ignores temporal order, so a model can end up training on future observations to predict past ones. Time series CV keeps every test set strictly after its training set:

Forward Chaining

Fold 1: Train = [1], Test = [2]

Fold 2: Train = [1, 2], Test = [3]

Fold 3: Train = [1, 2, 3], Test = [4]

Expanding Window

Training set grows over time
Test set is always in the future
Respects causal structure
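An expanding-window splitter can be sketched as follows. It assumes samples are already sorted by time and that test blocks have a fixed size; `expanding_window_splits` and its parameters are illustrative, not a library API (scikit-learn's TimeSeriesSplit offers a comparable scheme).

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx) with a growing training set and a future test block."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for split in range(n_splits):
        test_start = first_test_start + split * test_size
        train_idx = list(range(0, test_start))                      # everything before the test block
        test_idx = list(range(test_start, test_start + test_size))  # the next block in time
        yield train_idx, test_idx

splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

For 10 samples this yields training sets of size 4, 6 and 8, each followed immediately by a two-sample test block.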

Sliding Window

Training set has fixed size
Window moves forward in time
Better for non-stationary data
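The sliding-window variant differs only in that old observations drop out of the training set. A minimal sketch, again assuming time-ordered samples; `sliding_window_splits` and the `step` parameter are hypothetical names chosen for illustration.

```python
def sliding_window_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_idx, test_idx) with a fixed-size training window that moves forward."""
    step = test_size if step is None else step
    start = 0
    while start + train_size + test_size <= n_samples:
        train_idx = list(range(start, start + train_size))
        test_idx = list(range(start + train_size, start + train_size + test_size))
        yield train_idx, test_idx
        start += step

splits = list(sliding_window_splits(n_samples=10, train_size=4, test_size=2))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Because the window has a fixed length, each model only sees recent history, which is the reason this variant tends to suit data whose distribution changes over time.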

Cross-Validation Strategies Comparison

Method	Bias	Variance	Computation	Use Case
Holdout	High	High	Low	Quick evaluation
K-Fold	Medium	Medium	Medium	General purpose
Stratified K-Fold	Medium	Low-Medium	Medium	Imbalanced classes
LOOCV	Low	High	High	Small datasets
Time Series	Medium	Medium	Medium	Temporal data

Nested Cross-Validation

For simultaneous model selection and evaluation:

Outer loop: $K_1$ folds for evaluation

Inner loop: $K_2$ folds for hyperparameter tuning

Why nested?

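Without nesting, the hyperparameters are selected on the same validation folds used to report the score, so the reported score is optimistically biased
Nested CV gives an approximately unbiased performance estimate of the whole procedure (tuning plus training)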

⚠️

Common Misconception: Many candidates forget that reporting the performance of a tuned model requires nested cross-validation (or a separate held-out test set). Using the same cross-validation for both tuning and evaluation gives optimistic performance estimates.
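The two loops can be sketched in plain Python. This is an illustration under stated assumptions: a hypothetical 1-D regression dataset, a closed-form ridge slope without intercept as the model, and the regularization strength `lam` as the only hyperparameter. All function names are invented for the example.

```python
import random

def fold_indices(indices, k):
    """Split a list of indices into k interleaved folds."""
    return [indices[i::k] for i in range(k)]

def fit_ridge_slope(x, y, idx, lam):
    """Closed-form 1-D ridge without intercept: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x[i] * y[i] for i in idx) / (sum(x[i] ** 2 for i in idx) + lam)

def mse(x, y, idx, w):
    return sum((y[i] - w * x[i]) ** 2 for i in idx) / len(idx)

def cv_mse(x, y, indices, k, lam):
    """Plain K-fold CV error for one hyperparameter value."""
    errors = []
    for held_out in fold_indices(indices, k):
        held = set(held_out)
        train = [i for i in indices if i not in held]
        errors.append(mse(x, y, held_out, fit_ridge_slope(x, y, train, lam)))
    return sum(errors) / k

def nested_cv_mse(x, y, outer_k, inner_k, lambdas):
    indices = list(range(len(x)))
    outer_errors = []
    for outer_test in fold_indices(indices, outer_k):
        held = set(outer_test)
        outer_train = [i for i in indices if i not in held]
        # Inner loop: choose lam using only the outer-training data.
        best_lam = min(lambdas, key=lambda lam: cv_mse(x, y, outer_train, inner_k, lam))
        # Refit on all outer-training data, then score on the untouched outer fold.
        w = fit_ridge_slope(x, y, outer_train, best_lam)
        outer_errors.append(mse(x, y, outer_test, w))
    return sum(outer_errors) / outer_k

# Hypothetical data: y = 3x + noise.
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(60)]
y = [3.0 * xi + rng.gauss(0, 0.5) for xi in x]

nested_score = nested_cv_mse(x, y, outer_k=5, inner_k=3, lambdas=[0.0, 0.1, 1.0, 10.0])
print("nested CV MSE:", round(nested_score, 3))
```

The key design point is that each outer test fold is never seen by the inner loop, so the hyperparameter choice cannot adapt to the data used for the reported score.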

Bootstrap Estimation

Alternative to cross-validation:

Draw $B$ bootstrap samples
Train on each sample
Evaluate on out-of-bag (OOB) samples
Average OOB scores

Properties:

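Each sample has probability $(1 - 1/n)^n \approx e^{-1} \approx 0.368$ of being OOB for a given bootstrap sample, so each model trains on about 63.2% of the distinct samples
The plain OOB estimate (the "leave-one-out bootstrap") is therefore pessimistically biased, because each model sees fewer distinct samples than the full dataset
The bias can be corrected by combining OOB error with training error (the 0.632 and 0.632+ bootstrap estimators)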
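The out-of-bag fraction is easy to check numerically: a sample is OOB when none of the $n$ draws with replacement picks it, which happens with probability $(1 - 1/n)^n \approx 0.368$. The sketch below compares that value with a simulation; `oob_fraction` is an illustrative helper and the choices $n = 1000$ and 200 repetitions are arbitrary.

```python
import random

def oob_fraction(n_samples, rng):
    """Fraction of samples never drawn in one bootstrap sample of size n_samples."""
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1 - len(drawn) / n_samples

n = 1000
theoretical = (1 - 1 / n) ** n   # probability that a given sample is out-of-bag

rng = random.Random(42)
n_repeats = 200
simulated = sum(oob_fraction(n, rng) for _ in range(n_repeats)) / n_repeats

print("theoretical:", round(theoretical, 4))
print("simulated:  ", round(simulated, 4))
```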

Choosing $K$

$K = 5$ : Good balance of bias and variance, fast
$K = 10$ : Lower bias, higher computation
$K = n$ (LOOCV): Lowest bias, highest variance and computation
Rule of thumb: Use $K = 5$ or $K = 10$ unless you have a specific reason
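The trade-off behind these choices comes down to simple arithmetic: each model trains on a fraction $(K-1)/K$ of the data, and $K$ models must be fitted. A small sketch, assuming a hypothetical dataset of 200 samples; `k_fold_cost` is an illustrative helper.

```python
def k_fold_cost(k):
    """Training-set fraction and number of model fits for K-fold CV."""
    return {"k": k, "train_fraction": (k - 1) / k, "n_fits": k}

n = 200  # hypothetical dataset size, so K = n is LOOCV
summary = [k_fold_cost(k) for k in (5, 10, n)]
for row in summary:
    print(row)
```

Moving from $K = 5$ to $K = 10$ raises the training fraction from 80% to 90% at twice the number of fits; LOOCV pushes it to 99.5% here but needs 200 fits.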

Code Implementation

Explanation of Code

K-Fold: Compares different values of K and shows variance reduction.
Stratified K-Fold: Demonstrates class preservation in folds for imbalanced data.
Time Series Split: Shows proper temporal splitting to avoid data leakage.
LOOCV: Compares leave-one-out with K-fold.
Nested CV: Shows unbiased performance estimation with hyperparameter tuning.
Visualization: Illustrates different CV splitting strategies.

Real-World Applications

Amazon: Model Selection

Amazon uses cross-validation for:

A/B Test Design: Estimating required sample sizes
Model Selection: Choosing between competing models
Hyperparameter Tuning: Optimizing model performance

Microsoft: Production Evaluation

Microsoft uses cross-validation for:

Model Validation: Ensuring production readiness
Fairness Evaluation: Checking performance across subgroups
Drift Detection: Monitoring model degradation

💡

Amazon Interview Tip: Be prepared to discuss how cross-validation affects the bias-variance tradeoff. Mention that $K=5$ is often preferred because it balances computational cost with estimation accuracy.

Common Follow-Up Questions

Q1: Why is K=10 commonly used for cross-validation?

$K=10$ provides a good balance:

Low enough bias (each training set uses 90% of data)
Low enough variance (10 folds provide stable estimates)
Computationally feasible (10 models)

Q2: Can you use cross-validation with time series data?

Yes, but you must respect temporal order. Use TimeSeriesSplit or forward chaining. Never shuffle time series data because it creates data leakage (training on future to predict past).

Q3: What is nested cross-validation and when is it needed?

Nested CV is used when you need to tune hyperparameters AND estimate performance. The inner loop tunes hyperparameters, the outer loop estimates performance. Without nesting, the performance estimate is optimistic.

Q4: How do you handle imbalanced datasets in cross-validation?

Use StratifiedKFold to preserve class distribution in each fold. For severe imbalance, also consider:

Stratified sampling
Oversampling (e.g. SMOTE) applied only to the training portion of each fold, never to the validation fold
Using appropriate metrics (F1, AUC-PR)
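Resampling inside the folds can be sketched as below. Note the swap: this uses plain random oversampling (duplicating minority indices) rather than SMOTE, to stay within the standard library. The dataset and the helper `oversample_minority` are hypothetical, and the helper assumes binary 0/1 labels with at least one sample of each class in the training portion.

```python
import random

def oversample_minority(train_idx, labels, rng):
    """Duplicate minority-class training indices until both classes have equal counts."""
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return train_idx + extra

labels = [1] * 10 + [0] * 90   # hypothetical data: 10% positives
rng = random.Random(0)
k = 5

fold_reports = []
for fold in range(k):
    # Interleaved assignment; with this label layout every fold gets 2 positives.
    val_idx = [i for i in range(len(labels)) if i % k == fold]
    train_idx = [i for i in range(len(labels)) if i % k != fold]
    # Resample the training portion only; the validation fold keeps its true distribution.
    balanced_train = oversample_minority(train_idx, labels, rng)
    n_pos = sum(labels[i] for i in balanced_train)
    fold_reports.append({
        "train_pos": n_pos,
        "train_neg": len(balanced_train) - n_pos,
        "val_pos": sum(labels[i] for i in val_idx),
        "leak": bool(set(balanced_train) & set(val_idx)),
    })

print(fold_reports[0])
```

Oversampling before splitting would place copies of the same minority sample on both sides of a split and inflate the validation score, which is why the resampling step sits inside the loop.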

Company-Specific Tips

Amazon Interview Tips

Discuss online learning where traditional CV doesn't apply
Be ready to explain temporal validation for time-dependent models
Mention cross-validation for recommendation systems
Talk about statistical significance testing

Microsoft Interview Tips

Focus on fairness-aware cross-validation
Discuss cross-validation for deep learning
Be prepared to explain nested cross-validation
Mention distributed cross-validation for large datasets
