t-Tests

ℹ️ Why It Matters

t-tests are among the most widely used tools in statistical inference. They help you judge whether an observed difference between sample means is larger than random sampling variation alone would plausibly produce. From A/B testing in tech companies to clinical trials in medicine, t-tests support decisions wherever group means are compared. Mastering them lets you draw defensible conclusions from data.

Overview

A t-test compares a sample mean to a hypothesized value or compares means between groups. The one-sample t-test compares a single group to a known value. The two-sample t-test compares means of two independent groups. The paired t-test compares means from the same subjects measured twice (before/after). Welch's t-test is the robust default for two independent groups because it doesn't assume equal variances. All t-tests assume independence and approximate normality (critical for small samples). For large samples ($n > 30$ is a common rule of thumb), the CLT makes the test approximately valid even without strict normality, although heavily skewed or heavy-tailed data may need larger samples. Always report effect size (Cohen's d) alongside p-values to communicate practical significance.

Key Concepts

One-Sample t-Statistic

t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}

Here,

$\bar{x}$ = Sample mean
$\mu_0$ = Hypothesized population mean
$s$ = Sample standard deviation
$n$ = Sample size
$df$ = Degrees of freedom, $n - 1$
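The formula above can be checked by hand with a few lines of standard-library Python. This is a minimal sketch: the function name `one_sample_t` and the data are made up for illustration.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test of H0: population mean == mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD, uses the n - 1 denominator
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical measurements, tested against a benchmark of 50
scores = [52, 48, 55, 51, 49, 53, 50, 54]
t_stat, df = one_sample_t(scores, mu0=50)
print(f"t = {t_stat:.3f}, df = {df}")  # t = 1.732, df = 7
```

`scipy.stats.ttest_1samp(scores, 50)` should report the same statistic, along with a p-value.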

Two-Sample t-Statistic (Pooled)

t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

Here,

$s_p$ = Pooled SD: $\sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$
$df$ = $n_1 + n_2 - 2$
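As a rough illustration of the pooled formula, the sketch below computes $s_p$, $t$, and $df$ from raw data using only the standard library. The function name `pooled_t` and both samples are hypothetical.

```python
import math
import statistics


def pooled_t(x, y):
    """Return (t, s_p, df) for the equal-variance two-sample t-test."""
    n1, n2 = len(x), len(y)
    var1, var2 = statistics.variance(x), statistics.variance(y)  # n - 1 denominators
    s_p = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    t = (statistics.mean(x) - statistics.mean(y)) / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t, s_p, n1 + n2 - 2


# Hypothetical samples with identical spread
group_a = [10, 12, 14, 16, 18]
group_b = [13, 15, 17, 19, 21]
t_stat, s_p, df = pooled_t(group_a, group_b)
print(f"t = {t_stat:.3f}, s_p = {s_p:.3f}, df = {df}")  # t = -1.500, s_p = 3.162, df = 8
```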

Welch's t-Statistic

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

Here,

$df$ = Welch-Satterthwaite approximation (generally non-integer)
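For reference, the Welch-Satterthwaite degrees of freedom are usually written in terms of the same sample variances and sizes:

df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}

The result always lies between $\min(n_1 - 1, n_2 - 1)$ and $n_1 + n_2 - 2$, so it is never larger than the pooled test's degrees of freedom.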

Paired t-Statistic

t = \frac{\bar{d}}{s_d / \sqrt{n}}

Here,

$\bar{d}$ = Mean of differences $d_i = x_{1i} - x_{2i}$
$s_d$ = Standard deviation of differences
$n$ = Number of pairs
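The paired statistic is a one-sample t-test on the differences, which the following sketch makes explicit. The function name `paired_t` and the before/after values are invented for illustration.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, df) for a paired t-test on matched measurements."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]  # d_i = x_1i - x_2i
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)
    return d_bar / (s_d / math.sqrt(n)), n - 1


# Hypothetical before/after scores for five subjects
before = [80, 75, 90, 85, 70]
after = [78, 70, 88, 80, 69]
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")  # t = 3.586, df = 4
```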

Cohen's d (Effect Size)

d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}

Here,

$s_p$ = Pooled standard deviation
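A small helper for Cohen's d might look like the sketch below. It uses the pooled standard deviation defined above; the name `cohens_d` and the sample data are assumptions made for the example.

```python
import math
import statistics


def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled SD."""
    n1, n2 = len(x), len(y)
    var1, var2 = statistics.variance(x), statistics.variance(y)
    s_p = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(x) - statistics.mean(y)) / s_p


# Hypothetical samples
group_a = [10, 12, 14, 16, 18]
group_b = [13, 15, 17, 19, 21]
d = cohens_d(group_a, group_b)
print(f"d = {d:.3f}")  # d = -0.949
```

The sign only reflects which group is listed first; the magnitude is what gets compared against the usual benchmarks.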

t-Test Selection Guide

Scenario	Test	scipy Function	df Formula
Sample mean vs. known value	One-sample	`ttest_1samp(x, mu0)`	$n - 1$
Two independent groups (equal var)	Pooled	`ttest_ind(x, y, equal_var=True)`	$n_1 + n_2 - 2$
Two independent groups (unequal var)	Welch's	`ttest_ind(x, y, equal_var=False)`	Satterthwaite
Same subjects measured twice	Paired	`ttest_rel(x, y)`	$n - 1$
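The calls in the table can be exercised together as in the sketch below. It assumes SciPy is installed; all data are hypothetical.

```python
from scipy import stats

# Hypothetical data
scores = [52, 48, 55, 51, 49, 53, 50, 54]
group_a = [10, 12, 14, 16, 18]
group_b = [13, 15, 17, 19, 21]
before = [80, 75, 90, 85, 70]
after = [78, 70, 88, 80, 69]

one_sample = stats.ttest_1samp(scores, 50)                    # mean vs. known value
pooled = stats.ttest_ind(group_a, group_b, equal_var=True)    # pooled variance
welch = stats.ttest_ind(group_a, group_b, equal_var=False)    # Welch's
paired = stats.ttest_rel(before, after)                       # same subjects twice

for name, res in [("one-sample", one_sample), ("pooled", pooled),
                  ("Welch", welch), ("paired", paired)]:
    print(f"{name:>10}: t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

Each result exposes `statistic` and `pvalue`. With equal group sizes and equal sample variances, as in this toy data, the pooled and Welch statistics coincide; they differ only in degrees of freedom when variances or sizes differ.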

Assumptions

Independence: Observations must be independent within and between groups. For repeated measures on the same subjects, use the paired test; clustered data need methods that model the clustering.
Normality: Data (or differences for paired) should be approximately normal. Critical for small samples ($n < 30$).
Equal Variances (pooled only): Test with Levene's test. When in doubt, use Welch's.
Continuous Data: Dependent variable on interval or ratio scale.
No Significant Outliers: Check with boxplots or Cook's distance.
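Two of these checks can be sketched with SciPy, assuming it is installed: `scipy.stats.shapiro` for normality and `scipy.stats.levene` for equal variances. The data are hypothetical, and these tests have little power in small samples, so treat the output as a rough screen rather than a verdict.

```python
from scipy import stats

# Hypothetical samples; group_b is visibly more spread out
group_a = [10, 12, 14, 16, 18, 11, 15]
group_b = [5, 20, 13, 25, 2, 17, 30]

# Shapiro-Wilk: H0 is that the sample comes from a normal distribution
shapiro_p = {}
for name, data in (("A", group_a), ("B", group_b)):
    stat, p = stats.shapiro(data)
    shapiro_p[name] = p
    print(f"Shapiro-Wilk group {name}: W = {stat:.3f}, p = {p:.3f}")

# Levene: H0 is that the groups have equal variances
lev_stat, lev_p = stats.levene(group_a, group_b)
print(f"Levene: W = {lev_stat:.3f}, p = {lev_p:.3f}")
```

A small Levene p-value points toward Welch's test; since Welch's is also safe when variances are equal, it remains a reasonable choice either way.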

Quick Example

📝Welch's t-Test

Group A ($n_A = 25$, $\bar{x}_A = 78$, $s_A = 12$) vs. Group B ($n_B = 30$, $\bar{x}_B = 84$, $s_B = 18$):

t = \frac{78 - 84}{\sqrt{\frac{12^2}{25} + \frac{18^2}{30}}} = \frac{-6}{\sqrt{5.76 + 10.8}} = \frac{-6}{4.07} = -1.474

With $df \approx 50$, the two-sided critical value at $\alpha = 0.05$ is $\approx 2.009$. Since $|-1.474| < 2.009$, we fail to reject $H_0$. The difference is not statistically significant at the 5% level.
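The arithmetic in this example can be reproduced from the summary statistics alone. The sketch below uses a hypothetical helper, `welch_from_stats`, that returns both the statistic and the Welch-Satterthwaite degrees of freedom.

```python
import math


def welch_from_stats(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's t-test computed from summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors per group
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t_stat, df = welch_from_stats(78, 12, 25, 84, 18, 30)
print(f"t = {t_stat:.3f}, df = {df:.1f}")  # t = -1.474, df = 50.7
```

SciPy's `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` performs the same calculation and also returns a p-value.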

📝Effect Size

Two groups have means 112 and 120, pooled SD = 15.

d = \frac{|112 - 120|}{15} = 0.533

By Cohen's conventional benchmarks this is a medium effect (between 0.5 and 0.8). Whether a difference of this size is practically meaningful depends on the context.

Key Takeaways

📋Summary: t-Tests

One-sample: Compare sample mean to a known value. Use when you have a benchmark.
Two-sample: Compare means of two independent groups. Use Welch's as the default.
Paired: Compare means from linked observations (before/after). More powerful than the independent-samples test when the paired measurements are positively correlated.
Welch's: Robust to unequal variances. Nearly as powerful as pooled when variances are equal.
Assumptions: Independence is critical. Normality matters most for small samples ($n < 30$).
Effect Size: Always report Cohen's d (0.2 = small, 0.5 = medium, 0.8 = large) alongside p-values.
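CV Model Comparison: When comparing two ML models on the same cross-validation folds, use a paired test rather than an independent one, since the fold scores are matched. Treat the p-value as approximate: overlapping training sets mean the fold scores are not fully independent.
Confidence Intervals: Construct CIs for the mean difference; they convey both the size and the uncertainty of the effect, which a binary significant/not-significant verdict does not.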
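As a sketch of the interval mentioned in the last point, a two-sided CI for a difference in means under Welch's test takes the form:

(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\, df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

Plugging in the Quick Example values (difference $-6$, standard error $\approx 4.07$, critical value $\approx 2.009$ for $df \approx 50$):

-6 \pm 2.009 \times 4.07 \approx -6 \pm 8.18 \quad\Rightarrow\quad (-14.18,\ 2.18)

The interval contains 0, which agrees with the failure to reject $H_0$, and it also shows how wide the range of plausible differences is.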

Deep Dive

For detailed explanations, worked examples, and Python implementations, explore the dedicated statistics lessons:

Z-Test

One-Sample Z-Test — When σ is known; the simplest hypothesis test for a mean

t-Tests

One-Sample t-Test — Comparing a sample mean to a hypothesized value with worked examples
Two-Sample t-Test — Comparing means of two independent groups (pooled and Welch's)
Paired t-Test — Before/after studies and matched pairs analysis
