
Statistics 101: Mean, Median, Variance and Distributions

Module 4: Statistics & Probability


Why Statistics Matters for Data Science

Statistics is the mathematical foundation of machine learning. Many ML algorithms can be framed as statistical models that optimize an objective function, such as minimizing squared error or maximizing likelihood. Without a grounding in statistics, it is hard to understand why models work or fail.

The Statistics → ML Pipeline

Descriptive Statistics → Inferential Statistics → Probability Theory → Statistical Learning → Machine Learning → Deep Learning

1. Measures of Central Tendency

Central tendency describes the "center" or "typical value" of a dataset.

Mean (Arithmetic Average)

Formula:

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}
Figure: The mean as the balance point of the data 2, 5, 7, 10, 12, 14, 16, 19.
x̄ = (2 + 5 + 7 + 10 + 12 + 14 + 16 + 19) / 8 = 85/8 = 10.625
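To check the arithmetic, here is a minimal Python sketch that computes the mean of the same eight values twice: once directly from the formula and once with the standard-library `statistics.mean`. The variable names are illustrative only.

```python
import statistics

# The eight values from the worked example above
data = [2, 5, 7, 10, 12, 14, 16, 19]

# Arithmetic mean from the definition: sum of the values divided by n
manual_mean = sum(data) / len(data)

# The same quantity via Python's standard library
library_mean = statistics.mean(data)

print(manual_mean)   # 10.625
print(library_mean)  # 10.625
```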

Median (Middle Value)

The median is the value separating the higher half from the lower half of a data sample.

Formula:

\text{Median} = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \frac{x_{(n/2)} + x_{(n/2+1)}}{2} & \text{if } n \text{ is even} \end{cases}

where x₍₁₎ ≤ x₍₂₎ ≤ ... ≤ x₍ₙ₎ are the data values sorted in ascending order.
Figure: The median as the middle value. Sorted data: x₍₁₎ = 2, x₍₂₎ = 5, x₍₃₎ = 7, x₍₄₎ = 10, x₍₅₎ = 12, x₍₆₎ = 14, x₍₇₎ = 16, x₍₈₎ = 19. Since n = 8 (even), average the 4th and 5th values: Median = (10 + 12)/2 = 11.
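The case split in the formula maps directly onto code. The sketch below is one straightforward implementation (the function name `median` is our own), checked against `statistics.median` from Python's standard library. Note the shift from the formula's 1-based order statistics to Python's 0-based indexing.

```python
import statistics

def median(values):
    """Median via the order-statistic formula: the middle value if n is odd,
    the average of the two middle values if n is even."""
    xs = sorted(values)              # x_(1) <= x_(2) <= ... <= x_(n)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]               # x_((n+1)/2) in 1-based indexing
    return (xs[mid - 1] + xs[mid]) / 2   # average of x_(n/2) and x_(n/2+1)

data = [2, 5, 7, 10, 12, 14, 16, 19]
print(median(data))             # 11.0
print(statistics.median(data))  # 11.0
```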

Mode (Most Frequent)

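The mode is the value that occurs most often. A dataset can have more than one mode.

Figure: The mode as the most frequent value. Pet counts: Cat 3, Dog 7, Bird 2, Fish 5, Rabbit 7, Snake 1. Mode = Dog and Rabbit (bimodal, frequency = 7).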
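Because a dataset can have several modes, it helps to use a function that returns all of them. The sketch below expands the pet counts from the figure into raw observations and uses `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency; `statistics.mode` would return only a single value.

```python
import statistics
from collections import Counter

# Frequencies from the pet example above, expanded into raw observations
counts = {"Cat": 3, "Dog": 7, "Bird": 2, "Fish": 5, "Rabbit": 7, "Snake": 1}
observations = [pet for pet, k in counts.items() for _ in range(k)]

# multimode returns all values tied for the highest frequency,
# in the order they are first encountered
modes = statistics.multimode(observations)
top_freq = Counter(observations)[modes[0]]

print(modes, top_freq)  # ['Dog', 'Rabbit'] 7
```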

2. Measures of Spread (Dispersion)

Variance and Standard Deviation

Population Variance:

\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2

Sample Variance (Bessel's Correction):

s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2

Standard Deviation:

\sigma = \sqrt{\sigma^2}, \quad s = \sqrt{s^2}

Figure: Variance as the average squared deviation from the mean. Data: 6, 8, 10, 12, 14, 10 (μ = 10); squared deviations: 16, 4, 0, 4, 16, 0.

σ² = (16 + 4 + 0 + 4 + 16 + 0) / 6 = 40/6 ≈ 6.67

σ = √6.67 ≈ 2.58
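The sketch below recomputes this example and contrasts the two divisors. It assumes the six values 6, 8, 10, 12, 14, 10, which is our reading of the figure (they give μ = 10 and the squared deviations 16, 4, 0, 4, 16, 0 used above). Dividing by N gives the population variance; dividing by n − 1 (Bessel's correction) gives the sample variance, and the standard-library `statistics.pvariance` and `statistics.variance` compute the same two quantities.

```python
import math
import statistics

# Data assumed from the variance figure above
data = [6, 8, 10, 12, 14, 10]
n = len(data)
mu = sum(data) / n                              # 10.0

squared_devs = [(x - mu) ** 2 for x in data]    # 16, 4, 0, 4, 16, 0
ss = sum(squared_devs)                          # 40.0

pop_var = ss / n            # population variance: divide by N
sample_var = ss / (n - 1)   # sample variance: Bessel's correction, divide by n - 1
pop_sd = math.sqrt(pop_var)

print(round(pop_var, 2), round(pop_sd, 2), sample_var)  # 6.67 2.58 8.0

# Standard-library equivalents
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_var, statistics.variance(data))
```

Treating these six numbers as a sample instead of the whole population raises the variance estimate from about 6.67 to 8.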


3. The Normal Distribution (Gaussian)

The most widely used probability distribution in statistics and ML.

Probability Density Function (PDF):

f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}

Where:

  • μ = mean (location parameter)
  • σ = standard deviation (scale parameter)
  • σ² = variance

Figure: The normal distribution (bell curve), with axis marks at μ - 3σ, μ - 2σ, μ - σ, μ, μ + σ, μ + 2σ, μ + 3σ. 68.27% of the probability lies within ±1σ, 95.45% within ±2σ, and 99.73% within ±3σ.
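A minimal sketch connecting the PDF formula to the 68/95/99.7 percentages: it codes f(x) directly (the name `normal_pdf` is our own), checks it against `statistics.NormalDist.pdf` from Python's standard library (3.8+), and recovers the areas within ±1σ, ±2σ, ±3σ from the CDF.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density f(x) written out from the formula above."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

std = NormalDist(mu=0.0, sigma=1.0)

# The hand-written PDF agrees with the standard library's NormalDist.pdf
assert math.isclose(normal_pdf(1.3), std.pdf(1.3))

# Empirical rule: probability mass within k standard deviations of the mean
for k in (1, 2, 3):
    within = std.cdf(k) - std.cdf(-k)
    print(f"within ±{k}σ: {within:.2%}")
# within ±1σ: 68.27%
# within ±2σ: 95.45%
# within ±3σ: 99.73%
```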

Standard Normal Distribution (Z-Score)

Z-Score Transformation:

Z = \frac{X - \mu}{\sigma}

This transforms any normal distribution to the standard normal with μ = 0 and σ = 1.

Figure: The z-score counts how many standard deviations a value lies from the mean. IQ scores follow N(100, 15²), so the values 70, 100, 130 map to -2, 0, +2 on the standard normal N(0, 1). IQ = 130 → Z = (130 - 100)/15 = +2.0 (top 2.3%).
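The IQ example can be reproduced in a few lines. This sketch standardizes the score by hand (the helper `z_score` is a hypothetical name) and then uses the standard normal CDF from `statistics.NormalDist` to get the upper-tail probability P(Z > 2).

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

# IQ example from above: N(100, 15^2)
z = z_score(130, mu=100, sigma=15)       # 2.0
upper_tail = 1 - NormalDist().cdf(z)     # P(Z > 2) on the standard normal

print(z, f"{upper_tail:.1%}")            # 2.0 2.3%

# Same answer without standardizing, using the IQ distribution directly
assert abs((1 - NormalDist(100, 15).cdf(130)) - upper_tail) < 1e-12
```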

4. Central Limit Theorem (CLT)

One of the most important theorems in statistics: it explains why the normal distribution appears so often in practice.

Central Limit Theorem:

Given independent samples from a population with mean μ and finite standard deviation σ, the sampling distribution of the sample mean X̄ approaches a normal distribution as the sample size n increases:

\bar{X} \overset{\text{approx.}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{for large } n

Standard Error:

SE = \frac{\sigma}{\sqrt{n}}

Figure: The Central Limit Theorem in action. From a population of any shape, take samples of size n and compute x̄; the sampling distribution of x̄ is shown for n = 5, n = 30, and n = 100.

Key Insight: As n increases, the sampling distribution becomes more normal

σ_x̄ = σ/√n: the standard error shrinks in proportion to 1/√n, so quadrupling n halves it.

Practical impact:
  • n = 1: SE = σ
  • n = 4: SE = σ/2
  • n = 16: SE = σ/4
  • n = 100: SE = σ/10
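A small simulation makes the standard-error claim concrete. This is a minimal sketch under our own assumptions: an Exponential population with rate 1 (mean 1, SD 1) as an arbitrary right-skewed example, 2,000 replicate samples per n, and a fixed seed; `simulate_sample_means` is a hypothetical helper name. It compares the spread of the simulated sample means with σ/√n.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps=2000, seed=0):
    """Draw `reps` samples of size n from a right-skewed Exponential(rate=1)
    population (mean 1, SD 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

sigma = 1.0  # population SD of Exponential(rate=1)
for n in (4, 16, 100):
    means = simulate_sample_means(n)
    empirical_se = statistics.stdev(means)
    print(f"n={n:>3}  mean of x̄={statistics.fmean(means):.3f}  "
          f"empirical SE={empirical_se:.3f}  σ/√n={sigma / math.sqrt(n):.3f}")
```

The empirical SE should land close to σ/√n for each n; plotting a histogram of `means` for each n would also show the shape becoming more bell-like even though the population is skewed.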

5. Skewness and Kurtosis

Skewness (Asymmetry)

Figure: Skewness measures the asymmetry of a distribution. Symmetric: skew = 0; right-skewed: skew > 0; left-skewed: skew < 0.

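Rule of thumb: mean > median often indicates right skew, and mean < median often indicates left skew (the rule can fail, for example with discrete or multimodal data).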

Kurtosis (Tail Weight)

Fisher's Kurtosis:

\gamma_2 = \frac{\mu_4}{\sigma^4} - 3 = \frac{E[(X-\mu)^4]}{\left(E[(X-\mu)^2]\right)^2} - 3
  • Mesokurtic (γ₂ = 0): Normal distribution
  • Leptokurtic (γ₂ > 0): Heavy tails, sharp peak
  • Platykurtic (γ₂ < 0): Light tails, flat peak
Figure: Comparing tail behavior. Platykurtic (μ₄/σ⁴ = 2), normal (μ₄/σ⁴ = 3), leptokurtic (μ₄/σ⁴ = 8); the leptokurtic curve has heavier tails on both sides.

Higher kurtosis means heavier tails and more extreme outliers; a sharper peak often accompanies it but is not guaranteed.
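The section gives a formula for kurtosis but not for skewness; the sketch below uses the standard moment-based definitions, skewness = m₃ / m₂^(3/2) and excess kurtosis = m₄ / m₂² - 3, where mₖ is the k-th central moment of the data. These are the simple (biased) estimators; to our knowledge they match the defaults of SciPy's `scipy.stats.skew` and `scipy.stats.kurtosis` (`bias=True`, `fisher=True`). The helper names and the two toy datasets are our own.

```python
import math

def central_moment(xs, k):
    """k-th central moment: the average of (x - mean)^k."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Moment-based skewness: m3 / m2^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Fisher's (excess) kurtosis: m4 / m2^2 - 3, as in the formula above."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]   # mean 3 > median 1

print(round(skewness(symmetric), 4), round(excess_kurtosis(symmetric), 4))  # 0.0 -1.3
print(skewness(right_skewed) > 0)                                          # True
```

The uniform-like `symmetric` data has zero skew and negative excess kurtosis (platykurtic), while the single large value in `right_skewed` pulls the skewness positive.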


6. Covariance and Correlation

Covariance

Covariance:

\text{Cov}(X, Y) = \sigma_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  • Cov > 0: Variables move together (positive)
  • Cov < 0: Variables move opposite (negative)
  • Cov = 0: No linear relationship
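The covariance formula translates directly into a short function. This is a minimal sketch (the name `sample_covariance` is our own) using the same n - 1 denominator; on Python 3.10+ the standard-library `statistics.covariance` computes the same sample covariance.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 denominator, as in the formula above."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]
print(sample_covariance(x, [2, 4, 6, 8, 10]))  # 5.0  (variables move together)
print(sample_covariance(x, [10, 8, 6, 4, 2]))  # -5.0 (variables move opposite)
```

Note that the magnitude 5.0 depends on the units of x and y, which is why the normalized correlation coefficient is usually easier to interpret.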

Pearson Correlation Coefficient

Pearson's r:

r = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \cdot \sum_{i=1}^{n}(y_i - \bar{y})^2}}

r \in [-1, +1]
Figure: Scatter plots for different values of r: r = -1 (perfect negative), r = -0.5, r = 0 (none), r = +0.7 (strong positive), r = +1 (perfect positive).

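Note: Pearson's r measures only LINEAR association. A strong non-linear relationship can still give r near 0, so non-linear patterns need other measures (for monotonic relationships, a rank-based measure such as Spearman's ρ).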
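The sketch below implements the right-hand form of the Pearson formula (the n - 1 factors cancel, so plain sums suffice) and illustrates the note above: y = 2x + 1 gives r = 1, while y = x² on symmetric x values gives r = 0 even though y is completely determined by x. The function name `pearson_r` is our own.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: sum of cross-deviations divided by the square root of
    the product of the summed squared deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("r is undefined when either variable is constant")
    return num / den

x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * v + 1 for v in x]))  # 1.0: perfect linear relationship
print(pearson_r(x, [v * v for v in x]))      # 0.0: y = x² depends on x, but not linearly
```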


7. Confidence Intervals

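Confidence Interval for Mean (large-sample z-interval):

\bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}

For small samples, the critical value z_{α/2} is usually replaced by a t critical value with n - 1 degrees of freedom.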

Common Confidence Levels:

Confidence level | z-score (z_{α/2}) | Area in tails
90% | 1.645 | 5% each side
95% | 1.960 | 2.5% each side
99% | 2.576 | 0.5% each side

Figure: A 95% confidence interval around x̄ = 50, running from 38 to 62; the central 95% lies inside the interval, with 2.5% in each tail.

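Interpretation: the interval [38, 62] comes from a procedure that, over repeated samples, captures the true mean about 95% of the time. It does not mean there is a 95% probability that this particular interval contains the true mean.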
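The table's z-scores and the interval formula can both be computed with `statistics.NormalDist.inv_cdf` from Python's standard library (3.8+). This is a minimal sketch: `z_critical` and `z_interval` are hypothetical helper names, and the summary statistics x̄ = 50, s = 30, n = 25 are made-up inputs chosen only because they roughly reproduce the 38 to 62 interval in the figure.

```python
import math
from statistics import NormalDist

def z_critical(level):
    """Two-sided critical value z_{α/2} for a confidence level such as 0.95."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(x_bar, s, n, level=0.95):
    """Large-sample interval x̄ ± z_{α/2} · s / √n."""
    margin = z_critical(level) * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576

# Hypothetical summary statistics: x̄ = 50, s = 30, n = 25 -> SE = 6
low, high = z_interval(50, 30, 25)
print(f"({low:.2f}, {high:.2f})")  # (38.24, 61.76)
```

For small n, swapping in a t critical value (for example from SciPy's `scipy.stats.t.ppf`) would widen the interval somewhat.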


Key Takeaways

  1. Mean, Median, Mode describe central tendency: choose based on data shape
  2. Variance/Standard Deviation quantify spread and underlie squared-error losses such as mean squared error
  3. Normal Distribution: the bell curve underlies many hypothesis tests and is the limiting distribution in the CLT
  4. CLT: sample means are approximately normal for large n when the population has finite variance (n ≥ 30 is a common rule of thumb, not a guarantee)
  5. Correlation ≠ Causation: always consider confounding variables
  6. Confidence Intervals: quantify uncertainty in estimates

Next: Probability, Bayes' Theorem and PDF/CDF

Build on these foundations with probability theory and Bayesian inference.
