Introduction
The scipy.stats module provides probability distributions, descriptive statistics, hypothesis tests, and distribution-fitting routines.
Probability Distributions
from scipy import stats
import numpy as np
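# Standard normal distribution (mean 0, standard deviation 1)
normal = stats.norm(loc=0, scale=1)
normal.pdf(0) # Probability density at x=0 (about 0.3989)
normal.cdf(1) # Cumulative probability P(X <= 1) (about 0.8413)
normal.ppf(0.95) # Inverse CDF: the 95th percentile (about 1.6449)
sample = normal.rvs(size=1000) # Draw 1000 random samples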
# Other distributions
binomial = stats.binom(n=10, p=0.5) # 10 trials, success probability 0.5
poisson = stats.poisson(mu=3) # Mean of 3 events per interval
exponential = stats.expon(scale=1) # scale = 1/rate, so rate = 1
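The frozen distributions above expose the same methods as the normal example, with one difference: discrete distributions use pmf instead of pdf. The following is a minimal sketch of a few common queries; the variable names and the specific probabilities asked for are illustrative choices, not part of the original text.

```python
from scipy import stats

# Frozen distributions with the same parameters as in the section above
binomial = stats.binom(n=10, p=0.5)
poisson = stats.poisson(mu=3)
exponential = stats.expon(scale=1)  # scale = 1 / rate

# Discrete distributions expose pmf (probability mass) rather than pdf
p_exactly_five = binomial.pmf(5)   # P(X = 5)
p_at_most_two = poisson.cdf(2)     # P(X <= 2)

# Survival function: P(X > 2), equal to 1 - cdf(2)
p_longer_than_two = exponential.sf(2)

# Moments are available on any frozen distribution
binom_mean = binomial.mean()
binom_var = binomial.var()

print(f"P(Binomial = 5)   = {p_exactly_five:.4f}")
print(f"P(Poisson <= 2)   = {p_at_most_two:.4f}")
print(f"P(Exponential > 2) = {p_longer_than_two:.4f}")
print(f"Binomial mean = {binom_mean}, variance = {binom_var}")
```

Using sf for upper-tail probabilities is generally preferable to computing 1 - cdf by hand, since it tends to be more accurate far out in the tail.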
Statistical Tests
from scipy import stats
# Independent two-sample t-test (assumes equal variances by default)
group1 = [78, 82, 85, 90, 88]
group2 = [72, 75, 78, 80, 79]
t_stat, p_value = stats.ttest_ind(group1, group2)
print(f"T-statistic: {t_stat}, P-value: {p_value}")
# Paired t-test
before = [120, 122, 118, 125, 121]
after = [115, 118, 112, 120, 116]
t_stat, p_value = stats.ttest_rel(before, after)
# Chi-square goodness-of-fit test (observed and expected counts must have the same total)
observed = [10, 20, 30]
expected = [15, 20, 25]
chi2, p_value = stats.chisquare(observed, expected)
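Two common variations on the tests above are the one-sample t-test and Welch's t-test, which drops the equal-variance assumption. The sketch below reuses the group1 and group2 data; the reference mean of 80 and the 0.05 significance level are hypothetical values chosen only for illustration.

```python
from scipy import stats

group1 = [78, 82, 85, 90, 88]
group2 = [72, 75, 78, 80, 79]

# One-sample t-test: does the mean of group1 differ from a reference value?
# The reference value 80 is an assumption made for this example.
t_one, p_one = stats.ttest_1samp(group1, popmean=80)

# Welch's t-test: two independent samples without assuming equal variances
t_welch, p_welch = stats.ttest_ind(group1, group2, equal_var=False)

alpha = 0.05  # conventional significance level, assumed here
print(f"One-sample: t={t_one:.3f}, p={p_one:.3f}, reject H0: {p_one < alpha}")
print(f"Welch:      t={t_welch:.3f}, p={p_welch:.3f}, reject H0: {p_welch < alpha}")
```

With samples this small, the p-values are sensitive to individual observations, so the reject/do-not-reject decision should be read as a demonstration of the API rather than a substantive result.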
Descriptive Statistics
from scipy import stats
import numpy as np
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
stats.describe(data) # Count, min/max, mean, variance, skewness, kurtosis
stats.skew(data) # Skewness
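stats.kurtosis(data) # Excess kurtosis (Fisher definition: a normal distribution gives 0)
stats.zscore(data) # Z-scores (uses the population standard deviation, ddof=0, by default)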
# Percentiles
np.percentile(data, [25, 50, 75])
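The result of stats.describe is a named tuple, and several of these functions have defaults that are easy to overlook. The following sketch shows how the fields can be read and how the ddof and fisher options change the output; the variable names are illustrative.

```python
from scipy import stats
import numpy as np

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# describe returns a named tuple; fields can be accessed by name
summary = stats.describe(data)
print(f"n={summary.nobs}, min/max={summary.minmax}")
print(f"mean={summary.mean}, variance={summary.variance:.4f}")

# describe reports the sample variance (ddof=1),
# while zscore defaults to the population standard deviation (ddof=0)
z_population = stats.zscore(data)
z_sample = stats.zscore(data, ddof=1)

# kurtosis defaults to the Fisher (excess) definition;
# fisher=False gives the Pearson definition, which is larger by 3
excess_kurtosis = stats.kurtosis(data)
pearson_kurtosis = stats.kurtosis(data, fisher=False)
print(f"excess={excess_kurtosis:.4f}, pearson={pearson_kurtosis:.4f}")
```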
Distribution Fitting
from scipy import stats
import numpy as np
# Generate sample data
data = stats.expon.rvs(scale=2, size=1000)
# Fit a normal distribution (returns the estimated mean and standard deviation)
mu, sigma = stats.norm.fit(data)
print(f"Fitted: mu={mu:.2f}, sigma={sigma:.2f}")
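# Fit an exponential distribution: fit() returns (loc, scale), and the rate lambda is 1/scale
loc, scale = stats.expon.fit(data, floc=0) # Fix loc at 0 so only scale is estimated
print(f"Fitted lambda: {1 / scale:.2f}")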
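# Goodness-of-fit test (Kolmogorov-Smirnov) against the fitted normal distribution
# Note: the p-value is only approximate when the parameters were estimated from the same data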
ks_stat, ks_p = stats.kstest(data, 'norm', args=(mu, sigma))
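To see how fitting and goodness-of-fit testing work together, it can help to fit two candidate distributions to the same data and compare their Kolmogorov-Smirnov statistics. This is a minimal sketch under a few assumptions: the seed is fixed only for reproducibility, and loc is pinned to 0 for the exponential fit because the data are known to start at zero.

```python
from scipy import stats
import numpy as np

# Fixed seed (an arbitrary choice) so the run is reproducible
rng = np.random.default_rng(0)
data = stats.expon.rvs(scale=2, size=1000, random_state=rng)

# Exponential fit with loc fixed at 0; the rate is the reciprocal of scale
loc, scale = stats.expon.fit(data, floc=0)
rate = 1 / scale

# Normal fit to the same data, as a deliberately poor candidate
mu, sigma = stats.norm.fit(data)

# Compare the two candidates with the Kolmogorov-Smirnov test
ks_expon = stats.kstest(data, 'expon', args=(loc, scale))
ks_norm = stats.kstest(data, 'norm', args=(mu, sigma))

print(f"Exponential: scale={scale:.3f}, rate={rate:.3f}")
print(f"KS vs exponential: D={ks_expon.statistic:.4f}, p={ks_expon.pvalue:.4f}")
print(f"KS vs normal:      D={ks_norm.statistic:.4f}, p={ks_norm.pvalue:.4g}")
```

A smaller KS statistic indicates a closer match between the empirical and fitted CDFs. Because the parameters are estimated from the same sample, the reported p-values tend to be optimistic and are best used for comparison rather than as exact significance levels.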
Practice Problems
- Calculate a probability using the normal distribution's CDF
- Perform a one-sample t-test on a dataset
- Test independence with a chi-square test
- Fit a distribution to sample data
- Calculate confidence intervals
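Two of the problems above use functions not shown earlier, so a starting point may be useful. The sketch below computes a t-based confidence interval for a mean and runs a chi-square test of independence; the sample values, the 95% level, and the 2x2 contingency table are hypothetical and chosen only for illustration.

```python
from scipy import stats
import numpy as np

# Confidence interval for a mean, based on the t distribution
data = [78, 82, 85, 90, 88]  # hypothetical sample
mean = np.mean(data)
sem = stats.sem(data)        # standard error of the mean (ddof=1)
df = len(data) - 1

# The confidence level is passed positionally so the call does not
# depend on the keyword name used by a particular SciPy version
ci_low, ci_high = stats.t.interval(0.95, df, loc=mean, scale=sem)
print(f"mean={mean:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")

# Chi-square test of independence on a contingency table
# Rows and columns represent two categorical variables (made-up counts)
table = np.array([[20, 15],
                  [30, 35]])
chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
print("Expected counts under independence:")
print(expected)
```

Note that stats.chisquare tests observed counts against a given set of expected counts, whereas stats.chi2_contingency derives the expected counts from the table's row and column totals.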