Probability

Covariance and Correlation

Master covariance, correlation, their properties, and their applications in PCA, feature selection, and portfolio optimization.

📂 Dependence · 📖 Lesson 41 of 100 · 🎓 Free Course


📋 Key Takeaways

  • Covariance $\text{Cov}(X,Y) = E[XY] - E[X]E[Y]$ measures the joint variability of two variables. Positive means they co-move; negative means they move in opposite directions; zero means no linear relationship.
  • Correlation $\rho = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y} \in [-1, 1]$ normalizes covariance to a unitless scale, making it comparable across different variable pairs. $|\rho| = 1$ indicates a perfect linear relationship.
  • Correlation ≠ Causation: Correlation measures association, not causation. Confounding variables, reverse causation, and coincidence can all produce spurious correlations.
  • Uncorrelated ≠ Independent: Zero correlation only rules out linear dependence. Non-linear dependencies can exist even when $\rho = 0$; for example, $Y = X^2$ with $X$ distributed symmetrically about zero is fully determined by $X$ yet uncorrelated with it. If $X$ and $Y$ are jointly (bivariate) normal, however, $\rho = 0$ does imply independence.
  • Covariance Matrix $\Sigma$ is symmetric and positive semi-definite. Its diagonal contains the variances and its off-diagonal entries contain the covariances. Eigenvalue decomposition of $\Sigma$ is the foundation of PCA.
  • Applications: Feature selection (multicollinearity), PCA (dimensionality reduction), portfolio optimization (Markowitz), Gaussian distributions, attention mechanisms, and natural gradient methods all rely on covariance and correlation.
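
The first takeaways can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (`mean`, `covariance`, `correlation`) and the toy data are illustrative assumptions, not part of the lesson. It computes the population covariance from $E[XY] - E[X]E[Y]$, normalizes it to a correlation, and contrasts a perfectly linear pair with the pair $Y = X^2$ on data symmetric about zero.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance via Cov(X, Y) = E[XY] - E[X]E[Y]."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    return mean_xy - mean(xs) * mean(ys)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    # Clamp at zero so rounding error cannot produce a tiny negative variance.
    sigma_x = math.sqrt(max(covariance(xs, xs), 0.0))
    sigma_y = math.sqrt(max(covariance(ys, ys), 0.0))
    if sigma_x == 0.0 or sigma_y == 0.0:
        raise ValueError("correlation is undefined for a constant variable")
    return covariance(xs, ys) / (sigma_x * sigma_y)

# Illustrative data (assumed for this sketch): x values symmetric about zero.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [3.0 * x + 1.0 for x in xs]   # exact linear function of x
squared = [x * x for x in xs]          # determined by x, but not linearly

rho_linear = correlation(xs, linear)
rho_squared = correlation(xs, squared)

print(f"Cov(x, 3x + 1) = {covariance(xs, linear):.3f}")
print(f"rho(x, 3x + 1) = {rho_linear:.3f}")
print(f"rho(x, x^2)    = {rho_squared:.3f}")
```

On this data the linear pair has correlation 1 up to floating-point rounding, while the squared pair has correlation 0 even though `squared` is completely determined by `xs`. The sketch uses the population (divide by $n$) convention; a sample estimate would divide by $n - 1$ instead, which changes the covariance but not the correlation.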
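
The covariance-matrix takeaway and its link to PCA can be sketched in the same way. The example below is an illustration under stated assumptions: the function names and the four data points are hypothetical, and the closed-form eigen solver only handles the symmetric 2×2 case (for more features one would normally call a numerical routine such as `numpy.linalg.eigh`). It builds $\Sigma$, confirms symmetry and non-negative eigenvalues, and projects the data onto the leading eigenvector.

```python
import math

def covariance_matrix(rows):
    """Population covariance matrix of rows shaped (n_samples, n_features)."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / n
         for j in range(d)]
        for i in range(d)
    ]

def eigen_symmetric_2x2(matrix):
    """Eigenvalues (descending) and unit eigenvectors of [[a, b], [b, c]]."""
    a, b, c = matrix[0][0], matrix[0][1], matrix[1][1]
    centre = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    eigenvalues = [centre + radius, centre - radius]
    if b == 0.0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        vectors = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        vectors = []
        for lam in eigenvalues:
            vx, vy = b, lam - a  # solves (a - lam) * vx + b * vy = 0
            norm = math.hypot(vx, vy)
            vectors.append((vx / norm, vy / norm))
    return eigenvalues, vectors

# Illustrative, already mean-centred data (assumed for this sketch).
rows = [(2.0, 1.0), (1.0, 2.0), (-2.0, -1.0), (-1.0, -2.0)]

sigma = covariance_matrix(rows)
eigenvalues, eigenvectors = eigen_symmetric_2x2(sigma)

# PCA step: project each point onto the leading eigenvector.
pc1 = eigenvectors[0]
scores = [x * pc1[0] + y * pc1[1] for x, y in rows]
score_mean = sum(scores) / len(scores)
score_variance = sum((s - score_mean) ** 2 for s in scores) / len(scores)
explained_ratio = eigenvalues[0] / sum(eigenvalues)

print("Sigma =", sigma)
print("eigenvalues =", eigenvalues)
print("variance along PC1 =", score_variance)
print("explained variance ratio of PC1 =", explained_ratio)
```

The diagonal of `sigma` holds the two variances and the off-diagonal entry holds the covariance. The variance of the projected scores matches the largest eigenvalue, which is the sense in which the eigen-decomposition of $\Sigma$ underlies PCA.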