ML Cheatsheet — Quick Reference
A comprehensive quick reference for machine learning algorithms, metrics, math, and Python code.
Algorithm Selection
Problem Type → Algorithm
─────────────────────────────────────────
Classification (binary) → Logistic Regression, SVM, XGBoost
Classification (multi) → Random Forest, Neural Network
Regression → Linear Regression, XGBoost, Neural Network
Clustering → K-Means, DBSCAN, Hierarchical
Dimensionality Reduction → PCA, t-SNE, UMAP
Time Series → ARIMA, Prophet, LSTM
Image → CNN, ResNet, EfficientNet
Text → BERT, GPT, TF-IDF + classifier
Recommendation → Matrix Factorization, Neural CF
Generation → GAN, Diffusion, VAE
Metrics
Classification:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × (Precision × Recall) / (Precision + Recall)
AUC-ROC = Area under ROC curve
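The classification formulas above can be sketched in plain Python. This is a minimal illustration, not a replacement for a library such as scikit-learn. The function names and the example counts and scores are hypothetical. The AUC helper uses the pairwise definition (the probability that a random positive is scored above a random negative), which is O(n²) and only suitable for small inputs.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes labels are 0/1 and that
    both classes are present.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, chosen only for illustration
m = classification_metrics(tp=40, fp=10, fn=20, tn=30)
auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(m, auc)
```

Returning 0.0 when a denominator is zero is a design choice of this sketch; some libraries warn or return NaN instead.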
Regression:
MSE = (1/n) Σ(y - ŷ)²
RMSE = √MSE
MAE = (1/n) Σ|y - ŷ|
R² = 1 - SS_res / SS_tot, where SS_res = Σ(y - ŷ)² and SS_tot = Σ(y - ȳ)²
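As a rough sketch, the regression metrics above can be computed with the standard library alone. The function name and the sample values are hypothetical, and the code assumes the targets are not all identical (otherwise SS_tot is zero and R² is undefined).

```python
import math


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, and R² for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [y - p for y, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }


# Hypothetical targets and predictions, for illustration only
r = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 8.0, 9.5])
print(r)
```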
Clustering:
Silhouette Score: ranges over [-1, 1] (higher = better)
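To make the silhouette score concrete, here is a small sketch using Euclidean distance. For each point, a is the mean distance to the other points in its own cluster, b is the lowest mean distance to any other cluster, and the point's score is (b - a) / max(a, b). The function name and sample points are hypothetical. The code assumes at least two clusters and that the points are not all identical.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        # The sum includes the point itself (distance 0), so divide by n - 1
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical data: two well-separated pairs of points
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(pts, [0, 0, 1, 1])  # sensible clustering
bad = silhouette_score(pts, [0, 1, 0, 1])   # clusters cut across the groups
print(good, bad)
```

The sensible labeling scores close to 1, while the labeling that cuts across the two groups scores below 0.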
Math Quick Reference
Linear Algebra:
Dot product: a · b = Σ aᵢbᵢ
Matrix multiply: (AB)ᵢⱼ = Σₖ AᵢₖBₖⱼ
L2 norm: ||x||₂ = √(Σ xᵢ²)
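The three linear algebra formulas above translate almost directly into plain Python. This is a sketch for building intuition, with hypothetical function names. In practice numpy would be the usual choice. Matrices are assumed to be lists of equal-length rows with compatible shapes.

```python
import math


def dot(a, b):
    """Dot product: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def matmul(A, B):
    """Matrix product: entry (i, j) is row i of A dotted with column j of B."""
    cols_b = list(zip(*B))  # transpose B to iterate over its columns
    return [[dot(row, col) for col in cols_b] for row in A]


def norm(x):
    """Euclidean (L2) norm: square root of the dot product of x with itself."""
    return math.sqrt(dot(x, x))


d = dot([1, 2, 3], [4, 5, 6])
P = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
length = norm([3, 4])
print(d, P, length)
```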
Calculus:
d/dx(xⁿ) = nxⁿ⁻¹
d/dx(eˣ) = eˣ
d/dx(ln(x)) = 1/x
Chain rule: d/dx f(g(x)) = f'(g(x)) × g'(x)
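One way to sanity-check the derivative rules above is to compare them against a central-difference approximation. The sketch below does that at a single, arbitrarily chosen point. The helper name, the step size h, and the test functions are assumptions made for illustration.

```python
import math


def derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 1.5  # arbitrary evaluation point

# Power rule with n = 3: d/dx(x³) = 3x²
power_num, power_exact = derivative(lambda t: t ** 3, x), 3 * x ** 2

# d/dx(eˣ) = eˣ
exp_num, exp_exact = derivative(math.exp, x), math.exp(x)

# d/dx(ln(x)) = 1/x
log_num, log_exact = derivative(math.log, x), 1 / x

# Chain rule with f = exp and g(x) = x²: d/dx e^(x²) = e^(x²) × 2x
chain_num = derivative(lambda t: math.exp(t ** 2), x)
chain_exact = math.exp(x ** 2) * 2 * x

for name, num, exact in [("power", power_num, power_exact),
                         ("exp", exp_num, exp_exact),
                         ("log", log_num, log_exact),
                         ("chain", chain_num, chain_exact)]:
    print(f"{name}: numeric={num:.6f} exact={exact:.6f}")
```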
Probability:
P(A|B) = P(B|A) × P(A) / P(B) (Bayes)
E[X] = Σ xᵢP(xᵢ) (Expected value)
Var(X) = E[(X-μ)²] (Variance)
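The probability formulas above can be illustrated for discrete distributions with a few lines of Python. The function names and all numbers below (a fair six-sided die and a made-up diagnostic test) are hypothetical and serve only as a worked example.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' rule: P(A|B) = P(B|A) × P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


def expected_value(values, probs):
    """E[X] for a discrete distribution."""
    return sum(x * p for x, p in zip(values, probs))


def variance(values, probs):
    """Var(X) = E[(X - μ)²] for a discrete distribution."""
    mu = expected_value(values, probs)
    return sum(p * (x - mu) ** 2 for x, p in zip(values, probs))


# Fair six-sided die
faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
die_mean = expected_value(faces, probs)
die_var = variance(faces, probs)

# Made-up diagnostic test: 1% prevalence, 90% sensitivity, 5% false positives
p_a = 0.01
p_b_given_a = 0.90
p_b = p_b_given_a * p_a + 0.05 * (1 - p_a)  # law of total probability
posterior = bayes(p_b_given_a, p_a, p_b)
print(die_mean, die_var, posterior)
```

With these made-up numbers the posterior is roughly 15%, which shows how a low prior keeps P(A|B) small even when the test is fairly accurate.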
Python Libraries
Data: pandas, numpy
Visualization: matplotlib, seaborn, plotly
ML: scikit-learn, xgboost, lightgbm
Deep Learning: pytorch, tensorflow, keras
NLP: transformers, spacy, nltk
CV: opencv, torchvision
AutoML / Tuning: auto-sklearn, optuna
Deployment: fastapi, flask, streamlit
Experiment: mlflow, wandb
Key Takeaways
- Start simple — linear models as baselines
- Feature engineering often matters more than algorithm choice
- Cross-validate everything
- Regularize to prevent overfitting
- Scale features for distance-based algorithms
- Ensemble multiple models; combining them often improves performance
- Monitor models in production
- Keep learning — the field evolves fast
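Two of the takeaways above, feature scaling and cross-validation, can be sketched without any third-party library. The helper names are hypothetical, and the k-fold splitter uses contiguous folds with no shuffling, which is a simplification. Real projects would more likely use scikit-learn's scaling and splitting utilities.

```python
import statistics


def standardize(column):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    # A constant column has sigma == 0; map it to zeros instead of dividing
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds (no shuffling)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        val = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, val
        start = stop


scaled = standardize([1.0, 2.0, 3.0])
folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "val:", val_idx)
```

When scaling inside cross-validation, the mean and standard deviation should be computed on the training fold only and then applied to the validation fold, so that no information leaks from validation data.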