Intro to Machine Learning: Supervised vs Unsupervised
What is Machine Learning?
Machine Learning (ML) is a subset of artificial intelligence that enables systems to learn from data and improve their performance without being explicitly programmed. Rather than following static rules, ML algorithms identify patterns in data and build mathematical models that make predictions or decisions.
Definition: Machine Learning (Tom Mitchell, 1997)
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
ML Mathematical Framework
$$\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$$
Here,
- $\mathcal{D}$ = Dataset of input-output pairs
- $\mathbf{x}_i$ = Feature vector for sample i
- $y_i$ = Target label for sample i
- $n$ = Number of samples
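As a concrete illustration of this notation, a dataset can be held as a feature matrix plus a label vector. The sketch below assumes NumPy; the values and the names `X`, `y`, `n`, `d`, and `dataset` are made up for illustration.

```python
import numpy as np

# Hypothetical toy dataset: three samples, two features each.
X = np.array([[5.1, 3.5],
              [6.2, 2.9],
              [4.7, 3.2]])   # row i is the feature vector of sample i
y = np.array([0, 1, 0])      # y[i] is the target label of sample i

n, d = X.shape               # n = number of samples, d = number of features
dataset = list(zip(X, y))    # explicit list of (feature vector, label) pairs

print(n, d)
```

Most libraries, including scikit-learn, expect exactly this layout: one row per sample and one column per feature.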
ℹ️ The No Free Lunch Theorem
No single learning algorithm is universally superior across all possible data distributions. For every algorithm that performs well on some class of problems, there exists a distribution on which it performs no better than random guessing. This is why empirical evaluation on representative data is essential.
Types of Machine Learning
Machine Learning
|
+-- Supervised Learning
|   +-- Classification
|   +-- Regression
|
+-- Unsupervised Learning
|   +-- Clustering
|   +-- Dimensionality Reduction
|
+-- Reinforcement Learning
    +-- Agent-Based Learning
1. Supervised Learning
In supervised learning, the algorithm learns from labeled data: each input comes with a known correct output.
Key Characteristics:
- Training data includes input-output pairs
- Goal: Learn mapping from inputs to outputs
- Evaluation is straightforward: compare predictions to known labels
Two Main Tasks:
| Task | Output Type | Example | Algorithms |
|---|---|---|---|
| Classification | Discrete labels | Email → spam/not spam | Logistic Regression, SVM, Decision Trees |
| Regression | Continuous values | House features → price | Linear Regression, Random Forest, XGBoost |
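Classification Formulation
$$\hat{y} = \arg\max_{c \in \{1, \dots, C\}} P(y = c \mid \mathbf{x})$$
Here,
- $\hat{y}$ = Predicted class label
- $C$ = Number of classes
- $P(y = c \mid \mathbf{x})$ = Posterior probability of class c given x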
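The decision rule above amounts to picking the class with the largest posterior. The following is a minimal sketch, assuming NumPy and assuming the posteriors come from a softmax over raw class scores (softmax is one common choice, not something the text prescribes); the scores are made up.

```python
import numpy as np

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

def predict_class(posteriors):
    """Return the index of the class with the highest posterior probability."""
    return int(np.argmax(posteriors))

# Hypothetical raw scores for 3 classes on a single input x.
scores = np.array([1.0, 3.0, 0.5])
posteriors = softmax(scores)            # stands in for P(y = c | x)
y_hat = predict_class(posteriors)

print(y_hat)  # 1, the class with the largest score
```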
Regression Formulation
$$\hat{y} = \mathbf{w}^\top \mathbf{x} + b = \sum_{j=1}^{d} w_j x_j + b$$
Here,
- $\mathbf{w}$ = Weight vector
- $b$ = Bias term
- $d$ = Number of features
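A linear regression prediction is a weighted sum of the features plus the bias. The sketch below assumes NumPy; the weights, bias, and inputs are hypothetical values chosen so the arithmetic is easy to check by hand.

```python
import numpy as np

def predict_linear(X, w, b):
    """Compute the weighted sum of features plus bias for every row of X."""
    return X @ w + b

# Hypothetical parameters for d = 2 features.
w = np.array([2.0, 0.5])
b = 1.0
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

y_hat = predict_linear(X, w, b)
# Row 0: 2.0*1.0 + 0.5*2.0 + 1.0 = 4.0
# Row 1: 2.0*3.0 + 0.5*4.0 + 1.0 = 9.0
print(y_hat)
```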
2. Unsupervised Learning
Unsupervised learning works with unlabeled data, discovering hidden patterns or structures.
Key Characteristics:
- No target variable provided
- Goal: Discover structure, patterns, or representations
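- Evaluation is harder: with no ground-truth labels, quality is judged by internal measures (such as the silhouette score) or by usefulness downstream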
Three Main Tasks:
| Task | Goal | Example | Algorithms |
|---|---|---|---|
| Clustering | Group similar data | Customer segmentation | K-Means, DBSCAN, Hierarchical |
| Dimensionality Reduction | Reduce features while preserving info | Visualize high-dim data | PCA, t-SNE, UMAP |
| Anomaly Detection | Find outliers | Fraud detection | Isolation Forest, Autoencoders |
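K-Means Objective
$$J = \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in C_k} \lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2$$
Here,
- $K$ = Number of clusters
- $C_k$ = Set of points in cluster k
- $\boldsymbol{\mu}_k$ = Centroid of cluster k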
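To make the objective concrete, the sketch below computes it directly and runs one assignment-and-update step of Lloyd's algorithm, the standard heuristic for minimizing it. It assumes NumPy; the toy points and the function names `kmeans_objective` and `lloyd_step` are illustrative, not taken from any library.

```python
import numpy as np

def kmeans_objective(X, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))

def lloyd_step(X, centroids):
    """One assignment step followed by one centroid update."""
    # Assignment: each point goes to its nearest centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)
    # Update: each centroid moves to the mean of its points
    # (a centroid with no points is left where it is).
    new_centroids = centroids.copy()
    for k in range(len(centroids)):
        if np.any(labels == k):
            new_centroids[k] = X[labels == k].mean(axis=0)
    return labels, new_centroids

# Toy data: two well-separated groups (values are made up).
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

labels, centroids = lloyd_step(X, centroids)
J = kmeans_objective(X, labels, centroids)
print(labels, J)
```

Repeating `lloyd_step` until the labels stop changing never increases the objective, but it may stop at a local minimum, which is why implementations usually restart from several initializations.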
3. Reinforcement Learning
An agent learns to make decisions by interacting with an environment, receiving rewards or penalties.
Key Components:
- Agent: The learner/decision maker
- Environment: The world the agent interacts with
- State (s): Current situation of the agent
- Action (a): What the agent can do
- Reward (r): Feedback signal
- Policy (π): Strategy mapping states to actions
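Bellman Equation (Value Function)
$$V(s) = \max_{a} \Big[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V(s') \Big]$$
Here,
- $V(s)$ = Value of state s
- $R(s, a)$ = Reward for taking action a in state s
- $\gamma$ = Discount factor in [0,1]
- $P(s' \mid s, a)$ = Transition probability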
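One way to see the Bellman equation at work is value iteration, which applies the right-hand side repeatedly until the values stop changing. The sketch below uses only the standard library and a hypothetical two-state, two-action MDP; every number in `P` and `R` is made up, and the discount factor is kept below 1 so the iteration converges.

```python
# Hypothetical MDP. P[s][a] lists (next_state, probability) pairs;
# R[s][a] is the immediate reward for taking action a in state s.
P = {
    0: {"stay": [(0, 1.0)], "move": [(1, 1.0)]},
    1: {"stay": [(1, 1.0)], "move": [(0, 1.0)]},
}
R = {
    0: {"stay": 0.0, "move": 1.0},
    1: {"stay": 2.0, "move": 0.0},
}
gamma = 0.9  # discount factor

def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Apply the Bellman optimality update until values change by less than tol."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iter):
        new_V = {
            s: max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in P[s]
            )
            for s in P
        }
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    return V

V = value_iteration(P, R, gamma)
print(V)
```

In this toy problem the best plan is to reach state 1 and stay there, so the value of state 1 is 2 / (1 - 0.9) = 20 and the value of state 0 is 1 + 0.9 * 20 = 19.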
ML Workflow
+----------+ +-----------+ +----------+ +-----------+
| Define |--->| Collect |--->| Prepare |--->| Select |
| Problem | | Data | | Data | | Algorithm |
+----------+ +-----------+ +----------+ +-----------+
|
v
+----------+ +-----------+ +----------+ +-----------+
| Deploy & |<---| Evaluate |<---| Train |<---| Feature |
| Monitor | | Model | | Model | | Engineering|
+----------+ +-----------+ +----------+ +-----------+
Step-by-Step Process:
1. Problem Definition:
- What are we predicting?
- What type of ML task is this?
- What is the business objective?
2. Data Collection:
- Sources: databases, APIs, web scraping, sensors
- Consider: quality, quantity, representativeness
3. Data Preparation:
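Data Preparation Pipeline
$$X_{\text{raw}} \longrightarrow \cdots \longrightarrow X_{\text{processed}}$$
Here,
- $X_{\text{raw}}$ = Raw input data
- $X_{\text{processed}}$ = Fully processed data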
4. Exploratory Data Analysis (EDA):
- Statistical summaries: mean, variance, correlations
- Visualization: histograms, scatter plots, heatmaps
5. Feature Engineering:
- Create new features from existing ones
- Transform features: for example, polynomial features
- Select features: correlation analysis, mutual information
6. Model Selection & Training:
- Split data: training (60-80%), validation (10-20%), test (10-20%)
- Train multiple algorithms
- Tune hyperparameters
7. Model Evaluation:
ℹ️ Model Evaluation
8. Deployment & Monitoring:
- Deploy to production
- Monitor for drift: the distribution of production data can shift away from the training distribution
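The three-way split from step 6 can be sketched with shuffled indices. This is a minimal illustration assuming NumPy and a 60/20/20 split, one choice within the ranges given above; the helper name `train_val_test_split` is hypothetical.

```python
import numpy as np

def train_val_test_split(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Return shuffled, non-overlapping index arrays for train, validation, and test."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_test = int(round(test_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = train_val_test_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 600 200 200
```

The validation set is used for model selection and hyperparameter tuning, while the test set is touched once at the end. For classification with imbalanced classes, a stratified split (for example via scikit-learn's `train_test_split` with `stratify`) is usually the safer choice.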
Model Selection Criteria
Bias-Variance Tradeoff
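Theorem: Bias-Variance Decomposition (Proof Sketch)
For a model $\hat{f}$ predicting $y = f(x) + \varepsilon$, where $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, the expected squared error at a point $x_0$ is:
$$\mathbb{E}\big[(y - \hat{f}(x_0))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x_0)] - f(x_0)\big)^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x_0) - \mathbb{E}[\hat{f}(x_0)])^2\big]}_{\text{Variance}} + \underbrace{\sigma^2}_{\text{Irreducible Noise}}$$
Proof: Expand $(y - \hat{f}(x_0))^2$ by adding and subtracting $f(x_0)$ and $\mathbb{E}[\hat{f}(x_0)]$, then apply the independence of $\varepsilon$ and $\hat{f}(x_0)$. Cross terms vanish due to $\mathbb{E}[\varepsilon] = 0$ and $\mathbb{E}\big[\hat{f}(x_0) - \mathbb{E}[\hat{f}(x_0)]\big] = 0$.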
[Figure: Error (vertical axis) versus Model Complexity (horizontal axis, Simple to Complex). Bias^2 falls and Variance rises as complexity increases; Total Error is U-shaped, and its minimum marks the optimal complexity.]
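The tradeoff can also be estimated numerically by refitting a model on many freshly drawn training sets and looking at its predictions at one fixed point. The sketch below assumes NumPy, a sine curve as the true function, and polynomial fits of increasing degree as the models; all of these are illustrative choices, not part of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    """Assumed ground-truth function for the simulation."""
    return np.sin(2 * np.pi * x)

def bias_variance_at(x0, degree, n_train=20, noise_sd=0.3, n_repeats=500):
    """Estimate squared bias and variance of a polynomial fit at the point x0."""
    preds = np.empty(n_repeats)
    for r in range(n_repeats):
        # Draw a fresh noisy training set for every repeat.
        x = rng.uniform(0.0, 1.0, n_train)
        y = true_f(x) + rng.normal(0.0, noise_sd, n_train)
        coeffs = np.polyfit(x, y, degree)
        preds[r] = np.polyval(coeffs, x0)
    bias_sq = (preds.mean() - true_f(x0)) ** 2   # (average prediction - truth)^2
    variance = preds.var()                       # spread of predictions across repeats
    return float(bias_sq), float(variance)

results = {}
for degree in (1, 3, 9):
    results[degree] = bias_variance_at(x0=0.25, degree=degree)
    b2, var = results[degree]
    print(f"degree={degree}: bias^2={b2:.4f}, variance={var:.4f}")
```

Under these assumptions the straight line shows the largest squared bias, while the degree-9 polynomial shows the largest variance, matching the shape described in the decomposition.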
Overfitting vs Underfitting
| Condition | Training Error | Validation Error | Diagnosis |
|---|---|---|---|
| Underfitting | High | High | Model too simple |
| Good Fit | Low | Low (close to training) | Model appropriate |
| Overfitting | Very Low | High | Model too complex |
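The diagnosis in the table can be reproduced by comparing training and validation error as model complexity grows. The sketch below assumes NumPy, synthetic sine data, and polynomial degree as the complexity knob; the data and degrees are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: noisy samples of a sine curve (an assumed toy problem).
x_train = rng.uniform(0.0, 1.0, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, 20)
x_val = rng.uniform(0.0, 1.0, 200)
y_val = np.sin(2 * np.pi * x_val) + rng.normal(0.0, 0.2, 200)

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

errors = {}
for degree in (1, 4, 10):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = mse(y_train, np.polyval(coeffs, x_train))
    val_err = mse(y_val, np.polyval(coeffs, x_val))
    errors[degree] = (train_err, val_err)
    print(f"degree={degree:2d}  train MSE={train_err:.4f}  val MSE={val_err:.4f}")
```

Training error can only go down as the degree increases, so it says little on its own. A validation error that stays high alongside a high training error suggests underfitting, and a widening gap between the two typically signals overfitting.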
Regularization
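💡 Preventing Overfitting
To prevent overfitting, add a penalty term to the loss:
$$J_{\text{reg}}(\mathbf{w}) = J(\mathbf{w}) + \lambda \, \Omega(\mathbf{w})$$
where $\lambda \ge 0$ controls the penalty strength and $\Omega(\mathbf{w})$ is the penalty on the weights.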
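| Type | Penalty | Formula | Effect |
|---|---|---|---|
| Ridge (L2) | Squared L2 norm | $\lambda \sum_j w_j^2$ | Shrinks coefficients |
| Lasso (L1) | L1 norm | $\lambda \sum_j \lvert w_j \rvert$ | Feature selection |
| Elastic Net | Mix of L1 and L2 | $\lambda_1 \sum_j \lvert w_j \rvert + \lambda_2 \sum_j w_j^2$ | Both effects |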
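The shrinking effect of an L2 penalty can be seen with the closed-form ridge solution. The sketch below assumes NumPy, a squared-error loss, no intercept, and synthetic data; `ridge_fit` is a hypothetical helper, not a library function. Lasso has no closed form and is not shown.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||y - Xw||^2 + lam * ||w||^2 via the normal equations (no intercept)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([3.0, -2.0, 0.5])        # made-up ground-truth weights
y = X @ true_w + rng.normal(0.0, 0.1, 50)

w_ols = ridge_fit(X, y, lam=0.0)      # lam = 0 reduces to ordinary least squares
w_ridge = ridge_fit(X, y, lam=50.0)   # a large penalty pulls the weights toward zero

print("OLS weights:  ", np.round(w_ols, 3))
print("Ridge weights:", np.round(w_ridge, 3))
```

The ridge weights have a smaller norm than the unpenalized ones. In practice the penalty strength is chosen on validation data rather than fixed by hand.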
Real-World Applications
1. Healthcare: Disease Diagnosis
features = ['age', 'blood_pressure', 'cholesterol', 'glucose', 'bmi']
# Supervised classification: healthy vs diabetic
# Accuracy: 95%+, used as screening tool
2. Finance: Credit Scoring
features = ['income', 'debt_ratio', 'credit_history', 'employment_years']
# Binary classification: approve/deny loan
# Goal: Minimize false positives (approving risky borrowers)
3. E-commerce: Recommendation Systems
# User-item interaction matrix
# Unsupervised: collaborative filtering
# Find users with similar purchase patterns
# Recommend items they haven't seen
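The steps in these comments can be sketched as user-based collaborative filtering with cosine similarity. This is a minimal illustration assuming NumPy and a tiny made-up purchase matrix; the function names are hypothetical, and real recommenders use far larger, sparse matrices.

```python
import numpy as np

# Hypothetical user-item matrix: rows = users, columns = items, 1 = purchased.
interactions = np.array([
    [1, 1, 0, 0],   # user 0
    [1, 1, 1, 0],   # user 1
    [0, 0, 1, 1],   # user 2
])

def cosine_similarity_matrix(M):
    """Pairwise cosine similarity between the rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0            # avoid division by zero for empty rows
    unit = M / norms
    return unit @ unit.T

def recommend(user, interactions, top_k=1):
    """Rank unseen items by the similarity-weighted purchases of other users."""
    sims = cosine_similarity_matrix(interactions)[user].copy()
    sims[user] = 0.0                               # ignore the user's own row
    scores = sims @ interactions                   # weighted vote per item
    scores = np.where(interactions[user] > 0, -np.inf, scores)  # hide seen items
    ranked = np.argsort(-scores)
    return [int(i) for i in ranked[:top_k]]

print(recommend(0, interactions))  # [2]
```

User 0 shares two purchases with user 1 and none with user 2, so item 2 (bought by user 1) is ranked first.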
4. Autonomous Vehicles: Object Detection
# Computer vision pipeline
# 1. Detect objects (cars, pedestrians, signs)
# 2. Classify object types
# 3. Predict trajectories
# Deep learning + reinforcement learning
5. Natural Language Processing: Sentiment Analysis
# Text classification
# Input: "This product is amazing!"
# Output: Positive sentiment (0.95 probability)
# Use case: Brand monitoring, customer feedback
Complete Python Example
Supervised vs Unsupervised Learning Comparison
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, classification_report, silhouette_score
# Generate synthetic dataset
np.random.seed(42)
n_samples = 1000
# Features: income, age, credit_score
X = np.column_stack([
    np.random.normal(50000, 15000, n_samples),
    np.random.normal(40, 12, n_samples),
    np.random.normal(680, 50, n_samples)
])
# Binary target: loan approval (0=denied, 1=approved)
y = ((X[:, 0] > 45000) & (X[:, 2] > 650)).astype(int)
noise = np.random.binomial(1, 0.1, n_samples)
y = np.bitwise_xor(y, noise)
df = pd.DataFrame(X, columns=['income', 'age', 'credit_score'])
df['approved'] = y
# --- Supervised Learning ---
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('approved', axis=1), df['approved'],
    test_size=0.2, random_state=42, stratify=df['approved']
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
lr_model = LogisticRegression(random_state=42)
lr_model.fit(X_train_scaled, y_train)
lr_pred = lr_model.predict(X_test_scaled)
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train_scaled, y_train)
rf_pred = rf_model.predict(X_test_scaled)
print("--- Logistic Regression ---")
print(f"Accuracy: {accuracy_score(y_test, lr_pred):.4f}")
print(classification_report(y_test, lr_pred))
print("\n--- Random Forest ---")
print(f"Accuracy: {accuracy_score(y_test, rf_pred):.4f}")
print(classification_report(y_test, rf_pred))
# --- Unsupervised Learning ---
# Cluster all samples without using the labels. A separate scaler is fitted once,
# so the scaler fitted on the training split above is not overwritten.
X_all_scaled = StandardScaler().fit_transform(df.drop('approved', axis=1))
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
clusters = kmeans.fit_predict(X_all_scaled)
print("\n--- K-Means Clustering ---")
print(f"Silhouette Score: {silhouette_score(X_all_scaled, clusters):.4f}")
print(f"Cluster sizes: {np.bincount(clusters)}")
Key Takeaways
Summary: Intro to Machine Learning
- ML = Learning from Data: Systems improve with experience without explicit programming
- Three Paradigms: Supervised (labeled), Unsupervised (unlabeled), Reinforcement (reward-based)
- Bias-Variance Tradeoff: Balance model complexity to minimize total error: $\text{Total Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Noise}$
- Workflow Matters: Success depends more on data preparation than algorithm choice
- No Free Lunch: No single algorithm works best for all problems, so try multiple approaches
- Evaluation is Critical: Always use held-out test data; never evaluate on training data
Practice Exercises
Exercise 1: Problem Classification
Classify each scenario as supervised, unsupervised, or reinforcement learning:
- a) Predicting house prices from features
- b) Grouping customers by purchase behavior
- c) Training a robot to walk
- d) Detecting spam emails
- e) Reducing 1000 features to 10 for visualization
Exercise 2: Dataset Exploration
from sklearn.datasets import load_iris
iris = load_iris()
# a) How many samples and features?
# b) What are the class labels?
# c) Visualize feature distributions
# d) Which features are most discriminative?
Exercise 3: Model Comparison
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
# Perform 5-fold cross-validation for each
# Which algorithm performs best? Why?
Exercise 4: Bias-Variance Analysis
- Train a Decision Tree with max_depth = 2 (high bias) and max_depth = 20 (high variance)
- Plot training and validation accuracy vs max_depth
- Find the optimal depth
Reflection Questions
- When would you choose unsupervised over supervised learning?
- Why might a simpler model be preferred over a complex one?
- What are the ethical considerations when deploying ML models?