K-Nearest Neighbors: Distance Metrics & Curse of Dimensionality
The simplest yet most insightful instance-based learning algorithm
Interview Question
"Explain the KNN algorithm and its decision boundary. What is the curse of dimensionality and how does it affect KNN? Compare different distance metrics and their use cases."
Difficulty: Medium | Frequently asked at Amazon, Stripe, Google
Theoretical Foundation
KNN Algorithm
K-Nearest Neighbors is a lazy learning algorithm that makes predictions based on the majority class (classification) or average value (regression) of the closest training examples.
Algorithm:
- Choose the number of neighbors
- Calculate the distance between the query point and all training points
- Select the nearest neighbors
- For classification: Return the majority class
- For regression: Return the average value
Key Properties:
- Instance-based: No explicit training phase
- Non-parametric: No assumptions about data distribution
- Lazy learning: All computation happens at prediction time
- Local decision: Decisions based on local neighborhoods
Decision Boundary
The decision boundary of KNN is piecewise linear and depends on:
- K value: Larger K β smoother boundary
- Distance metric: Different metrics β different boundaries
- Feature scaling: Essential for meaningful distances
Choosing K
- Small K (e.g., K=1): Low bias, high variance (overfitting)
- Large K (e.g., K=n): High bias, low variance (underfitting)
- Optimal K: Typically or determined by cross-validation
βΉοΈ
Key Insight: KNN is a "voting" classifier. With K=1, it memorizes the training data (overfitting). With K=n, it always predicts the majority class (underfitting). The optimal K balances bias and variance.
Distance Metrics
1. Euclidean Distance (L2)
- Most commonly used
- Sensitive to outliers
- Assumes features are on the same scale
- Best for continuous features
2. Manhattan Distance (L1)
- More robust to outliers than Euclidean
- Used when features are sparse
- Best for high-dimensional data
3. Minkowski Distance (Generalized)
- : Manhattan distance
- : Euclidean distance
- : Chebyshev distance
4. Cosine Similarity
- Measures angle between vectors
- Ignores magnitude, only direction
- Best for text classification (TF-IDF)
- Range: (1 = identical direction)
5. Hamming Distance
- For categorical features
- Measures proportion of differing features
- Best for binary/categorical data
Distance Metric Comparison
| Metric | Range | Sensitive to Outliers | Best For | Time Complexity |
|---|---|---|---|---|
| Euclidean | Yes | Continuous, low-dim | ||
| Manhattan | Less | Sparse, high-dim | ||
| Cosine | No | Text, high-dim | ||
| Hamming | No | Categorical |
Curse of Dimensionality
As the number of features increases, several problems arise:
- Distance Concentration: All points become equidistant
-
Data Sparsity: Points become increasingly isolated
- Required data grows exponentially with dimensions
- Volume of unit hypersphere decreases relative to hypercube
-
Spurious Correlations: In high dimensions, random correlations appear
β οΈ
Critical Concept: In high dimensions, the concept of "nearest neighbor" becomes meaningless because all points are approximately equidistant. This is why KNN performs poorly without dimensionality reduction.
Mathematical Explanation:
For -dimensional uniform data in :
- Expected distance between two random points:
- Ratio of max to min distance:
This ratio doesn't depend on , meaning even with infinite data, the curse persists.
Optimizations for KNN
1. KD-Tree
- Binary tree partitioning by feature values
- Query time: average, worst case
- Best for low to medium dimensions ()
2. Ball Tree
- Hierarchical clustering of data points
- Better for high dimensions than KD-Tree
- Handles non-Euclidean distances
3. Approximate Nearest Neighbors (ANN)
- Locality-Sensitive Hashing (LSH)
- Annoy (Approximate Nearest Neighbors Oh Yeah)
- HNSW (Hierarchical Navigable Small World)
- Trade accuracy for speed
π‘
Production Tip: For large datasets (>100K samples), use approximate nearest neighbor libraries like Faiss (Facebook) or Annoy (Spotify) instead of exact KNN.
Weighted KNN
Instead of equal votes, weight neighbors by distance:
where or
Weighted KNN gives more influence to closer neighbors and often outperforms unweighted KNN.
Code Implementation
Explanation of Code
-
Effect of K: Shows how increasing K smooths the decision boundary and affects bias-variance tradeoff.
-
Distance Metrics: Compares Euclidean, Manhattan, Chebyshev, and Cosine distances.
-
Feature Scaling: Demonstrates that KNN is highly sensitive to feature scales.
-
Curse of Dimensionality: Shows how accuracy degrades as dimensions increase.
-
KNN Regression: Demonstrates KNN for regression tasks.
-
Weighted KNN: Compares uniform vs distance-weighted voting.
-
Algorithm Comparison: Benchmarks different KNN implementations.
Real-World Applications
Amazon: Recommendation Systems
Amazon uses KNN for:
- Product Recommendations: Finding similar products based on purchase history
- Collaborative Filtering: User-based and item-based KNN
- Content-Based Filtering: Similar item features
Stripe: Fraud Detection
Stripe uses KNN variants for:
- Anomaly Detection: Identifying unusual transaction patterns
- Customer Segmentation: Grouping similar merchants
- Risk Scoring: Similarity to known fraud cases
π‘
Amazon Interview Tip: Be prepared to discuss the tradeoff between KNN and model-based methods. KNN is interpretable but slow; model-based methods are faster but less interpretable.
Common Follow-Up Questions
Q1: Why is feature scaling essential for KNN?
KNN uses distance metrics, which are sensitive to feature scales. Without scaling, features with larger ranges dominate the distance calculation. For example, income (in thousands) would dominate age (in years).
Q2: How do you handle categorical features in KNN?
- Use Hamming distance for binary/categorical features
- One-hot encode categorical features (increases dimensionality)
- Use Gower distance (handles mixed feature types)
Q3: What is the computational complexity of KNN?
- Training: (lazy learning)
- Prediction: for brute force, for KD-Tree
- This makes KNN slow for large datasets
Q4: How do you choose the optimal K?
- Use cross-validation
- Start with
- Odd K avoids ties in binary classification
- Larger K reduces variance but increases bias
Company-Specific Tips
Amazon Interview Tips
- Discuss scalability challenges with large datasets
- Be ready to explain approximate nearest neighbors
- Mention hybrid approaches (KNN + deep learning)
- Talk about real-time serving requirements
Stripe Interview Tips
- Focus on anomaly detection applications
- Discuss interpretability for fraud explanations
- Be prepared to explain feature engineering for transactions
- Mention online learning for concept drift