NVIDIA & Meta Interview

Dimensionality Reduction: t-SNE, UMAP & Autoencoders

Visualizing and compressing high-dimensional data

Interview Question

"Explain the differences between t-SNE, UMAP, and PCA for dimensionality reduction. When would you use each method? How do autoencoders compare to these methods?"

Difficulty: Hard | Frequently asked at NVIDIA, Meta, Google

Theoretical Foundation

PCA Recap (Linear)

PCA finds orthogonal directions of maximum variance. Fast and deterministic, but only captures linear relationships.

t-SNE (t-Distributed Stochastic Neighbor Embedding)

Goal: Preserve local structure (nearby points stay nearby).

Algorithm:

Compute pairwise similarities in high-dimensional space:

p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}

Compute similarities in low-dimensional space using t-distribution:

q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}

Minimize KL divergence:

KL(P \| Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}

Properties:

Preserves local structure well
Non-parametric (no explicit mapping)
Slow for large datasets ( $O(n^2)$ )
Non-deterministic (different runs give different results)
Good for visualization (2-3D)

UMAP (Uniform Manifold Approximation and Projection)

Goal: Preserve both local and global structure.

Algorithm:

Build fuzzy simplicial set (neighborhood graph) in high-D
Optimize low-D embedding to match the fuzzy set

Properties:

Faster than t-SNE ( $O(n^{1.14})$ )
Better preserves global structure
Can be used for general dimensionality reduction
Has a parametric version
Deterministic with fixed random seed

t-SNE vs UMAP Comparison

Aspect	t-SNE	UMAP
Speed	Slow	Fast
Global Structure	Poor	Good
Scalability	Limited	Better
Deterministic	No	Yes (with seed)
Out-of-sample	No	Yes (parametric)
Use Case	Visualization only	Visualization + reduction

ℹ️

Key Insight: UMAP is generally preferred over t-SNE because it's faster, preserves global structure better, and can handle out-of-sample points. t-SNE is still useful for quick visualization.

Autoencoders

Architecture:

Encoder: Maps input to latent space $z = f(x)$
Decoder: Reconstructs input from latent space $\hat{x} = g(z)$

Training:

\min \|x - g(f(x))\|^2

Variants:

Variational Autoencoder (VAE): Adds probabilistic structure to latent space
Denoising Autoencoder: Trained to reconstruct from corrupted input
Sparse Autoencoder: Enforces sparsity in latent representation

Properties:

Learns non-linear mappings
Can generate new data (VAE)
Requires more data and tuning
Computationally expensive

Method Selection Guide

Method	Speed	Non-linear	Generation	Out-of-sample	Best For
PCA	Fast	No	No	Yes	Linear data, feature reduction
t-SNE	Slow	Yes	No	No	2D/3D visualization
UMAP	Medium	Yes	No	Yes	Visualization + reduction
Autoencoder	Slow	Yes	Yes	Yes	Complex non-linear patterns

Code Implementation

Real-World Applications

NVIDIA: Feature Visualization

CNN Feature Maps: Visualizing learned representations
Latent Space Analysis: Understanding generative models
GPU-Accelerated Reduction: CUDA implementations of t-SNE/UMAP

Meta: Content Understanding

Embedding Visualization: Visualizing user/item embeddings
Topic Modeling: Reducing document dimensions
Anomaly Detection: Identifying outliers in embedding space

💡

NVIDIA Interview Tip: Discuss GPU acceleration of t-SNE and UMAP. Mention libraries like FIt-SNE and cuML that provide GPU-accelerated implementations.

Common Follow-Up Questions

Q1: Why is t-SNE not suitable for high-dimensional data directly? t-SNE is slow ( $O(n^2)$ ) and struggles with high dimensions. Apply PCA first to reduce to ~50 dimensions, then apply t-SNE.

Q2: How does UMAP preserve global structure better than t-SNE? UMAP uses fuzzy topological representations and optimizes a cross-entropy loss, while t-SNE uses KL divergence which emphasizes local structure.

Q3: When would you use an autoencoder over t-SNE/UMAP? When you need: (1) a parametric mapping for out-of-sample points, (2) data generation capabilities (VAE), or (3) complex non-linear patterns.

Q4: How do you evaluate dimensionality reduction quality?

Trustworthiness: Are neighbors preserved?
Continuity: Is the mapping smooth?
Downstream task performance: Does reduction help classification/clustering?

Dimensionality Reduction: t-SNE, UMAP & Autoencoders

Dimensionality Reduction: t-SNE, UMAP & Autoencoders

Interview Question

Theoretical Foundation

PCA Recap (Linear)

t-SNE (t-Distributed Stochastic Neighbor Embedding)

UMAP (Uniform Manifold Approximation and Projection)

t-SNE vs UMAP Comparison

Autoencoders

Method Selection Guide

Code Implementation

Real-World Applications

NVIDIA: Feature Visualization

Meta: Content Understanding

Common Follow-Up Questions

Related Topics