Why It Matters
βΉοΈ Why It Matters
Most real-world matrices are overwhelmingly sparse β containing 90% or more zeros. Consider a movie recommendation system with 1 million users and 10,000 movies: the user-item matrix has 10 billion entries, but each user rates fewer than 100 movies, making 99.99% of entries zero. Storing this densely wastes 99.99% of memory. Sparse representations store only non-zero values, reducing storage from 80 GB to 8 MB β a 10,000x compression that makes computation feasible.
In machine learning, sparsity appears everywhere: NLP term-document matrices, graph adjacency matrices, attention masks in Transformers, and feature representations in high-dimensional data. Without sparse matrix techniques, training large language models, building recommendation engines, and analyzing social networks would be computationally impossible.
What is a Sparse Matrix
DfSparse Matrix
A matrix is sparse if most of its elements are zero. Formally, define the sparsity as:
A matrix is considered sparse when (90% zeros). The density is .
The nnz (number of non-zeros) satisfies .
πExample: Dense vs Sparse
Dense matrix (4Γ4, 50% sparse):
β β
β 1.0 0.0 0.0 2.0 β
β 0.0 3.0 0.0 0.0 β
β 0.0 0.0 4.0 0.0 β
β 5.0 0.0 0.0 6.0 β
β β
Dense storage: 16 values Γ 8 bytes = 128 bytes
Sparse matrix (4Γ4, 75% sparse, 4 non-zeros):
β β
β 1.0 0.0 0.0 0.0 β
β 0.0 3.0 0.0 0.0 β
β 0.0 0.0 4.0 0.0 β
β 0.0 0.0 0.0 6.0 β
β β
Dense storage: 16 values Γ 8 bytes = 128 bytes
Sparse COO: 4 triples Γ 16 bytes = 64 bytes
Extreme sparsity (10000Γ10000, 99.9% sparse):
- Dense: 800 MB
- Sparse (COO): ~2.4 MB (375x savings)
Sparsity Pattern
The sparsity pattern describes which entries are non-zero. Visualizing sparsity patterns reveals structure that affects format choice and algorithm performance.
βΉοΈ Visualizing Sparsity
Dense (no zeros): Sparse (structured): Sparse (random):
β β β β β β β β β β β β β β β β β β β β β β β β
β β β β β β β β β β β β β β β β β β β β β β β β
β β β β β β β β β β β β β β β β β β β β β β β β
β β β β β β β β β β β β β β β β β β β β β β β β
β β β β β β β β β β β β β β β β β β β β β β β β
β β β β β β β β β β β β β β β β β β β β β β β β
β β β β β β β β β β β β β β β β β β β β β β β β
β β β β β β β β β β β β β β β β β β β β β β β β
β = non-zero β = non-zero β = non-zero
β = zero β = zero β = zero
- Dense: All entries stored β no pattern to exploit
- Structured sparse: Banded, diagonal, or block patterns β can use specialized algorithms
- Random sparse: Non-zeros scattered unpredictably β general sparse formats needed
ThSparsity Impact on Storage
For an matrix with non-zeros:
| Storage | Dense | CSR/CSC | COO |
|---|---|---|---|
| Formula | bytes | bytes | bytes |
| Break-even | β |
When , sparse formats provide exponential savings.
Sparse Matrix Formats
CSR (Compressed Sparse Row)
DfCSR Format
CSR stores a sparse matrix using three arrays:
- values: Non-zero values in row-major order
- col_indices: Column index for each non-zero value
- row_ptr: Index into values/col_indices where each row starts (length )
Storage: bytes (values) + bytes (col_indices) + bytes (row_ptr)
Matrix: CSR Representation:
β β values = [1, 2, 3, 4, 5, 6]
β 1 0 0 2 0 β col_idx = [0, 3, 1, 2, 3, 4]
β 0 3 0 0 0 β row_ptr = [0, 2, 3, 5, 6]
β 0 0 4 5 0 β
β 0 0 0 0 6 β Row 0: values[0:2] = [1, 2]
β β Row 1: values[2:3] = [3]
Row 2: values[3:5] = [4, 5]
Row 3: values[5:6] = [6]
Best for: Row slicing, matrix-vector multiplication, arithmetic operations Worst for: Column slicing, inserting new non-zeros
CSC (Compressed Sparse Column)
DfCSC Format
CSC is the column-compressed analog of CSR:
- values: Non-zero values in column-major order
- row_indices: Row index for each non-zero value
- col_ptr: Index into values/row_indices where each column starts (length )
Matrix: CSC Representation:
β β values = [1, 3, 2, 4, 5, 6]
β 1 0 0 2 0 β row_idx = [0, 1, 0, 2, 2, 3]
β 0 3 0 0 0 β col_ptr = [0, 2, 3, 4, 6, 6]
β 0 0 4 5 0 β
β 0 0 0 0 6 β Col 0: values[0:2] = [1, 3] (rows 0,1)
β β Col 1: values[2:3] = [3] (row 1)
Col 2: values[3:4] = [4] (row 2)
Col 3: values[4:6] = [2, 5] (rows 0,2)
Col 4: values[6:6] = [] (no non-zeros)
Best for: Column slicing, solving , feature-based operations Worst for: Row slicing, inserting new non-zeros
COO (Coordinate)
DfCOO Format
COO stores each non-zero as a triple:
- row: Row indices of non-zeros
- col: Column indices of non-zeros
- values: Non-zero values
All three arrays have length . Order doesn't matter but sorting by row then column is conventional.
Matrix: COO Representation:
β β row = [0, 0, 1, 2, 2, 3]
β 1 0 0 2 0 β col = [0, 3, 1, 2, 3, 4]
β 0 3 0 0 0 β values = [1, 2, 3, 4, 5, 6]
β 0 0 4 5 0 β
β 0 0 0 0 6 β Each triple: (row[i], col[i], values[i])
β β = (0,0,1), (0,3,2), (1,1,3), ...
Best for: Construction, incremental building, format conversion input Worst for: Arithmetic, slicing (convert to CSR/CSC first)
LIL (List of Lists)
DfLIL Format
LIL stores two lists of lists:
- data:
data[i]is a list of non-zero values in row - rows:
rows[i]is a list of column indices for row
Allows efficient row-wise insertion and deletion. Each row can have variable-length lists.
Matrix: LIL Representation:
β β data = [[1, 2], [3], [4, 5], [6]]
β 1 0 0 2 0 β rows = [[0, 3], [1], [2, 3], [4]]
β 0 3 0 0 0 β
β 0 0 4 5 0 β Row 0: data[0]=[1,2], rows[0]=[0,3]
β 0 0 0 0 6 β Row 1: data[1]=[3], rows[1]=[1]
β β Row 2: data[2]=[4,5], rows[2]=[2,3]
Row 3: data[3]=[6], rows[3]=[4]
Best for: Incremental construction, row-wise modifications Worst for: Arithmetic, column slicing (convert to CSR/CSC)
DOK (Dictionary of Keys)
DfDOK Format
DOK uses a Python dictionary mapping (row, col) tuples to values. Only non-zero entries are stored. Supports O(1) lookup and insertion for individual elements.
Matrix: DOK Representation:
β β {(0,0): 1, (0,3): 2,
β 1 0 0 2 0 β (1,1): 3, (2,2): 4,
β 0 3 0 0 0 β (2,3): 5, (3,4): 6}
β 0 0 4 5 0 β
β 0 0 0 0 6 β Lookup: dok[(0,3)] β 2
β β Insert: dok[(4,4)] = 7
Best for: Element-wise access during construction, sparse matrix assembly Worst for: Arithmetic, slicing (convert to CSR/CSC)
Format Comparison
| Format | Construction | Row Slice | Col Slice | MatVec | Arithmetic | Memory |
|---|---|---|---|---|---|---|
| CSR | O(nnz) | O(1) | O(m) | O(nnz) | O(nnz) | Low |
| CSC | O(nnz) | O(n) | O(1) | O(nnz) | O(nnz) | Low |
| COO | O(1) | O(nnz) | O(nnz) | O(nnz) | O(nnz) | Medium |
| LIL | O(1) | O(nnz) | O(nnz) | O(nnz) | O(nnz) | High |
| DOK | O(1) | O(nnz) | O(nnz) | O(nnz) | O(nnz) | High |
Format Conversions
Converting between formats involves tradeoffs in speed, memory, and operations supported.
βΉοΈ Conversion Rules
- Any β CSR/CSC: Fast, O(nnz). This is the standard conversion path.
- CSR β CSC: Fast, O(nnz). Just transpose the arrays.
- Dense β CSR: Use
scipy.sparse.csr_matrix(dense_array). - CSR/CSC β Dense: Use
.toarray()β only for small matrices. - Any β LIL/DOK: Use for construction, then convert to CSR/CSC for computation.
from scipy import sparse
import numpy as np
# Build incrementally with LIL, convert to CSR for computation
lil = sparse.lil_matrix((10000, 10000))
lil[0, 0] = 1
lil[1, 2] = 3
lil[5, 7] = -2
csr = lil.tocsr() # Convert for efficient operations
# COO construction from data
row = np.array([0, 1, 2, 0])
col = np.array([0, 1, 2, 2])
val = np.array([1.0, 2.0, 3.0, 4.0])
coo = sparse.coo_matrix((val, (row, col)), shape=(3, 3))
csr = coo.tocsr() # Convert for arithmetic
# CSR β CSC conversion
csc = csr.tocsc()
csr_back = csc.tocsr()
Conversion cost hierarchy:
LIL/DOK β CSR/CSC: O(nnz Γ log(nnz)) [sorting]
COO β CSR/CSC: O(nnz) [linear scan]
CSR β CSC: O(nnz) [transpose]
CSR/CSC β Dense: O(m Γ n) [always expensive]
Sparse Matrix Operations
Matrix-Vector Multiplication
ThSparse MatVec Complexity
For CSR matrix (, nnz non-zeros) and dense vector :
Each row computes:
This is optimal β you cannot multiply faster than the number of non-zeros.
from scipy import sparse
import numpy as np
A = sparse.random(100000, 100000, density=0.001, format='csr')
x = np.random.randn(100000)
y = A @ x # O(nnz) matrix-vector multiply
print(f"nnz: {A.nnz}, time proportional to {A.nnz} operations")
Matrix Addition
A = sparse.random(1000, 1000, density=0.01, format='csr')
B = sparse.random(1000, 1000, density=0.01, format='csr')
C = A + B # Element-wise addition, sparsity decreases
D = A - B # Element-wise subtraction
E = 2.0 * A # Scalar multiplication
Note: Sum of two sparse matrices has sparsity min(sparsity(A), sparsity(B)). Non-zeros may increase significantly.
Matrix Multiplication
βΉοΈ Sparse Γ Sparse Multiplication
Matrix multiplication is more complex. The result requires non-zeros to "align" on shared indices. The output nnz is unpredictable and can range from 0 to .
CSR Γ CSR is efficient when the inner dimension has limited non-zeros per column/row.
A = sparse.random(1000, 1000, density=0.01, format='csr')
B = sparse.random(1000, 1000, density=0.01, format='csr')
C = A @ B # Sparse Γ Sparse
print(f"Input nnz: {A.nnz}, {B.nnz}, Output nnz: {C.nnz}")
Transpose
A = sparse.random(1000, 500, density=0.01, format='csr')
AT = A.T # O(nnz) operation, swaps CSR β CSC internally
AAT = A @ A.T # Symmetric result
Element Access
# DOK: O(1) lookup
dok = sparse.dok_matrix((10000, 10000))
dok[123, 456] = 7.0
val = dok[123, 456] # O(1)
# CSR: O(log(nnz_row)) via binary search within row
csr = dok.tocsr()
val = csr[123, 456] # Slower than DOK for single lookups
# Dense slice: convert small block
block = csr[100:200, 100:200].toarray()
Sparse vs Dense
ThWhen to Use Sparse vs Dense
| Criterion | Use Sparse | Use Dense |
|---|---|---|
| Sparsity | β₯ 90% zeros | < 90% zeros |
| Matrix size | > 1000 Γ 1000 | < 1000 Γ 1000 |
| Operations | MatVec, slicing | Element-wise, broadcast |
| Hardware | CPU (irregular access) | GPU (regular access) |
| Libraries | scipy.sparse, PyTorch sparse | NumPy, cuBLAS |
πPerformance Comparison
import time
from scipy import sparse
import numpy as np
n = 10000
A_dense = np.random.randn(n, n)
A_dense[A_dense < 0.9] = 0 # Make ~90% zeros
A_sparse = sparse.csr_matrix(A_dense)
x = np.random.randn(n)
# Dense: O(nΒ²) time
t0 = time.time()
y_dense = A_dense @ x
t_dense = time.time() - t0
# Sparse: O(nnz) time
t0 = time.time()
y_sparse = A_sparse @ x
t_sparse = time.time() - t0
print(f"Dense: {t_dense:.4f}s, Sparse: {t_sparse:.4f}s")
print(f"Speedup: {t_dense/t_sparse:.1f}x")
# Typical result: 5-20x speedup for 90% sparse matrices
Key insight: GPUs prefer dense matrices for regular memory access patterns. Sparse matrices on GPUs (cuSPARSE) can be faster but require specialized kernels. For very large sparse matrices, CPU sparse operations often beat GPU dense operations due to memory constraints.
Common Sparsity Patterns
DfDiagonal Matrix
A diagonal matrix has for . Stored as a single vector of length . Operations are instead of .
D = sparse.diags([1, 2, 3, 4]) # 4x4 diagonal
D_inv = sparse.diags([1/x for x in [1, 2, 3, 4]]) # O(n) inversion
DfBanded Matrix
A banded matrix has non-zeros only within diagonals of the main diagonal. Bandwidth determines storage: instead of .
# Tridiagonal matrix (bandwidth = 1)
main_diag = np.ones(100) * 2
off_diag = np.ones(99) * -1
A = sparse.diags([off_diag, main_diag, off_diag], [-1, 0, 1])
# Storage: 299 values vs 10000 dense
DfBlock Diagonal Matrix
A block diagonal matrix has non-zero sub-matrices only along the diagonal. Each block is independent.
blocks = [sparse.random(100, 100, density=0.1) for _ in range(10)]
A = sparse.block_diag(blocks) # 1000x1000 matrix, 10 independent blocks
DfRandom Sparse Matrix
Non-zeros are placed randomly. The density parameter controls the fraction of non-zeros.
A = sparse.random(1000, 1000, density=0.01, format='csr')
# Expected nnz = 1000 * 1000 * 0.01 = 10000
Python Implementation
Construction Patterns
from scipy import sparse
import numpy as np
# Pattern 1: From dense array
dense = np.array([[1, 0, 2], [0, 0, 0], [3, 0, 4]])
csr = sparse.csr_matrix(dense)
# Pattern 2: COO from data
row, col, val = [0, 2, 0], [0, 2, 2], [1.0, 4.0, 2.0]
coo = sparse.coo_matrix((val, (row, col)), shape=(3, 3))
# Pattern 3: Diagonal
D = sparse.diags([1, 2, 3])
# Pattern 4: Random
R = sparse.random(100, 100, density=0.1, format='csr',
random_state=42)
# Pattern 5: Identity
I = sparse.eye(1000, format='csr')
# Pattern 6: Kronecker product
A = sparse.random(10, 10, density=0.3)
B = sparse.random(10, 10, density=0.3)
C = sparse.kron(A, B) # 100x100 sparse matrix
Sparse Linear Algebra
from scipy.sparse import linalg
A = sparse.csr_matrix([[4, 1], [1, 3]])
b = np.array([1, 2])
# Direct solve (small sparse systems)
x = linalg.spsolve(A, b)
# Iterative solve (large sparse systems)
x_iter, info = linalg.bicgstab(A, b, maxiter=1000, tol=1e-10)
# Eigenvalues
vals, vecs = linalg.eigs(A, k=2, which='LM')
# Singular values
U, s, Vt = linalg.svds(A, k=2)
# Norm
norm_A = linalg.norm(A, ord='fro')
Practical Example: Sparse Gradient Descent
from scipy import sparse
import numpy as np
# Feature matrix: 1M samples, 100K features, 0.1% density
X = sparse.random(1000000, 100000, density=0.001, format='csr')
y = np.random.randn(1000000)
w = np.random.randn(100000)
# Gradient: X.T @ (X @ w - y)
residual = X @ w - y # O(nnz) sparse matvec
gradient = X.T @ residual # O(nnz) sparse transpose matvec
w -= 0.01 * gradient # O(n) dense update
# Memory: X is ~1.2GB sparse, would be 80GB dense
Applications in AI/ML
Recommendation Systems
βΉοΈ Collaborative Filtering
User-item interaction matrices are extremely sparse. Netflix has ~200M users Γ ~20K items, but each user rates <100 items (99.95% sparse). Matrix factorization decomposes where , with . Sparse formats enable training on matrices with billions of entries.
from scipy.sparse.linalg import svds
# Sparse user-item matrix (1M users, 10K items)
R = sparse.random(1000000, 10000, density=0.001, format='csr')
# Truncated SVD for latent factors
U, sigma, Vt = svds(R, k=50) # k=50 latent dimensions
# U: user embeddings, Vt: item embeddings
NLP: Term-Document Matrices
βΉοΈ Bag of Words
Document-term matrices are sparse because most documents contain a small vocabulary subset. A 100K-document corpus with 50K vocabulary has a 5B-entry matrix, but each document uses ~200 terms (99.996% sparse). TF-IDF weighting preserves sparsity while improving feature quality.
from sklearn.feature_extraction.text import TfidfVectorizer
documents = ["the cat sat on the mat", "the dog ran in the park"]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents) # Sparse matrix
print(f"Shape: {X.shape}, nnz: {X.nnz}")
Graph Neural Networks
βΉοΈ Adjacency Matrices
Graph adjacency matrices are sparse by definition. A social network with 1B users has a 10^18-entry adjacency matrix, but each user has ~100 friends (99.99999% sparse). GNNs perform sparse message passing: , where is the sparse adjacency matrix.
import torch
from torch_geometric.utils import dense_to_sparse
# Dense adjacency (small graph)
A_dense = torch.tensor([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=torch.float)
edge_index, edge_weight = dense_to_sparse(A_dense)
# edge_index: [[0,1,1,2], [1,0,2,1]] β sparse COO-like format
Transformer Sparse Attention
βΉοΈ Sparse Attention Patterns
Standard Transformer attention is for sequence length . Sparse attention restricts each token to attend to or others, reducing complexity to or . Patterns include local (window), global (anchor tokens), and random connections.
# Local attention pattern (window size = 256)
def local_attention_mask(seq_len, window=256):
mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
for i in range(seq_len):
start = max(0, i - window // 2)
end = min(seq_len, i + window // 2)
mask[i, start:end] = True
return mask # ~99% sparse for long sequences
# Sparse attention in PyTorch
from torch.nn.functional import scaled_dot_product_attention
# Use is_causal or custom attn_mask for sparsity
Common Mistakes
| Mistake | Why It's Wrong | Correct Approach |
|---|---|---|
Using .toarray() for large matrices | Defeats purpose of sparsity β allocates O(mΓn) memory | Keep as sparse, use sparse operations |
| Converting CSR β CSC repeatedly | Each conversion is O(nnz) and creates new arrays | Pick the right format once, stick with it |
| Adding dense to sparse | Result is dense, losing all benefits | Use sparse + sparse, or convert sparse to dense only when necessary |
Using A[i,j] in loops on CSR | Each access is O(log(nnz_row)) β slow in loops | Use DOK for random access, vectorize operations |
| Ignoring fill-in in sparse solvers | Matrix multiplication can create new non-zeros | Monitor nnz growth, use iterative methods for large systems |
| Assuming GPU acceleration helps | Sparse operations on GPUs have high overhead for small nnz | Profile: GPU sparse beats CPU sparse only for very large matrices |
| Using float64 for integer indices | Wastes memory, slower operations | Use int32 for indices (saves 50% over int64) |
| Building large sparse matrices with append | List appends cause O(n) copies | Use COO lists or LIL, convert once at the end |
Interview Questions
Q1: Why are sparse matrices important in machine learning?
Answer: Most real-world datasets produce sparse matrices: NLP term-document matrices (99.9% sparse), recommendation systems (99.95% sparse), graph adjacency matrices (99.999% sparse). Storing and computing with dense representations wastes memory and compute. Sparse formats store only non-zeros, reducing memory from O(mn) to O(nnz) and computation from O(mn) to O(nnz).
Q2: When would you choose CSR over CSC?
Answer: CSR for row-oriented operations: row slicing, matrix-vector multiplication (), row-wise normalization. CSC for column-oriented operations: column slicing, solving , feature normalization. CSR and CSC are dual β CSR is optimal when operations access rows, CSC when accessing columns.
Q3: What is the time complexity of sparse matrix-vector multiplication?
Answer: O(nnz), where nnz is the number of non-zero entries. For CSR, each row computes by iterating over non-zeros in that row. Total work is proportional to nnz, which is optimal since every non-zero must be accessed at least once.
Q4: How does sparse matrix multiplication differ from dense?
Answer: Dense multiplication is . Sparse multiplication is more complex: the output nnz is unpredictable. For CSR Γ CSR, the algorithm iterates over non-zeros of A and B, finding shared indices via merge or hash. Complexity is worst case. The output can be much denser than either input ("fill-in").
Q5: Describe the conversion path from construction to computation.
Answer: Build with COO (append triples in any order) or LIL (row-wise insertion), then convert to CSR or CSC for computation. Conversion is O(nnz). Never build directly in CSR/CSC β they don't support efficient insertion. The pipeline: COO/LIL β CSR/CSC β compute β COO/LIL for modifications.
Q6: What is fill-in and why does it matter?
Answer: Fill-in occurs when operations on sparse matrices produce denser results. Examples: doubles non-zeros; can create non-zeros; sparse LU decomposition creates non-zeros in zero positions. Fill-in reduces the effectiveness of sparse representations and increases computation time.
Q7: How would you handle a sparse matrix on a GPU?
Answer: Use cuSPARSE (NVIDIA) or equivalent. Key considerations: (1) Only beneficial for very large matrices (nnz > 1M), (2) CSR/CSC formats are supported, (3) Sparse operations on GPUs have higher per-operation overhead than CPU, so batch operations help, (4) Mixed sparse-dense operations (sparse Γ dense) benefit most from GPU acceleration.
Practice Problems
Problem 1: Memory Savings
πProblem
A 50000Γ50000 adjacency matrix represents a social network where each user has exactly 200 friends on average. Compare storage requirements for dense (float64), COO, and CSR formats.
π‘Solution
- Dense: GB
- nnz: (10 million non-zeros)
- COO: MB (value + row idx + col idx, int32 indices)
- CSR: MB (values + col_idx + row_ptr)
- Savings: CSR is 167x smaller than dense
Problem 2: Sparsity After Operations
πProblem
Matrix has sparsity 95% (5% non-zeros). Matrix also has sparsity 95%. What is the maximum possible sparsity of ? What is the minimum?
π‘Solution
- Maximum sparsity: 90% β if A and B have completely disjoint non-zero positions, the sum has non-zeros (90% sparse)
- Minimum sparsity: 95% β if A and B have identical non-zero positions, the sum has at most 5% non-zeros (some may cancel to zero, increasing sparsity)
- General rule:
- In practice, sparsity typically decreases by 10-30% after addition
Problem 3: Format Selection
πProblem
You need to: (1) build a sparse matrix by reading (row, col, value) triples from a file, (2) perform 1000 matrix-vector products, (3) occasionally access individual elements. Which format(s) should you use and in what order?
π‘Solution
- Build: Use COO β read triples directly, no sorting needed. O(1) per insertion.
- Convert:
.tocsr()β O(nnz) conversion - MatVec: Use CSR β O(nnz) per multiply, optimal for row-oriented operations
- Element access: Keep CSR but accept O(log(nnz_row)) per access, or maintain a small DOK copy for frequent lookups
- Overall: COO β CSR, with optional DOK for random access
Problem 4: Sparse Eigenvalues
πProblem
You have a sparse matrix (, nnz = 50000) and need the 5 largest eigenvalues. Why is scipy.sparse.linalg.eigs(A, k=5) preferred over numpy.linalg.eig(A.toarray())?
π‘Solution
- Dense eigendecomposition: operations, MB storage
- Sparse eigs (Lanczos/Arnoldi): operations per iteration, KB storage
- Speedup: ~1000000x faster, ~2000x less memory
- Key insight: Sparse eigensolvers only compute the eigenvalues you need, using matrix-vector products (O(nnz)) instead of full decomposition
Quick Reference
| Aspect | Detail |
|---|---|
| Dense storage | bytes (float64) |
| CSR storage | bytes |
| CSC storage | bytes |
| COO storage | bytes |
| MatVec complexity | |
| Sparse Γ Sparse | |
| Sparsity definition | |
| Sparse threshold | β₯ 90% zeros |
| Break-even nnz | for CSR vs dense |
| Best construction format | COO (random) or LIL (row-wise) |
| Best computation format | CSR (rows) or CSC (columns) |
| Best random access format | DOK |
| Python library | scipy.sparse |
| GPU support | cuSPARSE, PyTorch sparse |
Cross-References
- Matrix Decompositions β SVD, eigenvalue decomposition for sparse matrices
- Linear Algebra Matrices β Dense operations as baseline for comparison
- Linear Algebra Basics β Vectors, matrices, and basic operations
- Linear Algebra Determinants β Rank, determinant, conditioning
- Eigenvalues and Eigenvectors β Sparse eigenvalue computation
- Calculus Optimization β Gradient methods using sparse Jacobians
- Probability Distributions β Sparse covariance matrices
- Discrete Graphs β Adjacency matrices and sparse graph operations