← Math|100 of 100
Advanced Topics

Topological Data Analysis

Understand persistent homology, barcodes, and their applications in shape analysis and ML.

📂 TDA📖 Lesson 100 of 100🎓 Free Course

Advertisement

Topological Data Analysis

ℹ️ Why It Matters

Topological data analysis (TDA) captures the shape of data—connected components, loops, voids, and higher-dimensional cavities—that traditional statistical methods miss. Persistent homology tracks these topological features across multiple scales, providing robust descriptors that are invariant to continuous deformations. TDA has proven valuable in shape analysis, time series classification, protein structure analysis, and as feature engineering for machine learning. The mathematical foundation draws from algebraic topology, simplicial homology, and metric geometry.


Core Definitions

DfTopological Space

A topological space is a pair (X,τ)(X, \tau) where XX is a set and τ2X\tau \subseteq 2^X is a collection of subsets (open sets) satisfying: (1) ,Xτ\emptyset, X \in \tau; (2) arbitrary unions of sets in τ\tau are in τ\tau; (3) finite intersections of sets in τ\tau are in τ\tau. A topology captures the notion of "closeness" without requiring a metric. Examples include the standard topology on R\mathbb{R} (generated by open intervals) and the discrete topology (every subset is open).

DfOpen and Closed Sets

A set UXU \subseteq X is open if UτU \in \tau. A set CXC \subseteq X is closed if XCX \setminus C is open (equivalently, CC contains all its limit points). In a metric space, open balls B(x,r)={y:d(x,y)<r}B(x, r) = \{y : d(x,y) < r\} generate the topology. A set can be both open and closed (clopen), or neither. In connected spaces, the only clopen sets are \emptyset and XX.

DfHomeomorphism

A function f:XYf: X \to Y between topological spaces is a homeomorphism if it is bijective, continuous, and has a continuous inverse. Homeomorphic spaces are topologically equivalent—they have the same "shape." For example, a coffee mug and a donut are homeomorphic (both have one hole). Homeomorphism preserves connectedness, compactness, and all topological invariants.

DfConnectedness

A topological space XX is connected if it cannot be written as the union of two disjoint non-empty open sets. Equivalently, the only clopen (simultaneously closed and open) sets are \emptyset and XX. A space is path-connected if any two points can be joined by a continuous path. Path-connected implies connected, but not conversely (the topologist's sine curve is connected but not path-connected). The number of connected components β0\beta_0 is the zeroth Betti number.

DfCompactness

A topological space XX is compact if every open cover {Uα}\{U_\alpha\} (where XUαX \subseteq \bigcup U_\alpha) has a finite subcover. In metric spaces, compactness is equivalent to sequential compactness (every sequence has a convergent subsequence) and to being complete and totally bounded. Compact sets are "finite" in a topological sense. The Heine-Borel theorem states that in Rn\mathbb{R}^n, a set is compact iff it is closed and bounded.

DfHomotopy

Two continuous maps f,g:XYf, g: X \to Y are homotopic (written fgf \simeq g) if there exists a continuous map H:X×[0,1]YH: X \times [0,1] \to Y such that H(x,0)=f(x)H(x, 0) = f(x) and H(x,1)=g(x)H(x, 1) = g(x). A space is contractible if it is homotopy equivalent to a point. Homotopy equivalence is weaker than homeomorphism—it preserves "shape" but not rigid structure. For example, a disk is contractible (homotopy equivalent to a point), while a circle is not.

DfSimplicial Complex

An abstract simplicial complex KK on a vertex set VV is a collection of finite subsets of VV (simplices) closed under taking subsets. A simplex {v0,,vk}\{v_0, \ldots, v_k\} has dimension kk. The geometric realization embeds KK in Euclidean space as a union of simplices. Examples: graph (1-complex), triangle mesh (2-complex), tetrahedralization (3-complex). The ff-vector (f0,f1,,fd)(f_0, f_1, \ldots, f_d) counts the number of simplices in each dimension.

DfSimplicial Homology

The kk-th chain group Ck(K)C_k(K) is the free abelian group generated by kk-simplices. The boundary operator k:CkCk1\partial_k: C_k \to C_{k-1} maps each kk-simplex to its oriented boundary. The kk-th homology group is Hk(K)=kerk/im k+1H_k(K) = \ker \partial_k / \text{im } \partial_{k+1}, and the kk-th Betti number is βk=rank(Hk)\beta_k = \text{rank}(H_k). Homology measures "holes" in each dimension.

DfPersistent Homology

Given a filtration of simplicial complexes =K0K1Km=K\emptyset = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_m = K, persistent homology tracks the birth and death of homological features (connected components, loops, voids) as the filtration parameter increases. A feature born at scale bb and dying at scale dd has persistence dbd - b; features with high persistence are considered "real" structure, while short-lived features are attributed to noise.

DfVietoris-Rips Complex

For a point cloud XX and parameter ϵ>0\epsilon > 0, the Vietoris-Rips complex VR(X,ϵ)\text{VR}(X, \epsilon) includes a simplex {xi0,,xik}\{x_{i_0}, \ldots, x_{i_k}\} if d(xij,xil)ϵd(x_{i_j}, x_{i_l}) \leq \epsilon for all j,lj, l. As ϵ\epsilon increases, more simplices are added, creating a filtration. The Rips complex is commonly used in TDA because it depends only on pairwise distances.

DfČech Complex

For a point cloud XX and parameter ϵ>0\epsilon > 0, the Čech complex includes a simplex {xi0,,xik}\{x_{i_0}, \ldots, x_{i_k}\} if the balls of radius ϵ\epsilon centered at xi0,,xikx_{i_0}, \ldots, x_{i_k} have non-empty intersection. The Čech complex captures the "union of balls" topology and satisfies the Nerve Theorem: if the balls are convex, the Čech complex is homotopy equivalent to the union of balls.


Key Formulas

Betti Numbers

βk=rank(Hk)=dim(kerk)dim(im k+1)\beta_k = \text{rank}(H_k) = \text{dim}(\ker \partial_k) - \text{dim}(\text{im } \partial_{k+1})

Here,

  • βk\beta_k=k-th Betti number (number of k-dimensional holes)
  • HkH_k=k-th homology group
  • k\partial_k=Boundary operator from k-chains to (k-1)-chains

Euler Characteristic

χ=k=0n(1)kβk=k=0n(1)kαk\chi = \sum_{k=0}^{n} (-1)^k \beta_k = \sum_{k=0}^{n} (-1)^k \alpha_k

Here,

  • χ\chi=Euler characteristic (topological invariant)
  • βk\beta_k=k-th Betti number
  • αk\alpha_k=Number of k-simplices in a triangulation

Persistence Diagram

Dgmk={(bi,di):feature born at bi dies at di}Dgm_k = \{(b_i, d_i) : \text{feature born at } b_i \text{ dies at } d_i\}

Here,

  • DgmkDgm_k=Persistence diagram for k-dimensional features
  • bib_i=Birth time of the i-th feature
  • did_i=Death time of the i-th feature

Bottleneck Distance

dB(D1,D2)=infγsup(b,d)D1(b,d)γ(b,d)d_B(D_1, D_2) = \inf_{\gamma} \sup_{(b,d) \in D_1} \|(b,d) - \gamma(b,d)\|_\infty

Here,

  • D1,D2D_1, D_2=Two persistence diagrams
  • γ\gamma=Bijection (matching) between the diagrams
  • dBd_B=Bottleneck distance measuring diagram similarity

Wasserstein Distance

Wp(D1,D2)=(infγ(b,d)D1(b,d)γ(b,d)p)1/pW_p(D_1, D_2) = \left(\inf_{\gamma} \sum_{(b,d) \in D_1} \|(b,d) - \gamma(b,d)\|_\infty^p\right)^{1/p}

Here,

  • WpW_p=p-Wasserstein distance between persistence diagrams
  • γ\gamma=Optimal matching (bijection with possible diagonal matches)

Vietoris-Rips Distance Condition

{xi0,,xik}VR(X,ϵ)    d(xij,xil)ϵ  j,l\{x_{i_0}, \ldots, x_{i_k}\} \in \text{VR}(X, \epsilon) \iff d(x_{i_j}, x_{i_l}) \leq \epsilon \; \forall \, j, l

Here,

  • ϵ\epsilon=Distance threshold parameter
  • d(xij,xil)d(x_{i_j}, x_{i_l})=Pairwise distance between points

Čech Complex Intersection Condition

{xi0,,xik}Cˇ(X,ϵ)    j=0kB(xij,ϵ)\{x_{i_0}, \ldots, x_{i_k}\} \in \check{C}(X, \epsilon) \iff \bigcap_{j=0}^{k} B(x_{i_j}, \epsilon) \neq \emptyset

Here,

  • B(x,epsilon)B(x, epsilon)=Open ball of radius ε centered at x
  • ϵ\epsilon=Radius parameter

Important Theorems

ThNerve Theorem (Borsuk)

Let {Uα}\{U_\alpha\} be a finite collection of convex, contractible subsets of Rn\mathbb{R}^n such that all non-empty intersections Uα0UαkU_{\alpha_0} \cap \cdots \cap U_{\alpha_k} are also contractible. Then the nerve of the cover is homotopy equivalent to Uα\bigcup U_\alpha. This theorem justifies using simplicial complexes (built from intersections of balls) to capture the topology of the underlying space.

ThStability of Persistence Diagrams

If f,g:XRf, g: X \to \mathbb{R} are continuous functions on a compact space, then dB(pers(f),pers(g))fgd_B(\text{pers}(f), \text{pers}(g)) \leq \|f - g\|_\infty, where dBd_B is the bottleneck distance and pers(f)\text{pers}(f) is the persistence diagram of ff. This means small perturbations of the data cause small changes in the persistence diagram—TDA is stable.

ThIsometry Invariance of Persistent Homology

Persistent homology is invariant under isometries of the metric space. If (X,dX)(X, d_X) and (Y,dY)(Y, d_Y) are isometric metric spaces, their persistence diagrams are identical. This means TDA captures intrinsic geometric structure independent of embedding.

ThUniversal Coefficient Theorem

For a topological space XX with simplicial homology Hk(X;Z)H_k(X; \mathbb{Z}), the homology with coefficients in a field FF satisfies:

Hk(X;F)Hk(X;Z)FTor(Hk1(X;Z),F)H_k(X; F) \cong H_k(X; \mathbb{Z}) \otimes F \oplus \text{Tor}(H_{k-1}(X; \mathbb{Z}), F)

When FF has characteristic 0 (e.g., Q\mathbb{Q}, R\mathbb{R}), the Tor term vanishes and Hk(X;F)Hk(X;Z)FH_k(X; F) \cong H_k(X; \mathbb{Z}) \otimes F. This justifies working with field coefficients in computational TDA.

ThMayer-Vietoris Sequence

For a topological space XX expressed as the union of two open sets UU and VV, there is a long exact sequence relating the homology of UU, VV, UVU \cap V, and XX:

Hn(UV)Hn(U)Hn(V)Hn(X)Hn1(UV)\cdots \to H_n(U \cap V) \to H_n(U) \oplus H_n(V) \to H_n(X) \to H_{n-1}(U \cap V) \to \cdots

This sequence is a powerful computational tool: it reduces the computation of homology of XX to computing homology of simpler subspaces.


Worked Examples

📝Betti Numbers of Common Shapes

Problem: Compute the Betti numbers for (a) a circle, (b) a sphere, (c) a torus, (d) a figure-eight.

Solution:

(a) Circle S1S^1:

  • β0=1\beta_0 = 1 (one connected component)
  • β1=1\beta_1 = 1 (one loop)
  • βk=0\beta_k = 0 for k2k \geq 2

Euler characteristic: χ=11=0\chi = 1 - 1 = 0

(b) Sphere S2S^2:

  • β0=1\beta_0 = 1 (connected)
  • β1=0\beta_1 = 0 (no loops—any loop can be contracted to a point)
  • β2=1\beta_2 = 1 (one void/enclosed cavity)
  • βk=0\beta_k = 0 for k3k \geq 3

Euler characteristic: χ=10+1=2\chi = 1 - 0 + 1 = 2

(c) Torus T2=S1×S1T^2 = S^1 \times S^1:

  • β0=1\beta_0 = 1 (connected)
  • β1=2\beta_1 = 2 (two independent loops: meridian and longitude)
  • β2=1\beta_2 = 1 (one void)
  • βk=0\beta_k = 0 for k3k \geq 3

Euler characteristic: χ=12+1=0\chi = 1 - 2 + 1 = 0

(d) Figure-eight (wedge of two circles):

  • β0=1\beta_0 = 1 (connected)
  • β1=2\beta_1 = 2 (two loops)
  • βk=0\beta_k = 0 for k2k \geq 2

Euler characteristic: χ=12=1\chi = 1 - 2 = -1

Result: The Betti numbers distinguish these shapes: the torus has β1=2\beta_1 = 2 while the sphere has β1=0\beta_1 = 0. The figure-eight and torus share β1=2\beta_1 = 2 but differ in β2\beta_2.

📝Vietoris-Rips Complex of a Point Cloud

Problem: Given points A=(0,0)A = (0,0), B=(1,0)B = (1,0), C=(0.5,0.866)C = (0.5, 0.866) (vertices of an equilateral triangle with side length 1), construct the Vietoris-Rips complex for ϵ=0.5\epsilon = 0.5, ϵ=1.0\epsilon = 1.0, and ϵ=1.5\epsilon = 1.5.

Solution:

Step 1: Compute pairwise distances. d(A,B)=1d(A,B) = 1, d(A,C)=1d(A,C) = 1, d(B,C)=1d(B,C) = 1 (equilateral triangle).

Step 2: ϵ=0.5\epsilon = 0.5: No pair of points has distance 0.5\leq 0.5.

  • 0-simplices: {A},{B},{C}\{A\}, \{B\}, \{C\} (3 vertices)
  • 1-simplices: none
  • β0=3\beta_0 = 3 (three components), β1=0\beta_1 = 0

Step 3: ϵ=1.0\epsilon = 1.0: All pairwise distances 1\leq 1.

  • 0-simplices: {A},{B},{C}\{A\}, \{B\}, \{C\}
  • 1-simplices: {A,B},{A,C},{B,C}\{A,B\}, \{A,C\}, \{B,C\} (all edges)
  • 2-simplices: {A,B,C}\{A,B,C\} (since all edges present, triangle filled)
  • β0=1\beta_0 = 1 (connected), β1=0\beta_1 = 0 (loop filled by triangle)

Step 4: ϵ=1.5\epsilon = 1.5: Same as ϵ=1.0\epsilon = 1.0 (no new features).

  • Same complex as ϵ=1.0\epsilon = 1.0

Result: The persistence diagram has one connected component born at ϵ=0\epsilon = 0 (never dies), and no other features. The triangle forms at ϵ=1.0\epsilon = 1.0.

📝Persistent Homology of a Noisy Circle

Problem: 20 points sampled from a unit circle with small Gaussian noise. Describe the expected persistence diagram.

Solution:

Step 1: As ϵ\epsilon increases from 0:

  • At ϵ0\epsilon \approx 0: Each point is a separate component (β0=20\beta_0 = 20).
  • As ϵ\epsilon grows: Nearby points connect, reducing β0\beta_0.

Step 2: At ϵ0.3\epsilon \approx 0.3 (half the typical inter-point spacing):

  • Most points connect into a single ring.
  • β0=1\beta_0 = 1 (one component).
  • β1=1\beta_1 = 1 (one loop appears—the circle).

Step 3: The loop persists until ϵ1.0\epsilon \approx 1.0 (diameter of circle), when the interior fills in and the loop dies.

Step 4: Expected persistence diagram:

  • β0\beta_0: 19 "noise" components born at ϵ=0\epsilon = 0, dying at various small ϵ\epsilon values (short bars). 1 "signal" component born at ϵ=0\epsilon = 0, never dying (infinite bar).
  • β1\beta_1: 1 "signal" loop born at ϵ0.3\epsilon \approx 0.3, dying at ϵ1.0\epsilon \approx 1.0 (long bar). Possible small "noise" loops born and dying at similar scales (short bars).

Result: The persistent homology clearly identifies the circle: one long-lived β1\beta_1 feature with persistence 0.7\approx 0.7, while noise features have short persistence. The bottleneck distance to the exact circle's diagram is small.

📝Euler Characteristic of a Graph

Problem: Compute the Euler characteristic of a complete graph K5K_5 (5 vertices, all edges) and a tree with 5 vertices.

Solution:

Complete graph K5K_5:

  • α0=5\alpha_0 = 5 (vertices), α1=(52)=10\alpha_1 = \binom{5}{2} = 10 (edges), αk=0\alpha_k = 0 for k2k \geq 2
  • χ=α0α1=510=5\chi = \alpha_0 - \alpha_1 = 5 - 10 = -5

Alternatively: β0=1\beta_0 = 1, β1=α1α0+1=105+1=6\beta_1 = \alpha_1 - \alpha_0 + 1 = 10 - 5 + 1 = 6 (independent cycles). χ=β0β1=16=5\chi = \beta_0 - \beta_1 = 1 - 6 = -5

Tree with 5 vertices:

  • α0=5\alpha_0 = 5 (vertices), α1=4\alpha_1 = 4 (edges, since a tree on nn vertices has n1n-1 edges)
  • χ=54=1\chi = 5 - 4 = 1

Alternatively: β0=1\beta_0 = 1 (connected), β1=0\beta_1 = 0 (no cycles in a tree). χ=10=1\chi = 1 - 0 = 1

Result: K5K_5 has χ=5\chi = -5 (many cycles), while the tree has χ=1\chi = 1 (no cycles). The Euler characteristic distinguishes these graphs: trees always have χ=1\chi = 1, while graphs with cycles have χ0\chi \leq 0.


Practice Problems

📝Problem 1: Connected Components of a Subset of R²

Problem: Determine the connected components and compactness of A={(x,y):x2+y21}{(x,y):(x3)2+y21}A = \{(x,y) : x^2 + y^2 \leq 1\} \cup \{(x,y) : (x-3)^2 + y^2 \leq 1\}.

Solution:

Step 1: AA is the union of two closed disks: D1D_1 centered at (0,0)(0,0) with radius 1, and D2D_2 centered at (3,0)(3,0) with radius 1.

Step 2: The distance between centers is 3, and the sum of radii is 2. Since 3>23 > 2, the disks do not overlap (their intersection is empty).

Step 3: Connected components: Each disk is connected (convex sets are connected), and since they are disjoint, AA has exactly 2 connected components.

Step 4: Compactness: Each disk is closed and bounded in R2\mathbb{R}^2, hence compact by Heine-Borel. The finite union of compact sets is compact, so AA is compact.

Result: β0(A)=2\beta_0(A) = 2 (two connected components), AA is compact.

📝Problem 2: Homotopy Equivalence

Problem: Show that a circle S1S^1 is homotopy equivalent to R2{(0,0)}\mathbb{R}^2 \setminus \{(0,0)\} (the punctured plane).

Solution:

Step 1: Define f:S1R2{(0,0)}f: S^1 \to \mathbb{R}^2 \setminus \{(0,0)\} as the inclusion f(z)=zf(z) = z (view S1R2S^1 \subset \mathbb{R}^2).

Step 2: Define g:R2{(0,0)}S1g: \mathbb{R}^2 \setminus \{(0,0)\} \to S^1 as the radial projection g(x)=x/xg(x) = x/\|x\|.

Step 3: Check gf=idS1g \circ f = \text{id}_{S^1}: For zS1z \in S^1, g(f(z))=g(z)=z/z=zg(f(z)) = g(z) = z/\|z\| = z (since z=1\|z\| = 1). ✓

Step 4: Check fgidf \circ g \simeq \text{id}: Define H(x,t)=(1t)xx+txH(x, t) = (1-t) \cdot \frac{x}{\|x\|} + t \cdot x.

For t[0,1]t \in [0,1], H(x,t)=(1t)xx+txH(x, t) = (1-t)\frac{x}{\|x\|} + tx. This is continuous and never zero (since (1t)/x+t>0(1-t)/\|x\| + t > 0 for x0x \neq 0).

H(x,0)=x/x=g(x)H(x, 0) = x/\|x\| = g(x), H(x,1)=xH(x, 1) = x.

Result: ff and gg are homotopy inverses, so S1R2{(0,0)}S^1 \simeq \mathbb{R}^2 \setminus \{(0,0)\}. Both have β0=1\beta_0 = 1, β1=1\beta_1 = 1.

📝Problem 3: Compactness in Function Spaces

Problem: Is the closed unit ball B={fC[0,1]:f1}B = \{f \in C[0,1] : \|f\|_\infty \leq 1\} compact in the supremum norm?

Solution:

Step 1: Consider the sequence fn(x)=xnf_n(x) = x^n in BB. Each fnf_n is continuous with fn=1\|f_n\|_\infty = 1.

Step 2: Pointwise limit: f(x)=limnxn={0x[0,1)1x=1f(x) = \lim_{n \to \infty} x^n = \begin{cases} 0 & x \in [0,1) \\ 1 & x = 1 \end{cases}

This limit is NOT continuous, so fC[0,1]f \notin C[0,1].

Step 3: Any subsequence of {fn}\{f_n\} converges pointwise to the same discontinuous function, so no subsequence converges in the supremum norm.

Step 4: Therefore {fn}\{f_n\} has no convergent subsequence in BB.

Result: The closed unit ball in C[0,1]C[0,1] is NOT compact. This is a consequence of the Riesz theorem: in an infinite-dimensional Banach space, the closed unit ball is never compact. Contrast with finite-dimensional Rn\mathbb{R}^n where the closed unit ball IS compact (Heine-Borel).

📝Problem 4: Persistent Homology of Two Concentric Circles

Problem: Describe the persistence diagram for 30 points sampled from two concentric circles of radii 1 and 2.

Solution:

Step 1: At small ϵ\epsilon: 30 connected components (β0=30\beta_0 = 30).

Step 2: As ϵ\epsilon increases, points on each circle connect. The inner circle (radius 1) forms a loop first (at ϵ0.2\epsilon \approx 0.2, the typical spacing). The outer circle (radius 2) forms a loop at a slightly larger ϵ\epsilon.

Step 3: At ϵ0.5\epsilon \approx 0.5: Both circles are fully connected.

  • β0=2\beta_0 = 2 (two connected components: inner and outer rings)
  • β1=2\beta_1 = 2 (two loops)

Step 4: As ϵ\epsilon grows toward 1.0: The inner circle's loop fills in (dies). The outer circle's loop persists longer.

Step 5: At ϵ2.0\epsilon \approx 2.0: The outer loop fills in.

Persistence diagram:

  • β0\beta_0: 1 infinite bar (everything connected eventually), 28 short bars (noise components)
  • β1\beta_1: 2 long bars. Inner loop: born 0.2\approx 0.2, dies 1.0\approx 1.0 (persistence 0.8\approx 0.8). Outer loop: born 0.4\approx 0.4, dies 2.0\approx 2.0 (persistence 1.6\approx 1.6).

Result: The two loops are clearly distinguished by their persistence. The outer loop has higher persistence because it is larger. TDA correctly identifies the "shape" of two concentric circles.

📝Problem 5: Čech Complex vs. Vietoris-Rips Complex

Problem: For a set of points forming a regular hexagon, compare the Čech and Vietoris-Rips complexes at different scales.

Solution:

Step 1: Consider a regular hexagon with side length 1. Vertices: v0,,v5v_0, \ldots, v_5.

Step 2: For ϵ<1/2\epsilon < 1/2: No edges in either complex.

Step 3: For ϵ=1/2\epsilon = 1/2:

  • Čech complex: Two adjacent balls intersect iff the distance between centers 1\leq 1 (diameter of intersection region). Adjacent vertices are distance 1 apart, so edges appear.
  • Vietoris-Rips: Same edges (distance 1 2ϵ=1\leq 2\epsilon = 1).
  • Both give the cycle graph C6C_6.

Step 4: For ϵ=1/30.577\epsilon = 1/\sqrt{3} \approx 0.577:

  • Čech complex: Three consecutive vertices v0,v1,v2v_0, v_1, v_2 have balls intersecting at the center of the triangle they form. A 2-simplex {v0,v1,v2}\{v_0, v_1, v_2\} appears.
  • Vietoris-Rips: All pairwise distances 2ϵ1.15\leq 2\epsilon \approx 1.15. Since the hexagon's diameter is 2, only adjacent and next-adjacent pairs (distance 31.73\leq \sqrt{3} \approx 1.73) are connected.

Step 5: Key difference: The Čech complex requires a common intersection point for all balls in a simplex, while Vietoris-Rips only requires pairwise distances. The Čech complex is "smaller" (fewer simplices) but captures the true topology by the Nerve Theorem.

Result: For the same data, the Čech complex is a subcomplex of the Vietoris-Rips complex. Čech is more topologically accurate but computationally harder (requires checking ball intersections). Vietoris-Rips depends only on pairwise distances, making it computationally efficient.

📝Problem 6: Persistent Homology of a Torus

Problem: Describe the expected persistence diagram for points sampled from a torus embedded in R3\mathbb{R}^3.

Solution:

Step 1: A torus has β0=1\beta_0 = 1, β1=2\beta_1 = 2, β2=1\beta_2 = 1.

Step 2: At small scales, the point cloud looks like scattered points (β0=n\beta_0 = n).

Step 3: As ϵ\epsilon increases, points connect into a "skeleton" of the torus:

  • β0\beta_0 drops to 1 (one connected component)
  • Two loops appear (β1=2\beta_1 = 2): the meridian and longitude circles

Step 4: The two loops persist over a wide range of ϵ\epsilon values. Eventually:

  • The meridian loop fills in (dies first, at ϵr\epsilon \approx r where rr is the tube radius)
  • The longitude loop fills in later (at ϵR\epsilon \approx R where RR is the major radius)
  • The void (β2=1\beta_2 = 1) appears when the surface is fully reconstructed and dies when the interior fills

Step 5: Expected persistence diagram:

  • β0\beta_0: Many short bars (noise), one infinite bar
  • β1\beta_1: Two long bars (the two independent loops of the torus)
  • β2\beta_2: One bar (the enclosed void)

Result: Persistent homology correctly identifies the torus topology: β0=1\beta_0 = 1, β1=2\beta_1 = 2, β2=1\beta_2 = 1. The two β1\beta_1 features have different persistence values reflecting the different radii of the torus.

📝Problem 7: Vietoris-Rips for Clustering

Problem: How does persistent homology help determine the number of clusters in a point cloud?

Solution:

Step 1: Consider 3 clusters of points, each cluster forming a tight ball, with large gaps between clusters.

Step 2: At small ϵ\epsilon: Each point is separate (β0=n\beta_0 = n).

Step 3: As ϵ\epsilon increases: Points within each cluster connect first. At ϵ\epsilon equal to the intra-cluster spacing, β0=3\beta_0 = 3 (three components).

Step 4: The three components persist until ϵ\epsilon reaches the inter-cluster distance, at which point they merge into one component (β0=1\beta_0 = 1).

Step 5: The persistence diagram for β0\beta_0 shows:

  • Many short bars (noise within clusters)
  • Three medium bars (the three clusters), dying at the inter-cluster distance
  • One infinite bar (everything eventually connected)

Result: The number of long-lived β0\beta_0 features (after filtering short noise bars) indicates the number of clusters. This is the basis of topological clustering methods that use persistent homology to determine cluster count automatically.

📝Problem 8: Čech Complex Computation

Problem: For 4 points forming a square with side length 1, determine at what value of ϵ\epsilon the Čech complex first contains a 2-simplex (triangle).

Solution:

Step 1: Points: A=(0,0)A = (0,0), B=(1,0)B = (1,0), C=(1,1)C = (1,1), D=(0,1)D = (0,1).

Step 2: For a 2-simplex {A,B,C}\{A, B, C\}, we need balls of radius ϵ\epsilon around AA, BB, CC to have a common intersection point.

Step 3: The circumcenter of triangle ABCABC is at (0.5,0.5)(0.5, 0.5) with circumradius r=0.52+0.52=0.5=1/20.707r = \sqrt{0.5^2 + 0.5^2} = \sqrt{0.5} = 1/\sqrt{2} \approx 0.707.

Step 4: The three balls intersect when ϵr=1/2\epsilon \geq r = 1/\sqrt{2}.

Step 5: At ϵ=1/2\epsilon = 1/\sqrt{2}, the point (0.5,0.5)(0.5, 0.5) is equidistant from AA, BB, CC at distance 1/21/\sqrt{2}, so it lies in all three balls.

Result: The Čech complex first contains a 2-simplex at ϵ=1/20.707\epsilon = 1/\sqrt{2} \approx 0.707. This is less than the Vietoris-Rips threshold (ϵ=1\epsilon = 1 for the same simplex), illustrating that Čech is "tighter" than Rips.

📝Problem 9: Persistence Landscape Computation

Problem: Given a persistence diagram with features (0.2,0.8)(0.2, 0.8) and (0.3,0.6)(0.3, 0.6) for β1\beta_1, compute the first persistence landscape.

Solution:

Step 1: For each feature (bi,di)(b_i, d_i), define the tent function:

fi(t)={tbiif bitbi+di2ditif bi+di2tdi0otherwisef_i(t) = \begin{cases} t - b_i & \text{if } b_i \leq t \leq \frac{b_i+d_i}{2} \\ d_i - t & \text{if } \frac{b_i+d_i}{2} \leq t \leq d_i \\ 0 & \text{otherwise} \end{cases}

Step 2: For feature 1: (0.2,0.8)(0.2, 0.8), midpoint =0.5= 0.5:

f1(t)={t0.20.2t0.50.8t0.5t0.80otherwisef_1(t) = \begin{cases} t - 0.2 & 0.2 \leq t \leq 0.5 \\ 0.8 - t & 0.5 \leq t \leq 0.8 \\ 0 & \text{otherwise} \end{cases}

Peak value: 0.30.3 at t=0.5t = 0.5.

Step 3: For feature 2: (0.3,0.6)(0.3, 0.6), midpoint =0.45= 0.45:

f2(t)={t0.30.3t0.450.6t0.45t0.60otherwisef_2(t) = \begin{cases} t - 0.3 & 0.3 \leq t \leq 0.45 \\ 0.6 - t & 0.45 \leq t \leq 0.6 \\ 0 & \text{otherwise} \end{cases}

Peak value: 0.150.15 at t=0.45t = 0.45.

Step 4: The first persistence landscape is λ1(t)=maxifi(t)\lambda_1(t) = \max_i f_i(t):

  • For t[0.2,0.3]t \in [0.2, 0.3]: λ1(t)=t0.2\lambda_1(t) = t - 0.2 (only feature 1 active)
  • For t[0.3,0.45]t \in [0.3, 0.45]: λ1(t)=max(t0.2,t0.3)=t0.2\lambda_1(t) = \max(t-0.2, t-0.3) = t-0.2 (feature 1 dominates)
  • For t[0.45,0.5]t \in [0.45, 0.5]: λ1(t)=max(t0.2,0.6t)=t0.2\lambda_1(t) = \max(t-0.2, 0.6-t) = t-0.2 (feature 1 still dominates)
  • For t[0.5,0.6]t \in [0.5, 0.6]: λ1(t)=max(0.8t,0.6t)=0.8t\lambda_1(t) = \max(0.8-t, 0.6-t) = 0.8-t (feature 1 dominates)
  • For t[0.6,0.8]t \in [0.6, 0.8]: λ1(t)=0.8t\lambda_1(t) = 0.8-t (only feature 1 active)

Result: The first persistence landscape λ1(t)\lambda_1(t) is the upper envelope of the tent functions. It can be used as a feature vector for statistical analysis (averaging landscapes across samples, computing confidence bands, etc.).

📝Problem 10: TDA for Time Series Classification

Problem: How would you use persistent homology to classify periodic vs. non-periodic time series?

Solution:

Step 1: Given a time series x(t)x(t), create a delay embedding in Rd\mathbb{R}^d:

y(t)=(x(t),x(t+τ),x(t+2τ),,x(t+(d1)τ))\mathbf{y}(t) = (x(t), x(t+\tau), x(t+2\tau), \ldots, x(t+(d-1)\tau))

where τ\tau is the delay and dd is the embedding dimension.

Step 2: For a periodic signal, the delay embedding traces a closed curve (a loop in Rd\mathbb{R}^d).

Step 3: Compute persistent homology of the point cloud {y(t)}\{\mathbf{y}(t)\}.

Step 4: Classification criteria:

  • Periodic signal: β1=1\beta_1 = 1 with high persistence (long-lived loop)
  • Non-periodic signal: β1=0\beta_1 = 0 or β1=1\beta_1 = 1 with low persistence (short-lived feature)

Step 5: Use the persistence of the dominant β1\beta_1 feature as a classification feature: threshold at some value to separate periodic from non-periodic.

Result: TDA provides a principled, noise-robust method for detecting periodicity. The delay embedding + persistent homology approach works even for noisy, irregularly sampled time series, making it valuable for applications in astronomy (variable star classification), medicine (heartbeat analysis), and finance (cycle detection).


Common Mistakes

MistakeCorrect Approach
Assuming more data always gives better topologyNoise creates short-lived topological features; focus on persistence
Confusing homotopy equivalence with homeomorphismHomotopy equivalence is weaker; a disk and a point are homotopy equivalent but not homeomorphic
Using Vietoris-Rips without understanding its relation to ČechRips is a superset of Čech; Rips may create false features that Čech avoids
Ignoring the choice of distance metricTDA results depend on the metric; different metrics yield different persistence diagrams
Assuming Betti numbers capture all topologyBetti numbers are invariants, but they don't capture the full homotopy type (e.g., lens spaces)
Confusing persistent homology with clusteringClustering gives partition; persistent homology captures higher-dimensional topology (loops, voids)
Forgetting that persistence is invariant under rotation/translationTDA captures intrinsic geometry, not extrinsic position

Connections to Machine Learning

ℹ️ Connections to Machine Learning

TDA provides topology-aware features for ML: (1) Persistence diagrams serve as features for classification—long-lived features indicate meaningful structure. (2) Persistence landscapes convert diagrams into functions suitable for statistical analysis. (3) Topological regularization encourages neural networks to learn functions with simple topology. (4) Shape analysis uses persistent homology to compare 3D meshes and point clouds. (5) Time series analysis detects periodicity and anomalies via persistent homology of delay embeddings. (6) Graph neural networks can incorporate topological features from persistent homology of graph neighborhoods. (7) Manifold learning methods (Isomap, UMAP) assume data lies on a low-dimensional manifold; TDA can validate this assumption by measuring topological complexity.


Exam/Interview Questions

Q1: What are Betti numbers, and what do they measure?

Answer: Betti numbers βk\beta_k count the number of kk-dimensional "holes" in a topological space: β0\beta_0 = connected components, β1\beta_1 = independent loops, β2\beta_2 = enclosed voids. They are topological invariants—preserved under homeomorphism. For a circle: β0=1\beta_0 = 1, β1=1\beta_1 = 1. For a sphere: β0=1\beta_0 = 1, β2=1\beta_2 = 1. For a torus: β0=1\beta_0 = 1, β1=2\beta_1 = 2, β2=1\beta_2 = 1.


Q2: Explain the difference between the Vietoris-Rips and Čech complexes.

Answer: The Vietoris-Rips complex VR(X,ϵ)\text{VR}(X, \epsilon) includes a simplex if all pairwise distances are ϵ\leq \epsilon. The Čech complex includes a simplex if the balls of radius ϵ\epsilon around the vertices have a common intersection point. Čech is topologically more accurate (by the Nerve Theorem) but harder to compute. Rips is always a superset of Čech and depends only on pairwise distances, making it computationally efficient.


Q3: Why is persistent homology "stable"?

Answer: The stability theorem states that dB(pers(f),pers(g))fgd_B(\text{pers}(f), \text{pers}(g)) \leq \|f - g\|_\infty, where dBd_B is the bottleneck distance. This means if we perturb the data (or the function defining the filtration) by a small amount in the supremum norm, the persistence diagram changes by at most that amount. This guarantees that TDA is robust to noise—small perturbations cause small changes in the topological summary.


Q4: What is the Euler characteristic and how does it relate to Betti numbers?

Answer: The Euler characteristic is χ=k=0n(1)kβk\chi = \sum_{k=0}^n (-1)^k \beta_k. For surfaces: sphere (χ=2\chi = 2), torus (χ=0\chi = 0), genus-gg surface (χ=22g\chi = 2 - 2g). It can also be computed from any triangulation: χ=α0α1+α2\chi = \alpha_0 - \alpha_1 + \alpha_2 - \cdots where αk\alpha_k is the number of kk-simplices. It is a topological invariant (independent of triangulation) and relates geometry to topology via the Gauss-Bonnet theorem.


Q5: How can persistent homology be used as features for machine learning?

Answer: Persistence diagrams can be converted to feature vectors via: (1) Persistence landscapes—transform diagrams into functions amenable to averaging and statistics. (2) Persistence images—rasterize diagrams into fixed-size images. (3) Betti curves—plot βk(ϵ)\beta_k(\epsilon) as a function of scale. (4) Persistence statistics—compute mean, variance, and other statistics of birth-death pairs. These features capture topological structure (holes, loops, voids) that traditional features miss, and have been used in shape classification, time series analysis, and molecular biology.


Q6: Why is compactness important in topology and analysis?

Answer: Compactness ensures that: (1) Every sequence has a convergent subsequence (sequential compactness), guaranteeing existence of solutions. (2) Continuous functions on compact sets are bounded and attain their extrema (extreme value theorem). (3) Open covers have finite subcovers, enabling finite-dimensional arguments. In TDA, compactness of the underlying space ensures persistence diagrams are well-defined (features have finite birth and death times). In analysis, compactness justifies convergence of iterative algorithms and existence of optimal solutions.


Quick Reference

ConceptDefinitionKey Property
Topological Space(X,τ)(X, \tau) with open set axiomsGeneralizes metric spaces
Open SetUτU \in \tauClosed under arbitrary union
Closed SetXUX \setminus U openContains all limit points
ConnectedCannot split into disjoint open setsβ0\beta_0 counts components
CompactEvery open cover has finite subcoverSequentially compact in metric spaces
HomeomorphismContinuous bijection with continuous inversePreserves topology
HomotopyContinuous deformation between mapsPreserves "shape"
Simplicial ComplexCollection of simplices closed under subsetsDiscrete model of topology
Betti Numberβk=rank(Hk)\beta_k = \text{rank}(H_k)Counts kk-dimensional holes
Euler Characteristicχ=(1)kβk\chi = \sum (-1)^k \beta_kTopological invariant
Persistent HomologyTrack feature birth/death across scalesStable under perturbation
Vietoris-Rips{xi}\{x_i\} simplex iff d(xi,xj)ϵi,jd(x_i, x_j) \leq \epsilon \, \forall i,jPairwise distance condition
Čech Complex{xi}\{x_i\} simplex iff balls have common intersectionNerve Theorem applies
Bottleneck DistancedB(D1,D2)=infγsuppγ(p)d_B(D_1, D_2) = \inf_\gamma \sup \|p - \gamma(p)\|_\inftyStability metric

Cross-References

  • 096-advanced-tensor-calculus — Tensor products in algebraic topology; chain complexes as tensor products
  • 097-advanced-differential-geometry — Manifolds are topological spaces with smooth structure; de Rham cohomology relates to simplicial homology
  • 098-advanced-functional-analysis — Function spaces are topological vector spaces; weak topologies connect analysis to topology
  • 099-advanced-measure-theory — Borel σ-algebras are generated by open sets; measurability relates to topological structure
  • Graph Theory: Graphs are 1-dimensional simplicial complexes; graph connectivity relates to β0\beta_0
Lesson Progress100 / 100