Topological Data Analysis

ℹ️ Why It Matters

Topological data analysis (TDA) captures the shape of data (connected components, loops, voids, and higher-dimensional cavities) that traditional statistical methods miss. Persistent homology tracks these topological features across multiple scales, providing descriptors that are stable under small perturbations of the data and, for distance-based filtrations, unchanged by isometries such as rotations and translations. TDA has proven valuable in shape analysis, time series classification, protein structure analysis, and as feature engineering for machine learning. The mathematical foundation draws from algebraic topology, simplicial homology, and metric geometry.

Core Definitions

DfTopological Space

A topological space is a pair $(X, \tau)$ where $X$ is a set and $\tau \subseteq 2^X$ is a collection of subsets (open sets) satisfying: (1) $\emptyset, X \in \tau$ ; (2) arbitrary unions of sets in $\tau$ are in $\tau$ ; (3) finite intersections of sets in $\tau$ are in $\tau$ . A topology captures the notion of "closeness" without requiring a metric. Examples include the standard topology on $\mathbb{R}$ (generated by open intervals) and the discrete topology (every subset is open).

DfOpen and Closed Sets

A set $U \subseteq X$ is open if $U \in \tau$ . A set $C \subseteq X$ is closed if $X \setminus C$ is open (equivalently, $C$ contains all its limit points). In a metric space, open balls $B(x, r) = \{y : d(x,y) < r\}$ generate the topology. A set can be both open and closed (clopen), or neither. In connected spaces, the only clopen sets are $\emptyset$ and $X$ .

DfHomeomorphism

A function $f: X \to Y$ between topological spaces is a homeomorphism if it is bijective, continuous, and has a continuous inverse. Homeomorphic spaces are topologically equivalent—they have the same "shape." For example, a coffee mug and a donut are homeomorphic (both have one hole). Homeomorphism preserves connectedness, compactness, and all topological invariants.

DfConnectedness

A topological space $X$ is connected if it cannot be written as the union of two disjoint non-empty open sets. Equivalently, the only clopen (simultaneously closed and open) sets are $\emptyset$ and $X$ . A space is path-connected if any two points can be joined by a continuous path. Path-connected implies connected, but not conversely (the topologist's sine curve is connected but not path-connected). The number of connected components $\beta_0$ is the zeroth Betti number.

DfCompactness

A topological space $X$ is compact if every open cover $\{U_\alpha\}$ (where $X \subseteq \bigcup U_\alpha$ ) has a finite subcover. In metric spaces, compactness is equivalent to sequential compactness (every sequence has a convergent subsequence) and to being complete and totally bounded. Compact sets are "finite" in a topological sense. The Heine-Borel theorem states that in $\mathbb{R}^n$ , a set is compact iff it is closed and bounded.

DfHomotopy

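Two continuous maps $f, g: X \to Y$ are homotopic (written $f \simeq g$ ) if there exists a continuous map $H: X \times [0,1] \to Y$ such that $H(x, 0) = f(x)$ and $H(x, 1) = g(x)$ . Two spaces $X$ and $Y$ are homotopy equivalent if there are continuous maps $f: X \to Y$ and $g: Y \to X$ with $g \circ f \simeq \text{id}_X$ and $f \circ g \simeq \text{id}_Y$ . A space is contractible if it is homotopy equivalent to a point. Homotopy equivalence is weaker than homeomorphism: it preserves the coarse "shape" (in particular all homology groups) but not rigid structure such as dimension. For example, a disk is contractible (homotopy equivalent to a point), while a circle is not.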

DfSimplicial Complex

An abstract simplicial complex $K$ on a vertex set $V$ is a collection of finite subsets of $V$ (simplices) closed under taking subsets. A simplex $\{v_0, \ldots, v_k\}$ has dimension $k$ . The geometric realization embeds $K$ in Euclidean space as a union of simplices. Examples: graph (1-complex), triangle mesh (2-complex), tetrahedralization (3-complex). The $f$ -vector $(f_0, f_1, \ldots, f_d)$ counts the number of simplices in each dimension.
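The closure property and the $f$ -vector can be illustrated with a minimal sketch. The function names `closure` and `f_vector` and the example complex are hypothetical, chosen only for this illustration.

```python
from itertools import combinations

def closure(maximal_simplices):
    """Return every non-empty face of the given simplices as sorted tuples."""
    faces = set()
    for simplex in maximal_simplices:
        vertices = sorted(simplex)
        for size in range(1, len(vertices) + 1):
            faces.update(combinations(vertices, size))
    return faces

def f_vector(simplices):
    """f_k = number of k-dimensional simplices (a k-simplex has k + 1 vertices)."""
    top = max(len(s) for s in simplices) - 1
    return [sum(1 for s in simplices if len(s) == k + 1) for k in range(top + 1)]

# Hypothetical example: a filled triangle {a, b, c} with a dangling edge {c, d}.
K = closure([("a", "b", "c"), ("c", "d")])
print(f_vector(K))  # 4 vertices, 4 edges, 1 triangle
```

Storing simplices as sorted tuples makes "closed under taking subsets" a plain set-membership check.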

DfSimplicial Homology

The $k$ -th chain group $C_k(K)$ is the free abelian group generated by $k$ -simplices. The boundary operator $\partial_k: C_k \to C_{k-1}$ maps each $k$ -simplex to its oriented boundary. The $k$ -th homology group is $H_k(K) = \ker \partial_k / \text{im } \partial_{k+1}$ , and the $k$ -th Betti number is $\beta_k = \text{rank}(H_k)$ . Homology measures "holes" in each dimension.

DfPersistent Homology

Given a filtration of simplicial complexes $\emptyset = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_m = K$ , persistent homology tracks the birth and death of homological features (connected components, loops, voids) as the filtration parameter increases. A feature born at scale $b$ and dying at scale $d$ has persistence $d - b$ ; features with high persistence are considered "real" structure, while short-lived features are attributed to noise.

DfVietoris-Rips Complex

For a point cloud $X$ and parameter $\epsilon > 0$ , the Vietoris-Rips complex $\text{VR}(X, \epsilon)$ includes a simplex $\{x_{i_0}, \ldots, x_{i_k}\}$ if $d(x_{i_j}, x_{i_l}) \leq \epsilon$ for all $j, l$ . As $\epsilon$ increases, more simplices are added, creating a filtration. The Rips complex is commonly used in TDA because it depends only on pairwise distances.

DfČech Complex

For a point cloud $X$ and parameter $\epsilon > 0$ , the Čech complex includes a simplex $\{x_{i_0}, \ldots, x_{i_k}\}$ if the balls of radius $\epsilon$ centered at $x_{i_0}, \ldots, x_{i_k}$ have non-empty intersection. The Čech complex captures the "union of balls" topology and satisfies the Nerve Theorem: if the balls are convex, the Čech complex is homotopy equivalent to the union of balls.

Key Formulas

Betti Numbers

\beta_k = \text{rank}(H_k) = \text{dim}(\ker \partial_k) - \text{dim}(\text{im } \partial_{k+1})

Here,

$\beta_k$ =k-th Betti number (number of k-dimensional holes)
$H_k$ =k-th homology group
$\partial_k$ =Boundary operator from k-chains to (k-1)-chains
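As a rough sketch of the rank formula above, the following computes Betti numbers with $\mathbb{Z}/2$ coefficients, so orientations can be ignored. The helper names `rank_gf2` and `betti_numbers` are assumptions for this example, and the input must list every simplex of the complex. For spaces with torsion in integer homology, $\mathbb{Z}/2$ Betti numbers can differ from the rational ones.

```python
from itertools import combinations

def rank_gf2(columns):
    """Rank over GF(2) of a matrix whose columns are Python ints used as bitsets."""
    pivots = {}  # highest set bit -> reduced column owning that pivot
    for col in columns:
        while col and (col.bit_length() - 1) in pivots:
            col ^= pivots[col.bit_length() - 1]
        if col:
            pivots[col.bit_length() - 1] = col
    return len(pivots)

def betti_numbers(simplices):
    """Betti numbers (Z/2 coefficients) of a complex given as a list of ALL its simplices."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    top = max(by_dim)
    ranks = {}  # ranks[k] = rank of the boundary operator on k-chains
    for k in range(1, top + 1):
        row = {s: i for i, s in enumerate(by_dim.get(k - 1, []))}
        cols = []
        for s in by_dim.get(k, []):
            col = 0
            for face in combinations(s, k):   # the k + 1 faces of dimension k - 1
                col |= 1 << row[face]
            cols.append(col)
        ranks[k] = rank_gf2(cols)
    # beta_k = dim ker(boundary_k) - dim im(boundary_{k+1})
    return [len(by_dim.get(k, [])) - ranks.get(k, 0) - ranks.get(k + 1, 0)
            for k in range(top + 1)]

hollow_triangle = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
filled_triangle = hollow_triangle + [(0, 1, 2)]
sphere = [s for size in (1, 2, 3) for s in combinations(range(4), size)]  # tetrahedron boundary
print(betti_numbers(hollow_triangle), betti_numbers(filled_triangle), betti_numbers(sphere))
```

The hollow triangle behaves like a circle, the filled triangle is contractible, and the boundary of a tetrahedron is a triangulated sphere.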

Euler Characteristic

\chi = \sum_{k=0}^{n} (-1)^k \beta_k = \sum_{k=0}^{n} (-1)^k \alpha_k

Here,

$\chi$ =Euler characteristic (topological invariant)
$\beta_k$ =k-th Betti number
$\alpha_k$ =Number of k-simplices in a triangulation

Persistence Diagram

Dgm_k = \{(b_i, d_i) : \text{feature born at } b_i \text{ dies at } d_i\}

Here,

$Dgm_k$ =Persistence diagram for k-dimensional features
$b_i$ =Birth time of the i-th feature
$d_i$ =Death time of the i-th feature

Bottleneck Distance

d_B(D_1, D_2) = \inf_{\gamma} \sup_{(b,d) \in D_1} \|(b,d) - \gamma(b,d)\|_\infty

Here,

$D_1, D_2$ =Two persistence diagrams
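$\gamma$ =Bijection (matching) between the diagrams, each extended by the points of the diagonal so that unmatched features can be paired with the diagonal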
$d_B$ =Bottleneck distance measuring diagram similarity

Wasserstein Distance

W_p(D_1, D_2) = \left(\inf_{\gamma} \sum_{(b,d) \in D_1} \|(b,d) - \gamma(b,d)\|_\infty^p\right)^{1/p}

Here,

$W_p$ =p-Wasserstein distance between persistence diagrams
$\gamma$ =Optimal matching (bijection with possible diagonal matches)
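For very small diagrams, both distances can be evaluated by brute force over all matchings. This is an illustrative sketch only: the function name `diagram_distances` is hypothetical, the search is factorial in the diagram size, and practical libraries use assignment algorithms instead. Each diagram is padded with diagonal slots (`None`), and matching a point $(b, d)$ to the diagonal costs $(d - b)/2$ in the $\ell_\infty$ norm.

```python
from itertools import permutations

def _cost(p, q):
    """L-infinity cost of matching p to q; None stands for the nearest diagonal point."""
    if p is None and q is None:
        return 0.0
    if p is None or q is None:
        b, d = p if q is None else q
        return (d - b) / 2.0
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def diagram_distances(D1, D2, p=2):
    """Return (bottleneck, p-Wasserstein) distances between two tiny diagrams."""
    A = list(D1) + [None] * len(D2)   # pad so every point may go to the diagonal
    B = list(D2) + [None] * len(D1)
    best_bottleneck = best_wasserstein = float("inf")
    for perm in permutations(range(len(B))):
        costs = [_cost(A[i], B[j]) for i, j in enumerate(perm)]
        best_bottleneck = min(best_bottleneck, max(costs, default=0.0))
        best_wasserstein = min(best_wasserstein, sum(c ** p for c in costs))
    return best_bottleneck, best_wasserstein ** (1.0 / p)

# Hypothetical diagrams: the second one lacks the short-lived feature (0.3, 0.6).
d_b, w_2 = diagram_distances([(0.2, 0.8), (0.3, 0.6)], [(0.25, 0.8)])
print(round(d_b, 4), round(w_2, 4))
```

In this example the optimal matching pairs $(0.2, 0.8)$ with $(0.25, 0.8)$ at cost 0.05 and sends $(0.3, 0.6)$ to the diagonal at cost 0.15, so $d_B = 0.15$ and $W_2 = \sqrt{0.05^2 + 0.15^2} \approx 0.158$ .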

Vietoris-Rips Distance Condition

\{x_{i_0}, \ldots, x_{i_k}\} \in \text{VR}(X, \epsilon) \iff d(x_{i_j}, x_{i_l}) \leq \epsilon \; \forall \, j, l

Here,

$\epsilon$ =Distance threshold parameter
$d(x_{i_j}, x_{i_l})$ =Pairwise distance between points

Čech Complex Intersection Condition

\{x_{i_0}, \ldots, x_{i_k}\} \in \check{C}(X, \epsilon) \iff \bigcap_{j=0}^{k} B(x_{i_j}, \epsilon) \neq \emptyset

Here,

$B(x, \epsilon)$ =Closed ball of radius ε centered at x (closed, so that a simplex is present at its threshold scale itself)
$\epsilon$ =Radius parameter

Important Theorems

ThNerve Theorem (Borsuk)

Let $\{U_\alpha\}$ be a finite collection of open subsets of $\mathbb{R}^n$ such that all non-empty intersections $U_{\alpha_0} \cap \cdots \cap U_{\alpha_k}$ are contractible. Then the nerve of the cover is homotopy equivalent to $\bigcup U_\alpha$ . Convex sets satisfy the hypothesis automatically, since a non-empty intersection of convex sets is convex and hence contractible; the conclusion also holds for finite collections of closed convex sets, such as closed balls. This theorem justifies using simplicial complexes (built from intersections of balls) to capture the topology of the underlying space.

ThStability of Persistence Diagrams

If $f, g: X \to \mathbb{R}$ are continuous tame functions on a triangulable space (tame meaning finitely many homological critical values and finite-dimensional homology of sublevel sets), then $d_B(\text{pers}(f), \text{pers}(g)) \leq \|f - g\|_\infty$ , where $d_B$ is the bottleneck distance and $\text{pers}(f)$ is the persistence diagram of $f$ . This means small perturbations of the data cause small changes in the persistence diagram, which is the precise sense in which TDA is stable.

ThIsometry Invariance of Persistent Homology

Persistent homology is invariant under isometries of the metric space. If $(X, d_X)$ and $(Y, d_Y)$ are isometric metric spaces, their persistence diagrams are identical. This means TDA captures intrinsic geometric structure independent of embedding.

ThUniversal Coefficient Theorem

For a topological space $X$ with simplicial homology $H_k(X; \mathbb{Z})$ , the homology with coefficients in a field $F$ satisfies:

H_k(X; F) \cong H_k(X; \mathbb{Z}) \otimes F \oplus \text{Tor}(H_{k-1}(X; \mathbb{Z}), F)

When $F$ has characteristic 0 (e.g., $\mathbb{Q}$ , $\mathbb{R}$ ), the Tor term vanishes and $H_k(X; F) \cong H_k(X; \mathbb{Z}) \otimes F$ . This justifies working with field coefficients in computational TDA.

ThMayer-Vietoris Sequence

For a topological space $X$ expressed as the union of two open sets $U$ and $V$ , there is a long exact sequence relating the homology of $U$ , $V$ , $U \cap V$ , and $X$ :

\cdots \to H_n(U \cap V) \to H_n(U) \oplus H_n(V) \to H_n(X) \to H_{n-1}(U \cap V) \to \cdots

This sequence is a powerful computational tool: it reduces the computation of homology of $X$ to computing homology of simpler subspaces.

Worked Examples

📝Betti Numbers of Common Shapes

Problem: Compute the Betti numbers for (a) a circle, (b) a sphere, (c) a torus, (d) a figure-eight.

Solution:

(a) Circle $S^1$ :

$\beta_0 = 1$ (one connected component)
$\beta_1 = 1$ (one loop)
$\beta_k = 0$ for $k \geq 2$

Euler characteristic: $\chi = 1 - 1 = 0$

(b) Sphere $S^2$ :

$\beta_0 = 1$ (connected)
$\beta_1 = 0$ (no loops—any loop can be contracted to a point)
$\beta_2 = 1$ (one void/enclosed cavity)
$\beta_k = 0$ for $k \geq 3$

Euler characteristic: $\chi = 1 - 0 + 1 = 2$

(c) Torus $T^2 = S^1 \times S^1$ :

$\beta_0 = 1$ (connected)
$\beta_1 = 2$ (two independent loops: meridian and longitude)
$\beta_2 = 1$ (one void)
$\beta_k = 0$ for $k \geq 3$

Euler characteristic: $\chi = 1 - 2 + 1 = 0$

(d) Figure-eight (wedge of two circles):

$\beta_0 = 1$ (connected)
$\beta_1 = 2$ (two loops)
$\beta_k = 0$ for $k \geq 2$

Euler characteristic: $\chi = 1 - 2 = -1$

Result: The Betti numbers distinguish these shapes: the torus has $\beta_1 = 2$ while the sphere has $\beta_1 = 0$ . The figure-eight and torus share $\beta_1 = 2$ but differ in $\beta_2$ .

📝Vietoris-Rips Complex of a Point Cloud

Problem: Given points $A = (0,0)$ , $B = (1,0)$ , $C = (0.5, 0.866)$ (vertices of an equilateral triangle with side length 1), construct the Vietoris-Rips complex for $\epsilon = 0.5$ , $\epsilon = 1.0$ , and $\epsilon = 1.5$ .

Solution:

Step 1: Compute pairwise distances. $d(A,B) = 1$ , $d(A,C) = 1$ , $d(B,C) = 1$ (equilateral triangle).

Step 2: $\epsilon = 0.5$ : No pair of points has distance $\leq 0.5$ .

0-simplices: $\{A\}, \{B\}, \{C\}$ (3 vertices)
1-simplices: none
$\beta_0 = 3$ (three components), $\beta_1 = 0$

Step 3: $\epsilon = 1.0$ : All pairwise distances $\leq 1$ .

0-simplices: $\{A\}, \{B\}, \{C\}$
1-simplices: $\{A,B\}, \{A,C\}, \{B,C\}$ (all edges)
2-simplices: $\{A,B,C\}$ (since all edges present, triangle filled)
$\beta_0 = 1$ (connected), $\beta_1 = 0$ (loop filled by triangle)

Step 4: $\epsilon = 1.5$ : No pairwise distance exceeds 1, so no new simplices appear.

Same complex as $\epsilon = 1.0$

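Result: In dimension 0 the persistence diagram has three components born at $\epsilon = 0$ : two die at $\epsilon = 1.0$ , when the edges appear and the points merge, and one never dies. In dimension 1 the loop through $A$ , $B$ , $C$ is born at $\epsilon = 1.0$ and is filled by the triangle at the same scale, so it has zero persistence and contributes no off-diagonal point.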
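The construction in this example can be sketched directly from the pairwise-distance condition. The function name `vietoris_rips` and the small tolerance `tol` (guarding against floating-point rounding at the threshold) are assumptions for illustration.

```python
import math
from itertools import combinations

def vietoris_rips(points, eps, max_dim=2, tol=1e-12):
    """All simplices (as index tuples) whose pairwise distances are <= eps."""
    n = len(points)
    close = {(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps + tol}
    simplices = [(i,) for i in range(n)]
    for size in range(2, max_dim + 2):
        for s in combinations(range(n), size):
            # A simplex enters as soon as all of its edges are present.
            if all(pair in close for pair in combinations(s, 2)):
                simplices.append(s)
    return simplices

A, B, C = (0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)
counts = {}
for eps in (0.5, 1.0, 1.5):
    K = vietoris_rips([A, B, C], eps)
    counts[eps] = [sum(1 for s in K if len(s) == size) for size in (1, 2, 3)]
    print(eps, counts[eps])   # [vertices, edges, triangles]
```

The counts are 3 vertices only at $\epsilon = 0.5$ , and 3 vertices, 3 edges, and 1 triangle at both $\epsilon = 1.0$ and $\epsilon = 1.5$ .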

📝Persistent Homology of a Noisy Circle

Problem: 20 points sampled from a unit circle with small Gaussian noise. Describe the expected persistence diagram.

Solution:

Step 1: As $\epsilon$ increases from 0:

At $\epsilon \approx 0$ : Each point is a separate component ( $\beta_0 = 20$ ).
As $\epsilon$ grows: Nearby points connect, reducing $\beta_0$ .

Step 2: At $\epsilon \approx 0.3$ (the typical inter-point spacing, $2\pi/20 \approx 0.31$ ):

Most points connect into a single ring.
$\beta_0 = 1$ (one component).
$\beta_1 = 1$ (one loop appears—the circle).

Step 3: The loop persists until $\epsilon \approx \sqrt{3} \approx 1.7$ (the side length of an equilateral triangle inscribed in the unit circle), when triangles spanning the center appear, the interior fills in, and the loop dies.

Step 4: Expected persistence diagram:

$\beta_0$ : 19 components born at $\epsilon = 0$ , dying at small $\epsilon$ values no larger than roughly the inter-point spacing (short bars). 1 "signal" component born at $\epsilon = 0$ , never dying (infinite bar).
$\beta_1$ : 1 "signal" loop born at $\epsilon \approx 0.3$ , dying at $\epsilon \approx 1.7$ (long bar). Possible small "noise" loops born and dying at similar scales (short bars).

Result: The persistent homology clearly identifies the circle: one long-lived $\beta_1$ feature with persistence $\approx 1.4$ , while noise features have short persistence. The bottleneck distance to the exact circle's diagram is small.
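The following from-scratch sketch runs the standard boundary-matrix reduction over $\mathbb{Z}/2$ on 20 evenly spaced, noise-free points of the unit circle, using the pairwise-distance convention $d \leq \epsilon$ . The function name `rips_persistence` is hypothetical, only finite $H_0$ and $H_1$ bars are reported, and the brute-force enumeration is only suitable for small point clouds. Dedicated libraries are far faster.

```python
import math
from itertools import combinations

def rips_persistence(points):
    """Finite H0 and H1 bars (dim, birth, death) of the Rips filtration, Z/2 coefficients."""
    n = len(points)
    simplices = []                      # (filtration value, dimension, vertex tuple)
    for size in (1, 2, 3):              # vertices, edges, triangles suffice for H0 and H1
        for s in combinations(range(n), size):
            value = max((math.dist(points[i], points[j])
                         for i, j in combinations(s, 2)), default=0.0)
            simplices.append((value, size - 1, s))
    simplices.sort()                    # faces precede cofaces: by value, then dimension
    index = {s: i for i, (_, _, s) in enumerate(simplices)}

    reduced = {}                        # pivot row -> reduced boundary column (int bitset)
    bars = []
    for value, dim, s in simplices:
        col = 0
        if dim > 0:
            for face in combinations(s, dim):
                col ^= 1 << index[face]
        while col and (col.bit_length() - 1) in reduced:
            col ^= reduced[col.bit_length() - 1]
        if col:                         # this simplex kills the class created at the pivot
            pivot = col.bit_length() - 1
            reduced[pivot] = col
            birth, birth_dim, _ = simplices[pivot]
            if value > birth:           # skip zero-length bars
                bars.append((birth_dim, birth, value))
    return bars

circle = [(math.cos(2 * math.pi * k / 20), math.sin(2 * math.pi * k / 20)) for k in range(20)]
bars = rips_persistence(circle)
loops = sorted((death - birth, birth, death) for dim, birth, death in bars if dim == 1)
print(loops[-1])   # most persistent H1 bar as (persistence, birth, death)
```

For this evenly spaced sample, the long $H_1$ bar is born at the chord length $2\sin(\pi/20) \approx 0.31$ and dies at $2\sin(7\pi/20) \approx 1.78$ , the first scale at which a triangle of the complex contains the center. Noise would shift both values slightly.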

📝Euler Characteristic of a Graph

Problem: Compute the Euler characteristic of a complete graph $K_5$ (5 vertices, all edges) and a tree with 5 vertices.

Solution:

Complete graph $K_5$ :

$\alpha_0 = 5$ (vertices), $\alpha_1 = \binom{5}{2} = 10$ (edges), $\alpha_k = 0$ for $k \geq 2$
$\chi = \alpha_0 - \alpha_1 = 5 - 10 = -5$

Alternatively: $\beta_0 = 1$ , $\beta_1 = \alpha_1 - \alpha_0 + 1 = 10 - 5 + 1 = 6$ (independent cycles). $\chi = \beta_0 - \beta_1 = 1 - 6 = -5$ ✓

Tree with 5 vertices:

$\alpha_0 = 5$ (vertices), $\alpha_1 = 4$ (edges, since a tree on $n$ vertices has $n-1$ edges)
$\chi = 5 - 4 = 1$

Alternatively: $\beta_0 = 1$ (connected), $\beta_1 = 0$ (no cycles in a tree). $\chi = 1 - 0 = 1$ ✓

Result: $K_5$ has $\chi = -5$ (many cycles), while the tree has $\chi = 1$ (no cycles). The Euler characteristic distinguishes these graphs: trees always have $\chi = 1$ , while connected graphs with cycles have $\chi = 1 - \beta_1 \leq 0$ .
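The two computations can be checked with a short sketch that counts components with union-find and uses $\beta_1 = \alpha_1 - \alpha_0 + \beta_0$ . The function name `graph_invariants` and the choice of a path as the 5-vertex tree are assumptions for illustration.

```python
from itertools import combinations

def graph_invariants(num_vertices, edges):
    """Return (chi, beta_0, beta_1) for a graph viewed as a 1-dimensional complex."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    components = num_vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                        # edge joins two components
            parent[ru] = rv
            components -= 1
    chi = num_vertices - len(edges)
    beta_1 = len(edges) - num_vertices + components
    return chi, components, beta_1

k5_edges = list(combinations(range(5), 2))
path_edges = [(i, i + 1) for i in range(4)]      # one particular tree on 5 vertices
print(graph_invariants(5, k5_edges))             # chi, beta_0, beta_1 for K_5
print(graph_invariants(5, path_edges))           # chi, beta_0, beta_1 for the tree
```

In both cases the returned values satisfy $\chi = \beta_0 - \beta_1$ , matching the two ways of computing the Euler characteristic above.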

Practice Problems

📝Problem 1: Connected Components of a Subset of R²

Problem: Determine the connected components and compactness of $A = \{(x,y) : x^2 + y^2 \leq 1\} \cup \{(x,y) : (x-3)^2 + y^2 \leq 1\}$ .

Solution:

Step 1: $A$ is the union of two closed disks: $D_1$ centered at $(0,0)$ with radius 1, and $D_2$ centered at $(3,0)$ with radius 1.

Step 2: The distance between centers is 3, and the sum of radii is 2. Since $3 > 2$ , the disks do not overlap (their intersection is empty).

Step 3: Connected components: Each disk is connected (convex sets are connected), and since they are disjoint, $A$ has exactly 2 connected components.

Step 4: Compactness: Each disk is closed and bounded in $\mathbb{R}^2$ , hence compact by Heine-Borel. The finite union of compact sets is compact, so $A$ is compact.

Result: $\beta_0(A) = 2$ (two connected components), $A$ is compact.

📝Problem 2: Homotopy Equivalence

Problem: Show that a circle $S^1$ is homotopy equivalent to $\mathbb{R}^2 \setminus \{(0,0)\}$ (the punctured plane).

Solution:

Step 1: Define $f: S^1 \to \mathbb{R}^2 \setminus \{(0,0)\}$ as the inclusion $f(z) = z$ (view $S^1 \subset \mathbb{R}^2$ ).

Step 2: Define $g: \mathbb{R}^2 \setminus \{(0,0)\} \to S^1$ as the radial projection $g(x) = x/\|x\|$ .

Step 3: Check $g \circ f = \text{id}_{S^1}$ : For $z \in S^1$ , $g(f(z)) = g(z) = z/\|z\| = z$ (since $\|z\| = 1$ ). ✓

Step 4: Check $f \circ g \simeq \text{id}$ : Define $H(x, t) = (1-t) \cdot \frac{x}{\|x\|} + t \cdot x$ .

For $t \in [0,1]$ , $H(x, t) = \left(\frac{1-t}{\|x\|} + t\right) x$ is a positive multiple of $x$ , so it is continuous and never zero (since $(1-t)/\|x\| + t > 0$ for $x \neq 0$ ).

$H(x, 0) = x/\|x\| = f(g(x))$ , $H(x, 1) = x$ .

Result: $f$ and $g$ are homotopy inverses, so $S^1 \simeq \mathbb{R}^2 \setminus \{(0,0)\}$ . Both have $\beta_0 = 1$ , $\beta_1 = 1$ .

📝Problem 3: Compactness in Function Spaces

Problem: Is the closed unit ball $B = \{f \in C[0,1] : \|f\|_\infty \leq 1\}$ compact in the supremum norm?

Solution:

Step 1: Consider the sequence $f_n(x) = x^n$ in $B$ . Each $f_n$ is continuous with $\|f_n\|_\infty = 1$ .

Step 2: Pointwise limit: $f(x) = \lim_{n \to \infty} x^n = \begin{cases} 0 & x \in [0,1) \\ 1 & x = 1 \end{cases}$

This limit is NOT continuous, so $f \notin C[0,1]$ .

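Step 3: Convergence in the supremum norm is uniform convergence, and a uniform limit of continuous functions is continuous and agrees with the pointwise limit. Every subsequence of $\{f_n\}$ converges pointwise to the same discontinuous function $f$ , so no subsequence converges in the supremum norm.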

Step 4: Therefore $\{f_n\}$ has no convergent subsequence in $B$ .

Result: The closed unit ball in $C[0,1]$ is NOT compact. This is a consequence of the Riesz theorem: in an infinite-dimensional Banach space, the closed unit ball is never compact. Contrast with finite-dimensional $\mathbb{R}^n$ where the closed unit ball IS compact (Heine-Borel).
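A quick numerical check makes the failure of convergence concrete. Writing $u = x^n$ , the gap $x^n - x^{2n} = u - u^2$ peaks at $1/4$ , so $\|f_n - f_{2n}\|_\infty = 1/4$ for every $n$ . Since $x^m \leq x^{2n}$ for $m \geq 2n$ , any two terms with $m \geq 2n$ are at least $1/4$ apart, and no subsequence can be Cauchy. The sketch below approximates the supremum on a grid; the function name `sup_distance` and the grid size are arbitrary choices.

```python
def sup_distance(n, m, grid_size=100_001):
    """Grid approximation of sup over [0, 1] of |x**n - x**m|."""
    xs = (i / (grid_size - 1) for i in range(grid_size))
    return max(abs(x ** n - x ** m) for x in xs)

for n in (1, 5, 25, 125):
    print(n, round(sup_distance(n, 2 * n), 4))   # stays near 0.25 for every n
```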

📝Problem 4: Persistent Homology of Two Concentric Circles

Problem: Describe the persistence diagram for 30 points sampled from two concentric circles of radii 1 and 2.

Solution:

Step 1: At small $\epsilon$ : 30 connected components ( $\beta_0 = 30$ ).

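Step 2: As $\epsilon$ increases, points on each circle connect. Assuming the sample places 15 roughly evenly spaced points on each circle, the inner circle (radius 1) forms a loop first, at $\epsilon \approx 0.4$ (its typical spacing, $2\pi/15$ ). The outer circle (radius 2) has twice the spacing and forms a loop at $\epsilon \approx 0.8$ .

Step 3: Just after $\epsilon \approx 0.8$ : Both circles are fully connected.

$\beta_0 = 2$ (two connected components: inner and outer rings)
$\beta_1 = 2$ (two loops)

Step 4: As $\epsilon$ passes the radial gap between the circles (1.0): Edges and triangles between the rings appear. The two components merge ( $\beta_0 = 1$ ), the region between the circles fills in, and the two loops become homologous, so one of the two $\beta_1$ classes dies (by the usual convention the younger, outer one).

Step 5: At $\epsilon \approx \sqrt{3} \approx 1.7$ : Triangles spanning the center of the inner circle appear, the central hole fills in, and the remaining loop dies.

Persistence diagram:

$\beta_0$ : 1 infinite bar (everything connected eventually), 1 medium bar (the two rings merging at $\epsilon \approx 1.0$ ), 28 shorter bars (points joining their own ring)
$\beta_1$ : 2 bars. Inner loop: born $\approx 0.4$ , dies $\approx 1.7$ (persistence $\approx 1.3$ ). Outer loop: born $\approx 0.8$ , dies shortly after $\epsilon \approx 1.0$ (short persistence).

Result: Both loops are detected, but their persistence is governed by the gap between the circles as well as by their radii: the loops can only be told apart while $\epsilon$ is smaller than the gap between the rings. With a gap of 1 and an outer spacing of about 0.8, the second loop is short-lived, so a denser sample (or better-separated circles) is needed for TDA to show two clearly persistent loops.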

📝Problem 5: Čech Complex vs. Vietoris-Rips Complex

Problem: For a set of points forming a regular hexagon, compare the Čech and Vietoris-Rips complexes at different scales.

Solution:

Step 1: Consider a regular hexagon with side length 1. Vertices: $v_0, \ldots, v_5$ .

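Step 2: Use balls of radius $\epsilon$ for the Čech complex and the matching pairwise-distance threshold $2\epsilon$ for the Vietoris-Rips complex, so that both complexes have the same edges. For $\epsilon < 1/2$ : No edges in either complex.

Step 3: For $\epsilon = 1/2$ :

Čech complex: Two balls of radius $\epsilon$ intersect iff the distance between their centers is $\leq 2\epsilon = 1$ . Adjacent vertices are distance 1 apart, so edges appear.
Vietoris-Rips: Same edges (distance 1 $\leq 2\epsilon = 1$ ).
Both give the cycle graph $C_6$ .

Step 4: For $\epsilon = \sqrt{3}/2 \approx 0.866$ :

Both complexes: Next-adjacent vertices are $\sqrt{3} \approx 1.73 = 2\epsilon$ apart, so the edges $\{v_i, v_{i+2}\}$ appear. Opposite vertices (distance 2, the hexagon's diameter) remain unconnected.
Čech complex: Three consecutive vertices $v_0, v_1, v_2$ form an obtuse triangle with sides $1, 1, \sqrt{3}$ , whose smallest enclosing ball is centered at the midpoint of $v_0 v_2$ with radius $\sqrt{3}/2$ . The three balls meet at that midpoint, so the 2-simplex $\{v_0, v_1, v_2\}$ appears. The triangle $\{v_0, v_2, v_4\}$ is equilateral with side $\sqrt{3}$ and circumradius 1, so it does not appear until $\epsilon = 1$ .
Vietoris-Rips: All pairwise distances in $\{v_0, v_1, v_2\}$ and in $\{v_0, v_2, v_4\}$ are $\leq 2\epsilon = \sqrt{3}$ , so both triangles are already present.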

Step 5: Key difference: The Čech complex requires a common intersection point for all balls in a simplex, while Vietoris-Rips only requires pairwise distances. The Čech complex is "smaller" (fewer simplices) but captures the true topology by the Nerve Theorem.

Result: For the same data, the Čech complex at radius $\epsilon$ is a subcomplex of the Vietoris-Rips complex at pairwise-distance threshold $2\epsilon$ . Čech is more topologically accurate but computationally harder (requires checking ball intersections). Vietoris-Rips depends only on pairwise distances, making it simpler to compute.
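The comparison can be checked numerically for a regular hexagon with side length 1. In this sketch (the helper name `cech_radius` is hypothetical), three closed balls of radius $\epsilon$ share a point exactly when $\epsilon$ is at least the radius of the smallest circle enclosing their centers, and the Rips complex is taken at pairwise-distance threshold $2\epsilon$ .

```python
import math
from itertools import combinations

def cech_radius(p, q, r):
    """Radius of the smallest circle enclosing three planar points."""
    a, b, c = sorted((math.dist(q, r), math.dist(p, r), math.dist(p, q)))
    if a * a + b * b <= c * c:            # right or obtuse: longest side is a diameter
        return c / 2.0
    area = abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0
    return a * b * c / (4.0 * area)       # acute: circumradius

# Regular hexagon inscribed in the unit circle, so its side length is 1.
hexagon = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]
eps = math.sqrt(3) / 2 + 1e-9             # small slack for floating-point rounding

cech_triangles, rips_triangles = [], []
for t in combinations(range(6), 3):
    pts = [hexagon[i] for i in t]
    if cech_radius(*pts) <= eps:
        cech_triangles.append(t)
    if max(math.dist(u, v) for u, v in combinations(pts, 2)) <= 2 * eps:
        rips_triangles.append(t)

print(len(cech_triangles), len(rips_triangles))
```

At $\epsilon = \sqrt{3}/2$ the Čech complex contains the 6 triangles on consecutive vertices, while the Rips complex at threshold $2\epsilon$ additionally contains the 2 equilateral triangles on alternating vertices.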

📝Problem 6: Persistent Homology of a Torus

Problem: Describe the expected persistence diagram for points sampled from a torus embedded in $\mathbb{R}^3$ .

Solution:

Step 1: A torus has $\beta_0 = 1$ , $\beta_1 = 2$ , $\beta_2 = 1$ .

Step 2: At small scales, the point cloud looks like scattered points ( $\beta_0 = n$ ).

Step 3: As $\epsilon$ increases, points connect into a "skeleton" of the torus:

$\beta_0$ drops to 1 (one connected component)
Two loops appear ( $\beta_1 = 2$ ): the meridian and longitude circles

Step 4: The two loops persist over a wide range of $\epsilon$ values. Eventually:

The meridian loop fills in (dies first, at $\epsilon \approx r$ where $r$ is the tube radius)
The longitude loop fills in later (at $\epsilon \approx R$ where $R$ is the major radius)
The void ( $\beta_2 = 1$ ) appears when the surface is fully reconstructed and dies when the interior fills

Step 5: Expected persistence diagram:

$\beta_0$ : Many short bars (noise), one infinite bar
$\beta_1$ : Two long bars (the two independent loops of the torus)
$\beta_2$ : One bar (the enclosed void)

Result: Persistent homology correctly identifies the torus topology: $\beta_0 = 1$ , $\beta_1 = 2$ , $\beta_2 = 1$ . The two $\beta_1$ features have different persistence values reflecting the different radii of the torus.

📝Problem 7: Vietoris-Rips for Clustering

Problem: How does persistent homology help determine the number of clusters in a point cloud?

Solution:

Step 1: Consider 3 clusters of points, each cluster forming a tight ball, with large gaps between clusters.

Step 2: At small $\epsilon$ : Each point is separate ( $\beta_0 = n$ ).

Step 3: As $\epsilon$ increases: Points within each cluster connect first. At $\epsilon$ equal to the intra-cluster spacing, $\beta_0 = 3$ (three components).

Step 4: The three components persist until $\epsilon$ reaches the inter-cluster distance, at which point they merge into one component ( $\beta_0 = 1$ ).

Step 5: The persistence diagram for $\beta_0$ shows:

Many short bars (noise within clusters)
Two medium bars (two of the three clusters merging into the third), dying at the inter-cluster distances
One infinite bar (everything eventually connected)

Result: The number of long-lived $\beta_0$ features (after filtering short noise bars) indicates the number of clusters. This is the basis of topological clustering methods that use persistent homology to determine cluster count automatically.
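In dimension 0, Rips persistence reduces to single-linkage merging, which a union-find over sorted edges computes directly. The sketch below uses hypothetical synthetic data (three tight clusters) and a hand-picked threshold of 1.0 to separate short bars from long ones; choosing that threshold is the subjective part in practice.

```python
import math
from itertools import combinations

def h0_deaths(points):
    """Death scales of the finite H0 bars of the Rips filtration (all births are 0)."""
    parent = list(range(len(points)))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                  # two components merge: one H0 bar dies here
            parent[ri] = rj
            deaths.append(length)
    return deaths

offsets = [(0.1, 0.1), (-0.1, 0.1), (0.1, -0.1), (-0.1, -0.1)]
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
cloud = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

deaths = h0_deaths(cloud)
long_bars = [d for d in deaths if d > 1.0]        # threshold chosen by hand
estimated_clusters = len(long_bars) + 1           # + 1 for the bar that never dies
print(len(deaths), estimated_clusters)
```

The 12 points give 11 finite bars: 9 short ones from merges inside clusters and 2 long ones from merges between clusters. Together with the infinite bar, that yields an estimate of 3 clusters.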

📝Problem 8: Čech Complex Computation

Problem: For 4 points forming a square with side length 1, determine at what value of $\epsilon$ the Čech complex first contains a 2-simplex (triangle).

Solution:

Step 1: Points: $A = (0,0)$ , $B = (1,0)$ , $C = (1,1)$ , $D = (0,1)$ .

Step 2: For a 2-simplex $\{A, B, C\}$ , we need balls of radius $\epsilon$ around $A$ , $B$ , $C$ to have a common intersection point.

Step 3: The circumcenter of triangle $ABC$ is at $(0.5, 0.5)$ with circumradius $r = \sqrt{0.5^2 + 0.5^2} = \sqrt{0.5} = 1/\sqrt{2} \approx 0.707$ .

Step 4: The three balls intersect when $\epsilon \geq r = 1/\sqrt{2}$ .

Step 5: At $\epsilon = 1/\sqrt{2}$ , the point $(0.5, 0.5)$ is equidistant from $A$ , $B$ , $C$ at distance $1/\sqrt{2}$ , so it lies in all three balls.

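Result: The Čech complex first contains a 2-simplex at $\epsilon = 1/\sqrt{2} \approx 0.707$ . With the Vietoris-Rips condition $d \leq \epsilon$ used in this document, the same simplex appears only at $\epsilon = \sqrt{2} \approx 1.414$ (the diagonal $d(A,C)$ ), so in terms of $\epsilon$ the Čech simplex appears earlier. If the Rips threshold is instead matched to the ball diameter ( $d \leq 2\epsilon$ ), the two thresholds coincide here, because the circumradius of a right triangle is half its hypotenuse.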
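The threshold can be verified with a few lines of arithmetic. Any point common to the balls around $A$ and $C$ forces $\epsilon \geq d(A,C)/2$ , and the midpoint of $AC$ attains this bound for all three vertices. The variable names below are illustrative only.

```python
import math
from itertools import combinations

A, B, C = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
triangle = (A, B, C)

# Lower bound: two balls of radius eps can only meet if eps >= half the distance.
longest_edge = max(math.dist(u, v) for u, v in combinations(triangle, 2))
lower_bound = longest_edge / 2

# The midpoint of the hypotenuse AC is within that distance of all three vertices.
midpoint = ((A[0] + C[0]) / 2, (A[1] + C[1]) / 2)
cech_threshold = max(math.dist(midpoint, v) for v in triangle)

print(round(lower_bound, 4), round(cech_threshold, 4), round(longest_edge, 4))
```

The lower bound and the value attained at the midpoint agree at about 0.7071, and the longest pairwise distance is about 1.4142, which is the scale at which the condition $d \leq \epsilon$ first admits the triangle.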

📝Problem 9: Persistence Landscape Computation

Problem: Given a persistence diagram with features $(0.2, 0.8)$ and $(0.3, 0.6)$ for $\beta_1$ , compute the first persistence landscape.

Solution:

Step 1: For each feature $(b_i, d_i)$ , define the tent function:

f_i(t) = \begin{cases} t - b_i & \text{if } b_i \leq t \leq \frac{b_i+d_i}{2} \\ d_i - t & \text{if } \frac{b_i+d_i}{2} \leq t \leq d_i \\ 0 & \text{otherwise} \end{cases}

Step 2: For feature 1: $(0.2, 0.8)$ , midpoint $= 0.5$ :

f_1(t) = \begin{cases} t - 0.2 & 0.2 \leq t \leq 0.5 \\ 0.8 - t & 0.5 \leq t \leq 0.8 \\ 0 & \text{otherwise} \end{cases}

Peak value: $0.3$ at $t = 0.5$ .

Step 3: For feature 2: $(0.3, 0.6)$ , midpoint $= 0.45$ :

f_2(t) = \begin{cases} t - 0.3 & 0.3 \leq t \leq 0.45 \\ 0.6 - t & 0.45 \leq t \leq 0.6 \\ 0 & \text{otherwise} \end{cases}

Peak value: $0.15$ at $t = 0.45$ .

Step 4: The first persistence landscape is $\lambda_1(t) = \max_i f_i(t)$ :

For $t \in [0.2, 0.3]$ : $\lambda_1(t) = t - 0.2$ (only feature 1 active)
For $t \in [0.3, 0.45]$ : $\lambda_1(t) = \max(t-0.2, t-0.3) = t-0.2$ (feature 1 dominates)
For $t \in [0.45, 0.5]$ : $\lambda_1(t) = \max(t-0.2, 0.6-t) = t-0.2$ (feature 1 still dominates)
For $t \in [0.5, 0.6]$ : $\lambda_1(t) = \max(0.8-t, 0.6-t) = 0.8-t$ (feature 1 dominates)
For $t \in [0.6, 0.8]$ : $\lambda_1(t) = 0.8-t$ (only feature 1 active)

Result: The first persistence landscape $\lambda_1(t)$ is the upper envelope of the tent functions. It can be used as a feature vector for statistical analysis (averaging landscapes across samples, computing confidence bands, etc.).
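The piecewise computation above can be cross-checked with a short sketch. Each tent function equals $\max(0, \min(t - b_i, d_i - t))$ , and the $k$ -th landscape is the $k$ -th largest tent value at $t$ . The function name `landscape` is an assumption for this example.

```python
def landscape(diagram, k, t):
    """k-th persistence landscape at t: the k-th largest tent value (k = 1 is the envelope)."""
    tents = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram), reverse=True)
    return tents[k - 1] if k <= len(tents) else 0.0

diagram = [(0.2, 0.8), (0.3, 0.6)]
for t in (0.25, 0.45, 0.5, 0.7, 0.9):
    print(t, round(landscape(diagram, 1, t), 4), round(landscape(diagram, 2, t), 4))
```

For this diagram $\lambda_1(0.5) = 0.3$ , the peak of the first tent, while the second landscape $\lambda_2$ picks up the smaller tent, for example $\lambda_2(0.45) = 0.15$ .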

📝Problem 10: TDA for Time Series Classification

Problem: How would you use persistent homology to classify periodic vs. non-periodic time series?

Solution:

Step 1: Given a time series $x(t)$ , create a delay embedding in $\mathbb{R}^d$ :

\mathbf{y}(t) = (x(t), x(t+\tau), x(t+2\tau), \ldots, x(t+(d-1)\tau))

where $\tau$ is the delay and $d$ is the embedding dimension.

Step 2: For a periodic signal, the delay embedding traces a closed curve (a loop in $\mathbb{R}^d$ ).

Step 3: Compute persistent homology of the point cloud $\{\mathbf{y}(t)\}$ .

Step 4: Classification criteria:

Periodic signal: $\beta_1 = 1$ with high persistence (long-lived loop)
Non-periodic signal: $\beta_1 = 0$ or $\beta_1 = 1$ with low persistence (short-lived feature)

Step 5: Use the persistence of the dominant $\beta_1$ feature as a classification feature: threshold at some value to separate periodic from non-periodic.

Result: TDA provides a principled, noise-robust method for detecting periodicity. The delay embedding + persistent homology approach tolerates moderate noise; because the embedding assumes a fixed delay $\tau$ , irregularly sampled series generally need to be resampled onto a uniform grid first. Applications include astronomy (variable star classification), medicine (heartbeat analysis), and finance (cycle detection).
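Step 1 can be sketched for a uniformly sampled series stored as a list. The function name `delay_embedding`, the sine test signal, and the choice of a quarter-period delay are assumptions for illustration. With that delay, a pure sine embeds exactly onto the unit circle, which is the loop that persistent homology would then detect.

```python
import math

def delay_embedding(x, d, tau):
    """Rows y(t) = (x[t], x[t + tau], ..., x[t + (d - 1) * tau]) for every valid t."""
    last = len(x) - (d - 1) * tau
    return [tuple(x[t + k * tau] for k in range(d)) for t in range(last)]

period = 40
signal = [math.sin(2 * math.pi * t / period) for t in range(200)]
cloud = delay_embedding(signal, d=2, tau=period // 4)

# With a quarter-period delay the points are (sin, cos) pairs on the unit circle.
radii = [math.hypot(u, v) for u, v in cloud]
print(len(cloud), round(min(radii), 6), round(max(radii), 6))
```

The resulting point cloud could then be passed to any persistent homology routine, and the persistence of its dominant $\beta_1$ bar used as the classification feature described in Step 5.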

Common Mistakes

Mistake	Correct Approach
Assuming more data always gives better topology	Noise creates short-lived topological features; focus on persistence
Confusing homotopy equivalence with homeomorphism	Homotopy equivalence is weaker; a disk and a point are homotopy equivalent but not homeomorphic
Using Vietoris-Rips without understanding its relation to Čech	At matching scales (Rips threshold $2\epsilon$ , Čech radius $\epsilon$ ) Rips is a superset of Čech; Rips may create false features that Čech avoids
Ignoring the choice of distance metric	TDA results depend on the metric; different metrics yield different persistence diagrams
Assuming Betti numbers capture all topology	Betti numbers are invariants, but they don't capture the full homotopy type (e.g., lens spaces)
Confusing persistent homology with clustering	Clustering gives partition; persistent homology captures higher-dimensional topology (loops, voids)
Forgetting that persistence is invariant under rotation/translation	TDA captures intrinsic geometry, not extrinsic position

Connections to Machine Learning

TDA provides topology-aware features for ML: (1) Persistence diagrams serve as features for classification—long-lived features indicate meaningful structure. (2) Persistence landscapes convert diagrams into functions suitable for statistical analysis. (3) Topological regularization encourages neural networks to learn functions with simple topology. (4) Shape analysis uses persistent homology to compare 3D meshes and point clouds. (5) Time series analysis detects periodicity and anomalies via persistent homology of delay embeddings. (6) Graph neural networks can incorporate topological features from persistent homology of graph neighborhoods. (7) Manifold learning methods (Isomap, UMAP) assume data lies on a low-dimensional manifold; TDA can validate this assumption by measuring topological complexity.

Exam/Interview Questions

Q1: What are Betti numbers, and what do they measure?

Answer: Betti numbers $\beta_k$ count the number of $k$ -dimensional "holes" in a topological space: $\beta_0$ = connected components, $\beta_1$ = independent loops, $\beta_2$ = enclosed voids. They are topological invariants—preserved under homeomorphism. For a circle: $\beta_0 = 1$ , $\beta_1 = 1$ . For a sphere: $\beta_0 = 1$ , $\beta_2 = 1$ . For a torus: $\beta_0 = 1$ , $\beta_1 = 2$ , $\beta_2 = 1$ .

Q2: Explain the difference between the Vietoris-Rips and Čech complexes.

Answer: The Vietoris-Rips complex $\text{VR}(X, \epsilon)$ includes a simplex if all pairwise distances are $\leq \epsilon$ . The Čech complex includes a simplex if the balls of radius $\epsilon$ around the vertices have a common intersection point. Čech is topologically more accurate (by the Nerve Theorem) but harder to compute. With these conventions the complexes interleave, $\text{VR}(X, \epsilon) \subseteq \check{C}(X, \epsilon) \subseteq \text{VR}(X, 2\epsilon)$ , and Rips depends only on pairwise distances, making it computationally efficient.

Q3: Why is persistent homology "stable"?

Answer: The stability theorem states that $d_B(\text{pers}(f), \text{pers}(g)) \leq \|f - g\|_\infty$ , where $d_B$ is the bottleneck distance. This means if we perturb the data (or the function defining the filtration) by a small amount in the supremum norm, the persistence diagram changes by at most that amount. This guarantees that TDA is robust to noise—small perturbations cause small changes in the topological summary.

Q4: What is the Euler characteristic and how does it relate to Betti numbers?

Answer: The Euler characteristic is $\chi = \sum_{k=0}^n (-1)^k \beta_k$ . For surfaces: sphere ( $\chi = 2$ ), torus ( $\chi = 0$ ), genus- $g$ surface ( $\chi = 2 - 2g$ ). It can also be computed from any triangulation: $\chi = \alpha_0 - \alpha_1 + \alpha_2 - \cdots$ where $\alpha_k$ is the number of $k$ -simplices. It is a topological invariant (independent of triangulation) and relates geometry to topology via the Gauss-Bonnet theorem.

Q5: How can persistent homology be used as features for machine learning?

Answer: Persistence diagrams can be converted to feature vectors via: (1) Persistence landscapes—transform diagrams into functions amenable to averaging and statistics. (2) Persistence images—rasterize diagrams into fixed-size images. (3) Betti curves—plot $\beta_k(\epsilon)$ as a function of scale. (4) Persistence statistics—compute mean, variance, and other statistics of birth-death pairs. These features capture topological structure (holes, loops, voids) that traditional features miss, and have been used in shape classification, time series analysis, and molecular biology.

Q6: Why is compactness important in topology and analysis?

Answer: Compactness ensures that: (1) Every sequence has a convergent subsequence (sequential compactness, in metric spaces), guaranteeing existence of solutions. (2) Continuous functions on compact sets are bounded and attain their extrema (extreme value theorem). (3) Open covers have finite subcovers, enabling finite-dimensional arguments. In TDA, compactness of the underlying space (together with tameness of the filtering function) ensures persistence diagrams are well-defined with finitely many features; classes that never die, such as the last connected component, are recorded with death time $\infty$ . In analysis, compactness justifies convergence of iterative algorithms and existence of optimal solutions.

Quick Reference

Concept	Definition	Key Property
Topological Space	$(X, \tau)$ with open set axioms	Generalizes metric spaces
Open Set	$U \in \tau$	Closed under arbitrary union
Closed Set	$X \setminus U$ open	Contains all limit points
Connected	Cannot split into disjoint open sets	$\beta_0$ counts components
Compact	Every open cover has finite subcover	Sequentially compact in metric spaces
Homeomorphism	Continuous bijection with continuous inverse	Preserves topology
Homotopy	Continuous deformation between maps	Preserves "shape"
Simplicial Complex	Collection of simplices closed under subsets	Discrete model of topology
Betti Number	$\beta_k = \text{rank}(H_k)$	Counts $k$ -dimensional holes
Euler Characteristic	$\chi = \sum (-1)^k \beta_k$	Topological invariant
Persistent Homology	Track feature birth/death across scales	Stable under perturbation
Vietoris-Rips	$\{x_i\}$ simplex iff $d(x_i, x_j) \leq \epsilon \, \forall i,j$	Pairwise distance condition
Čech Complex	$\{x_i\}$ simplex iff balls have common intersection	Nerve Theorem applies
Bottleneck Distance	$d_B(D_1, D_2) = \inf_\gamma \sup \|p - \gamma(p)\|_\infty$	Stability metric

Cross-References

096-advanced-tensor-calculus — Tensor products in algebraic topology; chain complexes as tensor products
097-advanced-differential-geometry — Manifolds are topological spaces with smooth structure; de Rham cohomology relates to simplicial homology
098-advanced-functional-analysis — Function spaces are topological vector spaces; weak topologies connect analysis to topology
099-advanced-measure-theory — Borel σ-algebras are generated by open sets; measurability relates to topological structure
Graph Theory: Graphs are 1-dimensional simplicial complexes; graph connectivity relates to $\beta_0$