Markov Chains | ChatWhole Learn

Why It Matters

Markov chains are foundational to modern AI and machine learning. They model sequential data where the future depends only on the present—not the entire history. This memoryless property makes them computationally tractable and mathematically elegant. PageRank ranks web pages via Markov chains, Hidden Markov Models power speech recognition, and MCMC methods sample from complex distributions. Mastering Markov chains unlocks understanding of reinforcement learning (Markov Decision Processes), natural language processing, computational biology, and financial modeling.

Markov Property

Definition: Markov Property

A stochastic process $\{X_n : n \geq 0\}$ satisfies the Markov property if for all states $i_0, i_1, \ldots, i_n, i_{n+1}$ and all $n \geq 0$ :

P(X_{n+1} = i_{n+1} \mid X_n = i_n, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = P(X_{n+1} = i_{n+1} \mid X_n = i_n)

The future state depends only on the current state, not on the sequence of past states. This is the memoryless property.

ℹ️ Intuition

Think of a chess game where each move depends only on the current board position—not how you got there. Or weather forecasting: tomorrow's weather depends on today's conditions, not the entire month's history. The Markov property captures this "memoryless" behavior mathematically.

Definition: Markov Chain

A Markov chain is a sequence of random variables $X_0, X_1, X_2, \ldots$ taking values in a finite or countable state space $\mathcal{S}$ , satisfying the Markov property. The chain is homogeneous (time-homogeneous) if the transition probabilities do not change with time.

Transition Matrix

Definition: Transition Probability

The transition probability from state $i$ to state $j$ in one step is:

p_{ij} = P(X_{n+1} = j \mid X_n = i)

For a chain with $N$ states, the transition matrix $P$ is the $N \times N$ matrix where:

P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & \cdots & p_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{pmatrix}

Row-Stochastic Matrix

\sum_{j=1}^{N} p_{ij} = 1 \quad \text{for all } i

Here,

$p_{ij}$ =Probability of transitioning from state i to state j
$N$ =Number of states in the state space

💡 Key Properties

Every entry $p_{ij} \geq 0$ (probabilities are non-negative)
Each row sums to 1: $\sum_j p_{ij} = 1$
$P$ is a row-stochastic matrix
The $(i,j)$ entry of $P^n$ gives the probability of going from state $i$ to state $j$ in exactly $n$ steps

📝Weather Model

Suppose weather has three states: Sunny (S), Cloudy (C), Rainy (R). The transition matrix is:

P = \begin{pmatrix} 0.6 & 0.3 & 0.1 \\ 0.3 & 0.4 & 0.3 \\ 0.2 & 0.3 & 0.5 \end{pmatrix}

Row 1: From Sunny → 60% Sunny, 30% Cloudy, 10% Rainy
Row 2: From Cloudy → 30% Sunny, 40% Cloudy, 30% Rainy
Row 3: From Rainy → 20% Sunny, 30% Cloudy, 50% Rainy
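A minimal sketch for checking the key properties on the weather matrix. The helper name `is_row_stochastic` is an illustrative choice, not part of any library; the state order is assumed to be Sunny, Cloudy, Rainy.

```python
import numpy as np

# Weather transition matrix from the example (order: Sunny, Cloudy, Rainy).
P = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

def is_row_stochastic(matrix: np.ndarray, tol: float = 1e-12) -> bool:
    """Return True if all entries are non-negative and every row sums to 1."""
    rows_ok = np.allclose(matrix.sum(axis=1), 1.0, atol=tol)
    return bool(np.all(matrix >= 0) and rows_ok)

print("P is row-stochastic:", is_row_stochastic(P))

# The (i, j) entry of P^2 is the two-step probability from i to j.
P2 = P @ P
sunny_to_rainy_2 = P2[0, 2]
print("P(Sunny -> Rainy in exactly 2 steps):", sunny_to_rainy_2)
print("P^2 is row-stochastic:", is_row_stochastic(P2))
```

Powers of a row-stochastic matrix are again row-stochastic, which is why the second check also passes.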

State Distribution

State Distribution Evolution

\boldsymbol{\pi}^{(n)} = \boldsymbol{\pi}^{(0)} P^n

Here,

$\boldsymbol{\pi}^{(0)}$ =Initial state distribution (row vector)
$\boldsymbol{\pi}^{(n)}$ =State distribution after n steps
$P^n$ =n-step transition matrix

The state distribution at time $n$ is obtained by multiplying the initial distribution by $P^n$ . If we start in a specific state $i$ , then $\boldsymbol{\pi}^{(0)} = \mathbf{e}_i$ (the $i$ -th standard basis vector), and $\boldsymbol{\pi}^{(n)}$ is the $i$ -th row of $P^n$ .

📝Distribution Evolution

Starting from Sunny: $\boldsymbol{\pi}^{(0)} = (1, 0, 0)$ .

After 1 step: $\boldsymbol{\pi}^{(1)} = (1, 0, 0) \begin{pmatrix} 0.6 & 0.3 & 0.1 \\ 0.3 & 0.4 & 0.3 \\ 0.2 & 0.3 & 0.5 \end{pmatrix} = (0.6, 0.3, 0.1)$

After 2 steps: $\boldsymbol{\pi}^{(2)} = \boldsymbol{\pi}^{(1)} P = (0.6, 0.3, 0.1) P = (0.47, 0.33, 0.20)$

The distribution gradually "forgets" the initial state.
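The "forgetting" can be observed numerically. The sketch below repeatedly applies $\boldsymbol{\pi}^{(n+1)} = \boldsymbol{\pi}^{(n)} P$ from two different starting states of the weather chain; the function name `evolve` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

P = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

def evolve(pi0, P, steps):
    """Return an array whose row n is the distribution pi^(n) = pi^(0) P^n."""
    history = [np.asarray(pi0, dtype=float)]
    for _ in range(steps):
        history.append(history[-1] @ P)  # one step: row vector times P
    return np.array(history)

from_sunny = evolve([1, 0, 0], P, 20)
from_rainy = evolve([0, 0, 1], P, 20)

# The two columns of output approach each other as n grows.
for n in (1, 2, 5, 20):
    print(n, np.round(from_sunny[n], 4), np.round(from_rainy[n], 4))
```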

n-Step Transitions

Chapman-Kolmogorov Equation

P^{(m+n)}_{ij} = \sum_{k \in \mathcal{S}} P^{(m)}_{ik} \cdot P^{(n)}_{kj}

Here,

$P^{(n)}_{ij}$ =Probability of going from state i to state j in exactly n steps
$\mathcal{S}$ =State space (set of all states)

This is equivalent to matrix multiplication: $P^{m+n} = P^m \cdot P^n$ .
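As a quick numerical check of the Chapman-Kolmogorov equation, the sketch below compares the entry-wise sum over intermediate states with the matrix identity. The weather matrix and the choice $m = 2$, $n = 3$ are arbitrary illustrative inputs.

```python
import numpy as np

P = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
m, n = 2, 3

Pm = np.linalg.matrix_power(P, m)
Pn = np.linalg.matrix_power(P, n)
Pmn = np.linalg.matrix_power(P, m + n)

# Entry-wise form for one pair (i, j): sum over every intermediate state k.
i, j = 0, 2
ck_sum = sum(Pm[i, k] * Pn[k, j] for k in range(P.shape[0]))

print("entry-wise sum matches:", np.isclose(ck_sum, Pmn[i, j]))
print("matrix identity holds:", np.allclose(Pm @ Pn, Pmn))
```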

n-Step Transition Matrix

P^n = \underbrace{P \cdot P \cdot \cdots \cdot P}_{n \text{ times}}

Here,

$P^n$ =The n-step transition matrix
$(P^n)_{ij}$ =Probability of going from i to j in n steps

⚠️ Computing Large Powers

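For large $n$, computing $P^n$ by $n - 1$ successive multiplications is inefficient. Repeated squaring needs only $O(\log n)$ multiplications. Alternatively, if $P$ is diagonalizable, use the eigendecomposition $P = Q \Lambda Q^{-1}$, so $P^n = Q \Lambda^n Q^{-1}$; $\Lambda^n$ is obtained by raising each diagonal entry to the $n$-th power. This also reveals long-term behavior: the eigenvalue 1 persists, while components with $|\lambda| < 1$ decay geometrically.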
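A minimal sketch of the eigendecomposition route, assuming $P$ is diagonalizable (true for the weather matrix, whose eigenvalues are distinct). The function name `matrix_power_eig` is hypothetical; the result is compared against `np.linalg.matrix_power`.

```python
import numpy as np

P = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

def matrix_power_eig(P: np.ndarray, n: int) -> np.ndarray:
    """Compute P^n as Q diag(lambda^n) Q^{-1}; assumes P is diagonalizable."""
    eigenvalues, Q = np.linalg.eig(P)      # columns of Q are right eigenvectors
    powered = Q @ np.diag(eigenvalues ** n) @ np.linalg.inv(Q)
    return np.real(powered)                # discard round-off imaginary parts

P50 = matrix_power_eig(P, 50)
print("matches matrix_power:", np.allclose(P50, np.linalg.matrix_power(P, 50)))
print(np.round(P50, 4))  # rows are nearly identical for large n
```

For a matrix that is not diagonalizable, or one with badly conditioned eigenvectors, `np.linalg.matrix_power` is the safer choice.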

Stationary Distribution

Definition: Stationary Distribution

A probability distribution $\boldsymbol{\pi}$ over the state space is a stationary distribution (or invariant distribution) if:

\boldsymbol{\pi} = \boldsymbol{\pi} P

Equivalently, $\boldsymbol{\pi}$ is a left eigenvector of $P$ with eigenvalue 1. Once the chain reaches $\boldsymbol{\pi}$ , it remains in $\boldsymbol{\pi}$ for all future time steps.

Stationary Distribution Equations

\pi_j = \sum_{i=1}^{N} \pi_i \cdot p_{ij} \quad \text{for all } j, \qquad \sum_{j=1}^{N} \pi_j = 1

Here,

$\pi_j$ =Stationary probability of being in state j
$p_{ij}$ =Transition probability from state i to state j

📝Computing Stationary Distribution

For $P = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}$ :

Solve $\boldsymbol{\pi} P = \boldsymbol{\pi}$ with $\pi_1 + \pi_2 = 1$ :

$\pi_1 = 0.9\pi_1 + 0.2\pi_2$
$\pi_2 = 0.1\pi_1 + 0.8\pi_2$
$\pi_1 + \pi_2 = 1$

From the first equation: $0.1\pi_1 = 0.2\pi_2 \implies \pi_1 = 2\pi_2$

With the constraint: $2\pi_2 + \pi_2 = 1 \implies \pi_2 = 1/3, \quad \pi_1 = 2/3$

Stationary distribution: $\boldsymbol{\pi} = (2/3, 1/3)$

💡Python Verification

import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Method 1: Eigendecomposition
eigenvalues, eigenvectors = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigenvalues - 1))
stationary = np.real(eigenvectors[:, idx])
stationary = stationary / stationary.sum()
print(f"Stationary distribution: {stationary}")
# Output (approximately): [0.66666667 0.33333333]

# Method 2: Power iteration
pi = np.array([0.5, 0.5])
for _ in range(100):
    pi = pi @ P
print(f"Stationary distribution: {pi}")
# Output (approximately): [0.66666667 0.33333333]

Ergodicity

Theorem: Ergodic Theorem for Markov Chains

A finite-state, time-homogeneous Markov chain is ergodic if it is:

Irreducible: Every state is reachable from every other state (the chain has a single communicating class)
Aperiodic: The period of every state is 1 (no deterministic cycling)

For an ergodic chain:

\lim_{n \to \infty} P^n_{ij} = \pi_j \quad \text{for all } i, j

The long-run distribution is independent of the initial state, and the stationary distribution exists, is unique, and is positive.

Definition: Irreducibility

A Markov chain is irreducible if for every pair of states $i, j$ , there exists some $n \geq 0$ such that $P^n_{ij} > 0$ . Equivalently, the chain can eventually reach any state from any other state.

Definition: Aperiodicity

The period of a state $i$ is $d(i) = \gcd\{n \geq 1 : P^n_{ii} > 0\}$ . A chain is aperiodic if $d(i) = 1$ for all $i$ . Aperiodicity ensures the chain doesn't get trapped in deterministic cycles.
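To see why aperiodicity matters, the sketch below contrasts a two-state chain that always flips (period 2) with a "lazy" version that stays put half the time. Both matrices are illustrative examples, not taken from the sections above.

```python
import numpy as np

# Deterministic flip: 0 -> 1 -> 0 -> ... ; every state has period 2.
flip = np.array([[0.0, 1.0],
                 [1.0, 0.0]])

# Lazy version: stay with probability 1/2, otherwise flip; period 1.
lazy = 0.5 * np.eye(2) + 0.5 * flip

# Row 0 of flip^n alternates between (0, 1) and (1, 0), so P^n has no limit.
for n in (1, 2, 3, 4):
    print(n, np.linalg.matrix_power(flip, n)[0])

# Row 0 of lazy^n settles at the stationary distribution (0.5, 0.5).
print(np.linalg.matrix_power(lazy, 50)[0])
```

Both chains have the same stationary distribution $(0.5, 0.5)$; only the aperiodic one converges to it from a fixed starting state.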

💡 Practical Implication

If a chain is ergodic, you can find the stationary distribution by simply running the chain for many steps from any starting state. The empirical distribution of visits converges to $\boldsymbol{\pi}$ . This is the basis of Markov Chain Monte Carlo (MCMC) methods.
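The following sketch illustrates this on the weather chain: it simulates one long path and compares the fraction of time spent in each state with the distribution obtained by power iteration. The seed and the path length of 50,000 steps are arbitrary choices.

```python
import numpy as np

P = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n_steps = 50_000
state = 0                       # start in Sunny
visits = np.zeros(3)
for _ in range(n_steps):
    state = rng.choice(3, p=P[state])  # sample the next state from row `state`
    visits[state] += 1
empirical = visits / n_steps

# Reference value: iterate pi <- pi P until it stops changing.
pi = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    pi = pi @ P

print("empirical visit frequencies:", np.round(empirical, 3))
print("power iteration:            ", np.round(pi, 3))
```

The two vectors agree only up to sampling noise, which shrinks as the path gets longer.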

Absorbing States

Definition: Absorbing State

A state $i$ is absorbing if once entered, it is never left: $p_{ii} = 1$ . A Markov chain is absorbing if it has at least one absorbing state and it is possible to reach an absorbing state from every state.

Absorption Probability

f_{ij} = P(\text{ever reach state } j \text{ starting from } i)

Here,

$f_{ij}$ =Probability of eventually being absorbed into state j from state i

For an absorbing chain with transient states $T$ and absorbing states $A$ , the fundamental matrix is:

N = (I - Q)^{-1}

where $Q$ is the submatrix of transitions among transient states. The $(i,j)$ entry of $N$ gives the expected number of times the chain visits transient state $j$ starting from transient state $i$ before absorption.

📝Gambler's Ruin

A gambler starts with \$2 and plays rounds of fair coin-flip betting. States: \$0 (absorbing, ruined), \$1, \$2, \$3 (absorbing, won). Transition matrix:

P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.5 & 0 & 0.5 \\ 0 & 0 & 0 & 1 \end{pmatrix}

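Starting with \$2, what is the probability of ruin? Using the fundamental matrix with $Q = \begin{pmatrix} 0 & 0.5 \\ 0.5 & 0 \end{pmatrix}$ (transient states \$1 and \$2): $N = (I - Q)^{-1} = \begin{pmatrix} 4/3 & 2/3 \\ 2/3 & 4/3 \end{pmatrix}$. Multiplying by the transient-to-absorbing block $R = \begin{pmatrix} 0.5 & 0 \\ 0 & 0.5 \end{pmatrix}$ gives $f_{2,0} = 1/3$ (and $f_{2,3} = 2/3$).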
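The absorption probabilities for this chain can be computed numerically. The sketch below assumes the standard block notation in which $R$ is the submatrix of transitions from transient to absorbing states, so that $B = NR$ holds the absorption probabilities; the variable names are illustrative.

```python
import numpy as np

# Gambler's ruin on states $0, $1, $2, $3 with a fair coin.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])

transient = [1, 2]   # $1 and $2
absorbing = [0, 3]   # $0 (ruin) and $3 (win)

Q = P[np.ix_(transient, transient)]            # transient -> transient
R = P[np.ix_(transient, absorbing)]            # transient -> absorbing
N = np.linalg.inv(np.eye(len(transient)) - Q)  # fundamental matrix
B = N @ R                                      # B[i, j] = P(absorbed in j | start in i)

ruin_from_2 = B[transient.index(2), absorbing.index(0)]
print("Fundamental matrix N:\n", N)
print("Absorption probabilities B:\n", B)
print("P(ruin | start with $2):", ruin_from_2)
```

Each row of `B` sums to 1 because absorption is certain in a finite absorbing chain.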

Python Implementation

💡Complete Markov Chain Library

import numpy as np
from typing import List, Tuple, Optional

class MarkovChain:
    """A complete Markov chain implementation."""

    def __init__(self, transition_matrix: np.ndarray, state_names: Optional[List[str]] = None):
        self.P = np.array(transition_matrix, dtype=float)
        n = self.P.shape[0]
        assert self.P.shape == (n, n), "Matrix must be square"
        assert np.all(self.P >= 0), "All entries must be non-negative"
        assert np.allclose(self.P.sum(axis=1), 1), "Rows must sum to 1"
        self.n = n
        self.state_names = state_names or [str(i) for i in range(n)]

    def step(self, state: int) -> int:
        """Take one step from the given state."""
        return np.random.choice(self.n, p=self.P[state])

    def simulate(self, start: int, steps: int) -> List[int]:
        """Simulate a path of given length starting from state."""
        path = [start]
        for _ in range(steps):
            path.append(self.step(path[-1]))
        return path

    def n_step_matrix(self, n: int) -> np.ndarray:
        """Compute the n-step transition matrix P^n."""
        return np.linalg.matrix_power(self.P, n)

    def stationary_distribution(self) -> np.ndarray:
        """Find the stationary distribution via eigendecomposition."""
        eigenvalues, eigenvectors = np.linalg.eig(self.P.T)
        idx = np.argmin(np.abs(eigenvalues - 1))
        stationary = np.real(eigenvectors[:, idx])
        return stationary / stationary.sum()

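    def is_irreducible(self) -> bool:
        """Check if the chain is irreducible."""
        # (I + P)^(n-1) is entrywise positive iff every state can reach every other;
        # P^(n-1) alone would wrongly reject periodic irreducible chains
        reach = np.linalg.matrix_power(np.eye(self.n) + self.P, self.n - 1)
        return bool(np.all(reach > 0))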

    def is_aperiodic(self) -> bool:
        """Check if the chain is aperiodic."""
        from math import gcd
        for i in range(self.n):
            periods = []
            P_n = np.eye(self.n)
            for k in range(1, self.n * self.n + 1):
                P_n = P_n @ self.P
                if P_n[i, i] > 0:
                    periods.append(k)
            if periods:
                g = periods[0]
                for p in periods[1:]:
                    g = gcd(g, p)
                if g > 1:
                    return False
        return True

    def is_ergodic(self) -> bool:
        """Check if the chain is ergodic (irreducible + aperiodic)."""
        return self.is_irreducible() and self.is_aperiodic()

    def absorbing_states(self) -> List[int]:
        """Find all absorbing states."""
        # np.isclose tolerates floating-point round-off in p_ii
        return [i for i in range(self.n) if np.isclose(self.P[i, i], 1.0)]

    def fundamental_matrix(self) -> np.ndarray:
        """Compute the fundamental matrix for absorbing chains."""
        absorbing = self.absorbing_states()
        transient = [i for i in range(self.n) if i not in absorbing]
        if not transient:
            return np.eye(0)
        Q = self.P[np.ix_(transient, transient)]
        return np.linalg.inv(np.eye(len(transient)) - Q)

    def steady_state_distribution(self, start: int, steps: int = 10000) -> np.ndarray:
        """Estimate stationary distribution via simulation."""
        counts = np.zeros(self.n)
        state = start
        for _ in range(steps):
            state = self.step(state)
            counts[state] += 1
        return counts / counts.sum()


# === Example Usage ===

# Weather model
P_weather = [[0.6, 0.3, 0.1],
             [0.3, 0.4, 0.3],
             [0.2, 0.3, 0.5]]

mc = MarkovChain(P_weather, ["Sunny", "Cloudy", "Rainy"])

print("Stationary distribution:", mc.stationary_distribution())
print("Is ergodic:", mc.is_ergodic())
print("10-step transitions:\n", mc.n_step_matrix(10))

# Simulate 30 days
path = mc.simulate(0, 30)  # Start from Sunny
weather_counts = [path.count(i) for i in range(3)]
print("Weather distribution:", [c/len(path) for c in weather_counts])

Applications in AI/ML

PageRank

PageRank Formula

\vec{r} = (1 - d)\frac{\vec{1}}{n} + d \cdot M\vec{r}

Here,

$\vec{r}$ =PageRank vector (stationary distribution)
$d$ =Damping factor (typically 0.85)
$M$ =Column-stochastic web graph matrix
$n$ =Total number of pages

ℹ️ How PageRank Works

The web is modeled as a directed graph where pages are states and links are transitions. The damping factor $d = 0.85$ means: 85% of the time, follow a random link; 15% of the time, jump to a random page (teleportation). This ensures the chain is ergodic. The PageRank vector is the stationary distribution—pages with higher probability are more "important."

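import numpy as np

def pagerank(adjacency: np.ndarray, d: float = 0.85, max_iter: int = 100) -> np.ndarray:
    """Compute PageRank via power iteration.

    Convention: adjacency[i, j] = 1 if page j links to page i, so each
    column describes one page's outgoing links.
    """
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    col_sums = A.sum(axis=0)
    dangling = col_sums == 0
    # Dangling pages (no outgoing links) jump uniformly to every page,
    # which keeps every column of M summing to 1
    M = np.where(dangling, 1.0 / n, A / np.where(dangling, 1.0, col_sums))
    r = np.ones(n) / n
    for _ in range(max_iter):
        r_new = (1 - d) / n + d * M @ r
        converged = np.allclose(r, r_new, atol=1e-8)
        r = r_new
        if converged:
            break
    return r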

# Example: 4-page web
links = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])
print("PageRank:", pagerank(links))

Markov Chain Monte Carlo (MCMC)

Definition: MCMC Sampling

Markov Chain Monte Carlo methods construct a Markov chain whose stationary distribution is the target distribution $\pi$. By running the chain long enough, samples approximate $\pi$. The Metropolis-Hastings algorithm is a general-purpose MCMC method; other samplers, such as random-walk Metropolis and Gibbs sampling, are special cases of it.

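import numpy as np

def metropolis_hastings(target_log_prob, proposal, n_samples, x0):
    """Metropolis-Hastings sampler for symmetric proposals (random-walk Metropolis).

    The acceptance ratio below omits the proposal-density correction
    q(x | x') / q(x' | x), so it is valid only when the proposal is symmetric.
    """
    samples = [x0]
    x = x0
    for _ in range(n_samples):
        x_prop = proposal(x)
        # Log acceptance ratio; unnormalized log densities are sufficient
        log_alpha = target_log_prob(x_prop) - target_log_prob(x)
        if np.log(np.random.uniform()) < log_alpha:
            x = x_prop  # accept the proposal
        samples.append(x)  # on rejection the current state is repeated
    return np.array(samples)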

# Example: Sample from N(0,1)
target = lambda x: -0.5 * x**2
proposal = lambda x: x + np.random.normal(0, 0.5)
samples = metropolis_hastings(target, proposal, 10000, 0.0)
print(f"Mean: {samples.mean():.4f}, Std: {samples.std():.4f}")

Hidden Markov Models (HMMs)

Definition: Hidden Markov Model

A Hidden Markov Model consists of:

A Markov chain of hidden states $Z_1, Z_2, \ldots$ with transition matrix $A$
Observable emissions $X_t$ where $P(X_t = x \mid Z_t = s) = B_s(x)$
Initial distribution $\pi_0$ over hidden states

The joint probability is:

P(Z_{1:T}, X_{1:T}) = \pi_0(Z_1) \prod_{t=1}^{T-1} A_{Z_t, Z_{t+1}} \prod_{t=1}^{T} B_{Z_t}(X_t)

ℹ️ HMM Applications

Speech recognition: Hidden states = phonemes, observations = audio features
Bioinformatics: Hidden states = gene types, observations = DNA bases
Part-of-speech tagging: Hidden states = POS tags, observations = words
Gesture recognition: Hidden states = gesture phases, observations = sensor data

Forward Algorithm

\alpha_t(j) = P(X_1, \ldots, X_t, Z_t = j) = B_j(X_t) \sum_{i} \alpha_{t-1}(i) \, A_{ij}, \qquad \alpha_1(j) = \pi_0(j) \, B_j(X_1)

Here,

$\alpha_t(j)$ =Forward probability of being in state j at time t
$B_j(X_t)$ =Emission probability of observation X_t from state j
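A minimal sketch of the forward algorithm, using the standard recursion $\alpha_1(j) = \pi_0(j) B_j(X_1)$ and $\alpha_t(j) = B_j(X_t) \sum_i \alpha_{t-1}(i) A_{ij}$. The two-state HMM and all of its numbers are hypothetical; the result is checked against a brute-force sum of the joint probability over every hidden path.

```python
import itertools
import numpy as np

# Hypothetical 2-state, 2-symbol HMM; every number here is made up for illustration.
pi0 = np.array([0.6, 0.4])       # initial distribution over hidden states
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])       # A[i, j] = P(Z_{t+1} = j | Z_t = i)
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])       # B[s, x] = P(X_t = x | Z_t = s)
obs = [0, 1, 0]                  # observed symbol indices

def forward(pi0, A, B, obs):
    """Return alpha where row t holds the forward probabilities after t+1 observations."""
    alpha = np.zeros((len(obs), len(pi0)))
    alpha[0] = pi0 * B[:, obs[0]]
    for t in range(1, len(obs)):
        # Propagate one step through A, then weight by the emission probability.
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def joint(z, x):
    """Joint probability of hidden path z and observations x (HMM factorization)."""
    p = pi0[z[0]] * B[z[0], x[0]]
    for t in range(1, len(x)):
        p *= A[z[t - 1], z[t]] * B[z[t], x[t]]
    return p

alpha = forward(pi0, A, B, obs)
likelihood = alpha[-1].sum()     # P(X_1, ..., X_T): sum over the final hidden state
brute_force = sum(joint(z, obs)
                  for z in itertools.product(range(2), repeat=len(obs)))
print(likelihood, brute_force)
```

The forward pass costs $O(T \cdot K^2)$ for $K$ hidden states, whereas the brute-force sum enumerates all $K^T$ paths. For long sequences, implementations typically rescale `alpha` or work in log space to avoid underflow.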

Common Mistakes

Mistake	Why It's Wrong	Correct Approach
Assuming rows sum to 1 for column-stochastic matrices	PageRank uses column-stochastic matrices; rows may not sum to 1	Check the convention; normalize columns for PageRank
Confusing left and right eigenvectors	Stationary distribution is a left eigenvector: $\boldsymbol{\pi} = \boldsymbol{\pi} P$	Solve $\boldsymbol{\pi} P = \boldsymbol{\pi}$ , not $P \mathbf{v} = \mathbf{v}$
Forgetting the normalization constraint	The stationary distribution must satisfy $\sum_j \pi_j = 1$	Always normalize the eigenvector to sum to 1
Assuming every chain converges to a unique stationary distribution	A finite chain always has at least one stationary distribution, but a reducible chain can have several and a periodic chain may not converge to it	Verify ergodicity (irreducible + aperiodic) first
Ignoring dangling nodes in PageRank	Pages with no outgoing links give a zero column in $M$, so probability mass leaks out of the chain	Connect dangling nodes to all pages (redistribute their mass uniformly)
Using power iteration without convergence check	May run too many or too few iterations	Check $\|\boldsymbol{\pi}^{(k+1)} - \boldsymbol{\pi}^{(k)}\| < \epsilon$
Confusing transient with absorbing states	Not all states with $p_{ii} > 0$ are absorbing	An absorbing state requires $p_{ii} = 1$ exactly

Interview Questions

ℹ️ Common Interview Questions

What is the Markov property? Give a real-world example. The future depends only on the current state. Example: in a random-walk model of stock prices, tomorrow's price depends only on today's price, not on the earlier history.
How do you find the stationary distribution of a Markov chain? Solve $\boldsymbol{\pi} P = \boldsymbol{\pi}$ with $\sum \pi_j = 1$ . Use eigendecomposition of $P^T$ or power iteration.
What conditions guarantee convergence to a stationary distribution? The chain must be ergodic: irreducible and aperiodic. For finite chains, this ensures a unique stationary distribution.
Explain how PageRank works as a Markov chain. Each web page is a state; links define transitions. The damping factor adds teleportation for ergodicity. PageRank is the stationary distribution.
What is the fundamental matrix in an absorbing chain? $N = (I - Q)^{-1}$ where $Q$ is the transient-to-transient transition submatrix. $N_{ij}$ = expected visits to transient state $j$ from state $i$ before absorption.
How does MCMC use Markov chains? Construct a chain whose stationary distribution is the target distribution. After a burn-in period, samples approximate the target.
What is the difference between time-homogeneous and time-inhomogeneous chains? Homogeneous: transition probabilities are constant over time. Inhomogeneous: they change with time, i.e., the transition matrix $P_t$ varies with $t$.
How do you check if a chain is irreducible? Compute $(I + P)^n$ for $n = |S| - 1$. If all entries are positive, the chain is irreducible. Checking $P^n$ alone can fail for periodic chains. Equivalently, check that the transition graph is strongly connected.

Practice Problems

📝Problem 1: Two-State Chain

Given $P = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}$ , find the stationary distribution and the 3-step transition matrix.

💡Solution 1

Stationary distribution: Solve $\pi P = \pi$ with $\pi_1 + \pi_2 = 1$ :

$\pi_1 = 0.7\pi_1 + 0.4\pi_2 \implies 0.3\pi_1 = 0.4\pi_2 \implies \pi_1 = \frac{4}{3}\pi_2$
$\frac{4}{3}\pi_2 + \pi_2 = 1 \implies \frac{7}{3}\pi_2 = 1 \implies \pi_2 = \frac{3}{7}$
$\boldsymbol{\pi} = (4/7, 3/7) \approx (0.571, 0.429)$

3-step transition: $P^3 = P \cdot P \cdot P = \begin{pmatrix} 0.583 & 0.417 \\ 0.556 & 0.444 \end{pmatrix}$

📝Problem 2: Absorbing Chain

A rat moves in a 4-cell maze: cells 1, 2, 3 are transient; cell 4 is absorbing (trap). Transitions: from cell 1, equal probability to 2 or 3. From cell 2, equal probability to 1 or 4. From cell 3, equal probability to 1 or 4. Find the probability of reaching cell 4 from cell 1.

💡Solution 2

Transition matrix (states 1, 2, 3, 4):

P = \begin{pmatrix} 0 & 0.5 & 0.5 & 0 \\ 0.5 & 0 & 0 & 0.5 \\ 0.5 & 0 & 0 & 0.5 \\ 0 & 0 & 0 & 1 \end{pmatrix}

$Q = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & 0 \\ 0.5 & 0 & 0 \end{pmatrix}$ , $R = \begin{pmatrix} 0 \\ 0.5 \\ 0.5 \end{pmatrix}$

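N = (I - Q)^{-1} = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 1.5 & 0.5 \\ 1 & 0.5 & 1.5 \end{pmatrix}

B = NR = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}

Probability of absorption into cell 4 from cell 1: $1$. Cell 4 is the only absorbing state and is reachable from every cell, so absorption is certain. The first row of $N$ gives the expected number of steps before absorption from cell 1: $2 + 1 + 1 = 4$.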

📝Problem 3: Random Walk

A symmetric random walk on states $\{0, 1, 2, 3\}$ has reflecting boundaries: $p_{0,1} = 1$ and $p_{3,2} = 1$ . Find the stationary distribution.

💡Solution 3

Transition matrix:

P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.5 & 0 & 0.5 \\ 0 & 0 & 1 & 0 \end{pmatrix}

Solving $\pi P = \pi$ : By symmetry, $\pi_0 = \pi_3$ and $\pi_1 = \pi_2$ . From balance equations: $\pi_0 = 0.5\pi_1$ and $\pi_1 = \pi_0 + 0.5\pi_2 = \pi_0 + 0.5\pi_1$ , so $\pi_1 = 2\pi_0$ .

Normalization: $\pi_0 + 2\pi_0 + 2\pi_0 + \pi_0 = 1 \implies 6\pi_0 = 1$

$\boldsymbol{\pi} = (1/6, 1/3, 1/3, 1/6)$
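One way to check a hand-computed stationary distribution is to solve the balance equations together with the normalization constraint as a single linear system. The sketch below does this with least squares; the stacking of the system is an illustrative choice, not the only way to set it up.

```python
import numpy as np

# Reflecting random walk on {0, 1, 2, 3} from the problem statement.
P = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
n = P.shape[0]

# pi P = pi  <=>  (P^T - I) pi^T = 0; append a row of ones for sum(pi) = 1.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print("stationary distribution:", pi)
print("pi P == pi:", np.allclose(pi @ P, pi))
```

This chain has period 2, so the stationary distribution exists and is unique, but $P^n$ itself does not converge; solving the linear system avoids relying on power iteration here.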

📝Problem 4: PageRank by Hand

Three pages A, B, C with links: A→B, A→C, B→C, C→A. Compute PageRank with $d = 0.85$.

💡Solution 4

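Adjacency matrix (row = source page, column = target page): $L = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$

Column-stochastic (transpose $L$, then divide each column by that page's out-degree): $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.5 & 0 & 0 \\ 0.5 & 1 & 0 \end{pmatrix}$

PageRank equation: $\vec{r} = 0.15 \cdot \frac{1}{3}\vec{1} + 0.85 \cdot M\vec{r}$

Solving: componentwise, $r_A = 0.05 + 0.85\, r_C$, $r_B = 0.05 + 0.425\, r_A$, $r_C = 0.05 + 0.425\, r_A + 0.85\, r_B$, which gives $\vec{r} \approx (0.388, 0.215, 0.397)$. C ranks highest because both A and B link to it; the result is not uniform because the graph is not a simple cycle (A links to both B and C).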
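For a graph this small, the PageRank equation can also be solved directly as the linear system $(I - dM)\vec{r} = \frac{1-d}{n}\vec{1}$ instead of by iteration. The sketch below assumes the convention that `L[i, j] = 1` when page $i$ links to page $j$ (row = source), and that no page is dangling.

```python
import numpy as np

d = 0.85
# Links A->B, A->C, B->C, C->A with page order (A, B, C); row = source page.
L = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
n = L.shape[0]

out_degree = L.sum(axis=1)            # number of outgoing links per page
M = (L / out_degree[:, None]).T       # column-stochastic: M[i, j] = P(j -> i)

# Rearranged PageRank equation: (I - d M) r = (1 - d) / n * 1
r = np.linalg.solve(np.eye(n) - d * M, np.full(n, (1 - d) / n))

print("M:\n", M)
print("PageRank:", np.round(r, 4))
print("sums to 1:", np.isclose(r.sum(), 1.0))
```

A direct solve costs $O(n^3)$, so it is only practical for small graphs; power iteration is the usual choice at web scale.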

Quick Reference

📋Markov Chains Quick Reference

Markov Property: $P(X_{n+1} \mid X_n, \ldots, X_0) = P(X_{n+1} \mid X_n)$
Transition Matrix: $p_{ij} = P(X_{n+1} = j \mid X_n = i)$ , rows sum to 1
n-Step: $P^n_{ij}$ = probability of $i \to j$ in $n$ steps; $P^n = P \cdot P \cdots P$
Chapman-Kolmogorov: $P^{m+n} = P^m \cdot P^n$
Stationary Distribution: $\boldsymbol{\pi} P = \boldsymbol{\pi}$ , $\sum \pi_j = 1$
Ergodicity: Irreducible + Aperiodic → unique stationary distribution, convergence from any start
Irreducible: All states communicate; for every pair $i, j$, $P^n_{ij} > 0$ for some $n$
Aperiodic: $\gcd\{n : P^n_{ii} > 0\} = 1$
Absorbing State: $p_{ii} = 1$ ; once entered, never left
Fundamental Matrix: $N = (I - Q)^{-1}$ for absorbing chains
PageRank: $\vec{r} = (1-d)\frac{\vec{1}}{n} + dM\vec{r}$ , $d \approx 0.85$
MCMC: Construct chain with target as stationary distribution; sample after burn-in
HMMs: Hidden Markov chain + emissions; use forward/backward/Viterbi algorithms
Python: np.linalg.eig(P.T) for stationary; np.linalg.matrix_power(P, n) for $P^n$

Cross-References

ℹ️ Related Topics

Probability Foundations — Prerequisites: sample spaces, conditional probability
Bayesian Inference — Posterior updating uses similar matrix computations
Linear Algebra — Eigendecomposition, matrix powers, and stochastic matrices
Optimization — PageRank is an eigenvalue problem; MCMC is optimization-adjacent
Reinforcement Learning — Markov Decision Processes extend Markov chains with actions and rewards
Information Theory — Entropy and KL divergence relate to MCMC convergence diagnostics