← Math|43 of 100
Probability

Markov Chains

Understand Markov chains, transition matrices, stationary distributions, and PageRank.

📂 Stochastic Processes📖 Lesson 43 of 100🎓 Free Course

Advertisement

Why It Matters

💡 Why It Matters

Markov chains are foundational to modern AI and machine learning. They model sequential data where the future depends only on the present—not the entire history. This memoryless property makes them computationally tractable and mathematically elegant. PageRank ranks web pages via Markov chains, Hidden Markov Models power speech recognition, and MCMC methods sample from complex distributions. Mastering Markov chains unlocks understanding of reinforcement learning (Markov Decision Processes), natural language processing, computational biology, and financial modeling.


Markov Property

DfMarkov Property

A stochastic process {Xn:n0}\{X_n : n \geq 0\} satisfies the Markov property if for all states i0,i1,,in,in+1i_0, i_1, \ldots, i_n, i_{n+1} and all n0n \geq 0:

P(Xn+1=in+1Xn=in,Xn1=in1,,X0=i0)=P(Xn+1=in+1Xn=in)P(X_{n+1} = i_{n+1} \mid X_n = i_n, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = P(X_{n+1} = i_{n+1} \mid X_n = i_n)

The future state depends only on the current state, not on the sequence of past states. This is the memoryless property.

ℹ️ Intuition

Think of a chess game where each move depends only on the current board position—not how you got there. Or weather forecasting: tomorrow's weather depends on today's conditions, not the entire month's history. The Markov property captures this "memoryless" behavior mathematically.

DfMarkov Chain

A Markov chain is a sequence of random variables X0,X1,X2,X_0, X_1, X_2, \ldots taking values in a finite or countable state space S\mathcal{S}, satisfying the Markov property. The chain is homogeneous (time-homogeneous) if the transition probabilities do not change with time.


Transition Matrix

DfTransition Probability

The transition probability from state ii to state jj in one step is:

pij=P(Xn+1=jXn=i)p_{ij} = P(X_{n+1} = j \mid X_n = i)

For a chain with NN states, the transition matrix PP is the N×NN \times N matrix where:

P=(p11p12p1Np21p22p2NpN1pN2pNN)P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & \cdots & p_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{pmatrix}

Row-Stochastic Matrix

j=1Npij=1for all i\sum_{j=1}^{N} p_{ij} = 1 \quad \text{for all } i

Here,

  • pijp_{ij}=Probability of transitioning from state i to state j
  • NN=Number of states in the state space

💡 Key Properties

  • Every entry pij0p_{ij} \geq 0 (probabilities are non-negative)
  • Each row sums to 1: jpij=1\sum_j p_{ij} = 1
  • PP is a row-stochastic matrix
  • The (i,j)(i,j) entry of PnP^n gives the probability of going from state ii to state jj in exactly nn steps

📝Weather Model

Suppose weather has three states: Sunny (S), Cloudy (C), Rainy (R). The transition matrix is:

P=(0.60.30.10.30.40.30.20.30.5)P = \begin{pmatrix} 0.6 & 0.3 & 0.1 \\ 0.3 & 0.4 & 0.3 \\ 0.2 & 0.3 & 0.5 \end{pmatrix}
  • Row 1: From Sunny ? 60% Sunny, 30% Cloudy, 10% Rainy
  • Row 2: From Cloudy ? 30% Sunny, 40% Cloudy, 30% Rainy
  • Row 3: From Rainy ? 20% Sunny, 30% Cloudy, 50% Rainy

State Distribution

State Distribution Evolution

π(n)=π(0)Pn\boldsymbol{\pi}^{(n)} = \boldsymbol{\pi}^{(0)} P^n

Here,

  • π(0)\boldsymbol{\pi}^{(0)}=Initial state distribution (row vector)
  • π(n)\boldsymbol{\pi}^{(n)}=State distribution after n steps
  • PnP^n=n-step transition matrix

The state distribution at time nn is obtained by multiplying the initial distribution by PnP^n. If we start in a specific state ii, then π(0)=ei\boldsymbol{\pi}^{(0)} = \mathbf{e}_i (the ii-th standard basis vector), and π(n)\boldsymbol{\pi}^{(n)} is the ii-th row of PnP^n.

📝Distribution Evolution

Starting from Sunny: π(0)=(1,0,0)\boldsymbol{\pi}^{(0)} = (1, 0, 0).

After 1 step: π(1)=(1,0,0)(0.60.30.10.30.40.30.20.30.5)=(0.6,0.3,0.1)\boldsymbol{\pi}^{(1)} = (1, 0, 0) \begin{pmatrix} 0.6 & 0.3 & 0.1 \\ 0.3 & 0.4 & 0.3 \\ 0.2 & 0.3 & 0.5 \end{pmatrix} = (0.6, 0.3, 0.1)

After 2 steps: π(2)=π(1)P=(0.6,0.3,0.1)P=(0.45,0.33,0.22)\boldsymbol{\pi}^{(2)} = \boldsymbol{\pi}^{(1)} P = (0.6, 0.3, 0.1) P = (0.45, 0.33, 0.22)

The distribution gradually "forgets" the initial state.


n-Step Transitions

Chapman-Kolmogorov Equation

Pij(m+n)=kSPik(m)Pkj(n)P^{(m+n)}_{ij} = \sum_{k \in \mathcal{S}} P^{(m)}_{ik} \cdot P^{(n)}_{kj}

Here,

  • Pij(n)P^{(n)}_{ij}=Probability of going from state i to state j in exactly n steps
  • S\mathcal{S}=State space (set of all states)

This is equivalent to matrix multiplication: Pm+n=PmPnP^{m+n} = P^m \cdot P^n.

n-Step Transition Matrix

Pn=PPPn timesP^n = \underbrace{P \cdot P \cdot \cdots \cdot P}_{n \text{ times}}

Here,

  • PnP^n=The n-step transition matrix
  • (Pn)ij(P^n)_{ij}=Probability of going from i to j in n steps

⚠️ Computing Large Powers

For large nn, computing PnP^n by repeated multiplication is inefficient. Use eigendecomposition: P=QΛQ1P = Q \Lambda Q^{-1}, so Pn=QΛnQ1P^n = Q \Lambda^n Q^{-1}. The diagonal matrix Λn\Lambda^n is trivial to compute. This also reveals long-term behavior through the eigenvalue 1.


Stationary Distribution

DfStationary Distribution

A probability distribution π\boldsymbol{\pi} over the state space is a stationary distribution (or invariant distribution) if:

π=πP\boldsymbol{\pi} = \boldsymbol{\pi} P

Equivalently, π\boldsymbol{\pi} is a left eigenvector of PP with eigenvalue 1. Once the chain reaches π\boldsymbol{\pi}, it remains in π\boldsymbol{\pi} for all future time steps.

Stationary Distribution Equations

πj=i=1Nπipijfor all j,j=1Nπj=1\pi_j = \sum_{i=1}^{N} \pi_i \cdot p_{ij} \quad \text{for all } j, \qquad \sum_{j=1}^{N} \pi_j = 1

Here,

  • πj\pi_j=Stationary probability of being in state j
  • pijp_{ij}=Transition probability from state i to state j

📝Computing Stationary Distribution

For P=(0.90.10.20.8)P = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}:

Solve πP=π\boldsymbol{\pi} P = \boldsymbol{\pi} with π1+π2=1\pi_1 + \pi_2 = 1:

  • π1=0.9π1+0.2π2\pi_1 = 0.9\pi_1 + 0.2\pi_2
  • π2=0.1π1+0.8π2\pi_2 = 0.1\pi_1 + 0.8\pi_2
  • π1+π2=1\pi_1 + \pi_2 = 1

From the first equation: 0.1π1=0.2π2    π1=2π20.1\pi_1 = 0.2\pi_2 \implies \pi_1 = 2\pi_2

With the constraint: 2π2+π2=1    π2=1/3,π1=2/32\pi_2 + \pi_2 = 1 \implies \pi_2 = 1/3, \quad \pi_1 = 2/3

Stationary distribution: π=(2/3,1/3)\boldsymbol{\pi} = (2/3, 1/3)

💡Python Verification

import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Method 1: Eigendecomposition
eigenvalues, eigenvectors = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigenvalues - 1))
stationary = np.real(eigenvectors[:, idx])
stationary = stationary / stationary.sum()
print(f"Stationary distribution: {stationary}")
# Output: [0.6667 0.3333]

# Method 2: Power iteration
pi = np.array([0.5, 0.5])
for _ in range(100):
    pi = pi @ P
print(f"Stationary distribution: {pi}")
# Output: [0.6667 0.3333]

Ergodicity

ThErgodic Theorem for Markov Chains

A finite-state, time-homogeneous Markov chain is ergodic if it is:

  1. Irreducible: Every state is reachable from every other state (the chain has a single communicating class)
  2. Aperiodic: The period of every state is 1 (no deterministic cycling)

For an ergodic chain:

limnPijn=πjfor all i,j\lim_{n \to \infty} P^n_{ij} = \pi_j \quad \text{for all } i, j

The long-run distribution is independent of the initial state, and the stationary distribution exists, is unique, and is positive.

DfIrreducibility

A Markov chain is irreducible if for every pair of states i,ji, j, there exists some n0n \geq 0 such that Pijn>0P^n_{ij} > 0. Equivalently, the chain can eventually reach any state from any other state.

DfAperiodicity

The period of a state ii is d(i)=gcd{n1:Piin>0}d(i) = \gcd\{n \geq 1 : P^n_{ii} > 0\}. A chain is aperiodic if d(i)=1d(i) = 1 for all ii. Aperiodicity ensures the chain doesn't get trapped in deterministic cycles.

💡 Practical Implication

If a chain is ergodic, you can find the stationary distribution by simply running the chain for many steps from any starting state. The empirical distribution of visits converges to π\boldsymbol{\pi}. This is the basis of Markov Chain Monte Carlo (MCMC) methods.


Absorbing States

DfAbsorbing State

A state ii is absorbing if once entered, it is never left: pii=1p_{ii} = 1. A Markov chain is absorbing if it has at least one absorbing state and it is possible to reach an absorbing state from every state.

Absorption Probability

fij=P(ever reach state j starting from i)f_{ij} = P(\text{ever reach state } j \text{ starting from } i)

Here,

  • fijf_{ij}=Probability of eventually being absorbed into state j from state i

For an absorbing chain with transient states TT and absorbing states AA, the fundamental matrix is:

N=(IQ)1N = (I - Q)^{-1}

where QQ is the submatrix of transitions among transient states. The (i,j)(i,j) entry of NN gives the expected number of times the chain visits transient state jj starting from transient state ii before absorption.

📝Gambler's Ruin

A gambler starts with <MathBlock tex=2 and plays rounds of fair coin-flip betting. States: \ />0 (absorbing, ruined), <MathBlock tex=1, \ />2, $3 (absorbing, won). Transition matrix:

P=(10000.500.5000.500.50001)P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.5 & 0 & 0.5 \\ 0 & 0 & 0 & 1 \end{pmatrix}

Starting with <MathBlock tex=2, what is the probability of ruin? Using the fundamental matrix: />f_{2,0} = 0.5$.


Python Implementation

💡Complete Markov Chain Library

import numpy as np
from typing import List, Tuple, Optional

class MarkovChain:
    """A complete Markov chain implementation."""

    def __init__(self, transition_matrix: np.ndarray, state_names: Optional[List[str]] = None):
        self.P = np.array(transition_matrix, dtype=float)
        n = self.P.shape[0]
        assert self.P.shape == (n, n), "Matrix must be square"
        assert np.all(self.P >= 0), "All entries must be non-negative"
        assert np.allclose(self.P.sum(axis=1), 1), "Rows must sum to 1"
        self.n = n
        self.state_names = state_names or [str(i) for i in range(n)]

    def step(self, state: int) -> int:
        """Take one step from the given state."""
        return np.random.choice(self.n, p=self.P[state])

    def simulate(self, start: int, steps: int) -> List[int]:
        """Simulate a path of given length starting from state."""
        path = [start]
        for _ in range(steps):
            path.append(self.step(path[-1]))
        return path

    def n_step_matrix(self, n: int) -> np.ndarray:
        """Compute the n-step transition matrix P^n."""
        return np.linalg.matrix_power(self.P, n)

    def stationary_distribution(self) -> np.ndarray:
        """Find the stationary distribution via eigendecomposition."""
        eigenvalues, eigenvectors = np.linalg.eig(self.P.T)
        idx = np.argmin(np.abs(eigenvalues - 1))
        stationary = np.real(eigenvectors[:, idx])
        return stationary / stationary.sum()

    def is_irreducible(self) -> bool:
        """Check if the chain is irreducible."""
        reach = np.linalg.matrix_power(self.P, self.n - 1)
        return np.all(reach > 0)

    def is_aperiodic(self) -> bool:
        """Check if the chain is aperiodic."""
        from math import gcd
        for i in range(self.n):
            periods = []
            P_n = np.eye(self.n)
            for k in range(1, self.n * self.n + 1):
                P_n = P_n @ self.P
                if P_n[i, i] > 0:
                    periods.append(k)
            if periods:
                g = periods[0]
                for p in periods[1:]:
                    g = gcd(g, p)
                if g > 1:
                    return False
        return True

    def is_ergodic(self) -> bool:
        """Check if the chain is ergodic (irreducible + aperiodic)."""
        return self.is_irreducible() and self.is_aperiodic()

    def absorbing_states(self) -> List[int]:
        """Find all absorbing states."""
        return [i for i in range(self.n) if self.P[i, i] == 1.0]

    def fundamental_matrix(self) -> np.ndarray:
        """Compute the fundamental matrix for absorbing chains."""
        absorbing = self.absorbing_states()
        transient = [i for i in range(self.n) if i not in absorbing]
        if not transient:
            return np.eye(0)
        Q = self.P[np.ix_(transient, transient)]
        return np.linalg.inv(np.eye(len(transient)) - Q)

    def steady_state_distribution(self, start: int, steps: int = 10000) -> np.ndarray:
        """Estimate stationary distribution via simulation."""
        counts = np.zeros(self.n)
        state = start
        for _ in range(steps):
            state = self.step(state)
            counts[state] += 1
        return counts / counts.sum()


# === Example Usage ===

# Weather model
P_weather = [[0.6, 0.3, 0.1],
             [0.3, 0.4, 0.3],
             [0.2, 0.3, 0.5]]

mc = MarkovChain(P_weather, ["Sunny", "Cloudy", "Rainy"])

print("Stationary distribution:", mc.stationary_distribution())
print("Is ergodic:", mc.is_ergodic())
print("10-step transitions:\n", mc.n_step_matrix(10))

# Simulate 30 days
path = mc.simulate(0, 30)  # Start from Sunny
weather_counts = [path.count(i) for i in range(3)]
print("Weather distribution:", [c/len(path) for c in weather_counts])

Applications in AI/ML

PageRank

PageRank Formula

r=(1d)1n+dMr\vec{r} = (1 - d)\frac{\vec{1}}{n} + d \cdot M\vec{r}

Here,

  • r\vec{r}=PageRank vector (stationary distribution)
  • dd=Damping factor (typically 0.85)
  • MM=Column-stochastic web graph matrix
  • nn=Total number of pages

ℹ️ How PageRank Works

The web is modeled as a directed graph where pages are states and links are transitions. The damping factor d=0.85d = 0.85 means: 85% of the time, follow a random link; 15% of the time, jump to a random page (teleportation). This ensures the chain is ergodic. The PageRank vector is the stationary distribution—pages with higher probability are more "important."

def pagerank(adjacency: np.ndarray, d: float = 0.85, max_iter: int = 100) -> np.ndarray:
    """Compute PageRank via power iteration."""
    n = adjacency.shape[0]
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1  # Handle dangling nodes
    M = adjacency / col_sums
    r = np.ones(n) / n
    for _ in range(max_iter):
        r_new = (1 - d) / n + d * M @ r
        if np.allclose(r, r_new, atol=1e-8):
            break
        r = r_new
    return r

# Example: 4-page web
links = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])
print("PageRank:", pagerank(links))

Markov Chain Monte Carlo (MCMC)

DfMCMC Sampling

Markov Chain Monte Carlo methods construct a Markov chain whose stationary distribution is the target distribution π\pi. By running the chain long enough, samples approximate π\pi. The Metropolis-Hastings algorithm is the most general MCMC method.

def metropolis_hastings(target_log_prob, proposal, n_samples, x0):
    """Generic Metropolis-Hastings sampler."""
    samples = [x0]
    x = x0
    for _ in range(n_samples):
        x_prop = proposal(x)
        log_alpha = target_log_prob(x_prop) - target_log_prob(x)
        if np.log(np.random.uniform()) < log_alpha:
            x = x_prop
        samples.append(x)
    return np.array(samples)

# Example: Sample from N(0,1)
target = lambda x: -0.5 * x**2
proposal = lambda x: x + np.random.normal(0, 0.5)
samples = metropolis_hastings(target, proposal, 10000, 0.0)
print(f"Mean: {samples.mean():.4f}, Std: {samples.std():.4f}")

Hidden Markov Models (HMMs)

DfHidden Markov Model

A Hidden Markov Model consists of:

  • A Markov chain of hidden states Z1,Z2,Z_1, Z_2, \ldots with transition matrix AA
  • Observable emissions XtX_t where P(Xt=xZt=s)=Bs(x)P(X_t = x \mid Z_t = s) = B_s(x)
  • Initial distribution π0\pi_0 over hidden states

The joint probability is:

P(Z1:T,X1:T)=π0(Z1)t=1T1AZt,Zt+1t=1TBZt(Xt)P(Z_{1:T}, X_{1:T}) = \pi_0(Z_1) \prod_{t=1}^{T-1} A_{Z_t, Z_{t+1}} \prod_{t=1}^{T} B_{Z_t}(X_t)

ℹ️ HMM Applications

  • Speech recognition: Hidden states = phonemes, observations = audio features
  • Bioinformatics: Hidden states = gene types, observations = DNA bases
  • Part-of-speech tagging: Hidden states = POS tags, observations = words
  • Gesture recognition: Hidden states = gesture phases, observations = sensor data

Forward Algorithm

αt(j)=P(X1,,Xt,Zt=j)\alpha_t(j) = P(X_1, \ldots, X_t, Z_t = j)

Here,

  • αt(j)\alpha_t(j)=Forward probability of being in state j at time t
  • Bj(Xt)B_j(X_t)=Emission probability of observation X_t from state j

Common Mistakes

MistakeWhy It's WrongCorrect Approach
Assuming rows sum to 1 for column-stochastic matricesPageRank uses column-stochastic matrices; rows may not sum to 1Check the convention; normalize columns for PageRank
Confusing left and right eigenvectorsStationary distribution is a left eigenvector: π=πP\boldsymbol{\pi} = \boldsymbol{\pi} PSolve πP=π\boldsymbol{\pi} P = \boldsymbol{\pi}, not Pv=vP \mathbf{v} = \mathbf{v}
Forgetting the normalization constraintThe stationary distribution must satisfy jπj=1\sum_j \pi_j = 1Always normalize the eigenvector to sum to 1
Assuming all chains have a stationary distributionPeriodic or reducible chains may not convergeVerify ergodicity (irreducible + aperiodic) first
Ignoring dangling nodes in PageRankPages with no outgoing links break the Markov chainAdd teleportation or connect dangling nodes to all pages
Using power iteration without convergence checkMay run too many or too few iterationsCheck π(k+1)π(k)<ϵ\|\boldsymbol{\pi}^{(k+1)} - \boldsymbol{\pi}^{(k)}\| < \epsilon
Confusing transient with absorbing statesNot all states with pii>0p_{ii} > 0 are absorbingAn absorbing state requires pii=1p_{ii} = 1 exactly

Interview Questions

ℹ️ Common Interview Questions

  1. What is the Markov property? Give a real-world example. The future depends only on the current state. Example: stock price tomorrow depends only on today's price, not its history.

  2. How do you find the stationary distribution of a Markov chain? Solve πP=π\boldsymbol{\pi} P = \boldsymbol{\pi} with πj=1\sum \pi_j = 1. Use eigendecomposition of PTP^T or power iteration.

  3. What conditions guarantee convergence to a stationary distribution? The chain must be ergodic: irreducible and aperiodic. For finite chains, this ensures a unique stationary distribution.

  4. Explain how PageRank works as a Markov chain. Each web page is a state; links define transitions. The damping factor adds teleportation for ergodicity. PageRank is the stationary distribution.

  5. What is the fundamental matrix in an absorbing chain? N=(IQ)1N = (I - Q)^{-1} where QQ is the transient-to-transient transition submatrix. NijN_{ij} = expected visits to transient state jj from state ii before absorption.

  6. How does MCMC use Markov chains? Construct a chain whose stationary distribution is the target distribution. After a burn-in period, samples approximate the target.

  7. What is the difference between time-homogeneous and time-inhomogeneous chains? Homogeneous: transition probabilities are constant over time. Inhomogeneous: they change with time PtP_t varies with tt.

  8. How do you check if a chain is irreducible? Compute PnP^n for n=S1n = |S| - 1. If all entries are positive, the chain is irreducible.


Practice Problems

📝Problem 1: Two-State Chain

Given P=(0.70.30.40.6)P = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}, find the stationary distribution and the 3-step transition matrix.

💡Solution 1

Stationary distribution: Solve πP=π\pi P = \pi with π1+π2=1\pi_1 + \pi_2 = 1:

  • π1=0.7π1+0.4π2    0.3π1=0.4π2    π1=43π2\pi_1 = 0.7\pi_1 + 0.4\pi_2 \implies 0.3\pi_1 = 0.4\pi_2 \implies \pi_1 = \frac{4}{3}\pi_2
  • 43π2+π2=1    73π2=1    π2=37\frac{4}{3}\pi_2 + \pi_2 = 1 \implies \frac{7}{3}\pi_2 = 1 \implies \pi_2 = \frac{3}{7}
  • π=(4/7,3/7)(0.571,0.429)\boldsymbol{\pi} = (4/7, 3/7) \approx (0.571, 0.429)

3-step transition: P3=PPP=(0.5830.4170.5560.444)P^3 = P \cdot P \cdot P = \begin{pmatrix} 0.583 & 0.417 \\ 0.556 & 0.444 \end{pmatrix}

📝Problem 2: Absorbing Chain

A rat moves in a 4-cell maze: cells 1, 2, 3 are transient; cell 4 is absorbing (trap). Transitions: from cell 1, equal probability to 2 or 3. From cell 2, equal probability to 1 or 4. From cell 3, equal probability to 1 or 4. Find the probability of reaching cell 4 from cell 1.

💡Solution 2

Transition matrix (states 1, 2, 3, 4):

P=(00.50.500.5000.50.5000.50001)P = \begin{pmatrix} 0 & 0.5 & 0.5 & 0 \\ 0.5 & 0 & 0 & 0.5 \\ 0.5 & 0 & 0 & 0.5 \\ 0 & 0 & 0 & 1 \end{pmatrix}

Q=(00.50.50.5000.500)Q = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & 0 \\ 0.5 & 0 & 0 \end{pmatrix}, R=(00.50.5)R = \begin{pmatrix} 0 \\ 0.5 \\ 0.5 \end{pmatrix}

N=(IQ)1=(1.50.50.50.751.250.250.750.251.25)N = (I - Q)^{-1} = \begin{pmatrix} 1.5 & 0.5 & 0.5 \\ 0.75 & 1.25 & 0.25 \\ 0.75 & 0.25 & 1.25 \end{pmatrix}B=NR=(0.50.50.5)B = NR = \begin{pmatrix} 0.5 \\ 0.5 \\ 0.5 \end{pmatrix}

Probability of absorption into cell 4 from cell 1: 0.50.5

📝Problem 3: Random Walk

A symmetric random walk on states {0,1,2,3}\{0, 1, 2, 3\} has reflecting boundaries: p0,1=1p_{0,1} = 1 and p3,2=1p_{3,2} = 1. Find the stationary distribution.

💡Solution 3

Transition matrix:

P=(01000.500.5000.500.50010)P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.5 & 0 & 0.5 \\ 0 & 0 & 1 & 0 \end{pmatrix}

Solving πP=π\pi P = \pi: By symmetry, π0=π3\pi_0 = \pi_3 and π1=π2\pi_1 = \pi_2. From balance equations: π0=0.5π1\pi_0 = 0.5\pi_1 and π1=π0+0.5π2=π0+0.5π1\pi_1 = \pi_0 + 0.5\pi_2 = \pi_0 + 0.5\pi_1, so π1=2π0\pi_1 = 2\pi_0.

Normalization: π0+2π0+2π0+π0=1    6π0=1\pi_0 + 2\pi_0 + 2\pi_0 + \pi_0 = 1 \implies 6\pi_0 = 1

π=(1/6,1/3,1/3,1/6)\boldsymbol{\pi} = (1/6, 1/3, 1/3, 1/6)

📝Problem 4: PageRank by Hand

Three pages A, B, C with links: A?B, A?C, B?C, C?A. Compute PageRank with d=0.85d = 0.85.

💡Solution 4

Adjacency matrix: L=(011001100)L = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}

Column-stochastic: M=(010001100)M = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}

PageRank equation: r=0.05131+0.85Mr\vec{r} = 0.05 \cdot \frac{1}{3}\vec{1} + 0.85 \cdot M\vec{r}

Solving: r=(1/3,1/3,1/3)\vec{r} = (1/3, 1/3, 1/3) (uniform due to the cycle A?B?C?A)


Quick Reference

📋Markov Chains Quick Reference

  • Markov Property: P(Xn+1Xn,,X0)=P(Xn+1Xn)P(X_{n+1} \mid X_n, \ldots, X_0) = P(X_{n+1} \mid X_n)
  • Transition Matrix: pij=P(Xn+1=jXn=i)p_{ij} = P(X_{n+1} = j \mid X_n = i), rows sum to 1
  • n-Step: PijnP^n_{ij} = probability of iji \to j in nn steps; Pn=PPPP^n = P \cdot P \cdots P
  • Chapman-Kolmogorov: Pm+n=PmPnP^{m+n} = P^m \cdot P^n
  • Stationary Distribution: πP=π\boldsymbol{\pi} P = \boldsymbol{\pi}, πj=1\sum \pi_j = 1
  • Ergodicity: Irreducible + Aperiodic ? unique stationary distribution, convergence from any start
  • Irreducible: All states communicate; Pijn>0P^n_{ij} > 0 for large enough nn
  • Aperiodic: gcd{n:Piin>0}=1\gcd\{n : P^n_{ii} > 0\} = 1
  • Absorbing State: pii=1p_{ii} = 1; once entered, never left
  • Fundamental Matrix: N=(IQ)1N = (I - Q)^{-1} for absorbing chains
  • PageRank: r=(1d)1n+dMr\vec{r} = (1-d)\frac{\vec{1}}{n} + dM\vec{r}, d0.85d \approx 0.85
  • MCMC: Construct chain with target as stationary distribution; sample after burn-in
  • HMMs: Hidden Markov chain + emissions; use forward/backward/Viterbi algorithms
  • Python: np.linalg.eig(P.T) for stationary; np.linalg.matrix_power(P, n) for PnP^n

Cross-References

ℹ️ Related Topics

Lesson Progress43 / 100