Advanced Topics

Measure Theory

Understand sigma-algebras, measures, Lebesgue integration, and their role in probability.


ℹ️ Why It Matters

Measure theory provides the rigorous mathematical foundation for probability theory, enabling continuous distributions, conditional expectations, and convergence theorems that justify interchange of limits and integrals. Without measure theory, we cannot properly define probabilities over uncountable sets (like the real line), and the central limit theorem, law of large numbers, and Bayesian inference lack rigorous foundations. In machine learning, measure theory underlies expectation-maximization algorithms, variational inference, and the theoretical analysis of generalization error. The Lebesgue integral generalizes the Riemann integral to handle highly discontinuous functions and provides the framework for $L^p$ spaces.


Core Definitions

Definition: Sigma-Algebra (σ-Algebra)

Let $\Omega$ be a set. A σ-algebra $\mathcal{F}$ on $\Omega$ is a collection of subsets of $\Omega$ satisfying:

  1. $\Omega \in \mathcal{F}$
  2. If $A \in \mathcal{F}$, then $A^c = \Omega \setminus A \in \mathcal{F}$ (closed under complement)
  3. If $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{n=1}^\infty A_n \in \mathcal{F}$ (closed under countable union)

The pair $(\Omega, \mathcal{F})$ is called a measurable space. The Borel σ-algebra $\mathcal{B}(\mathbb{R})$ is generated by all open intervals in $\mathbb{R}$.
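On a finite sample space, closure under countable unions reduces to closure under finite unions, so the σ-algebra generated by a few sets can be computed by repeatedly adding complements and unions until nothing new appears. The Python sketch below assumes a finite $\Omega$; the helper names `generate_sigma_algebra` and `is_sigma_algebra` are hypothetical, chosen for illustration.

```python
from itertools import combinations


def generate_sigma_algebra(omega, generators):
    """Return sigma(generators) on a FINITE set omega (illustrative helper).

    On a finite set, countable unions reduce to finite unions, so repeatedly
    adding complements and pairwise unions reaches a fixed point that is
    the smallest sigma-algebra containing the generators.
    """
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        grown = set(family)
        grown |= {omega - a for a in family}                  # complements
        grown |= {a | b for a, b in combinations(family, 2)}  # pairwise unions
        if grown == family:                                   # nothing new: closed
            return family
        family = grown


def is_sigma_algebra(omega, family):
    """Check the three axioms for a family of frozensets on a finite omega."""
    omega = frozenset(omega)
    return (omega in family
            and all(omega - a in family for a in family)
            and all(a | b in family for a in family for b in family))


omega = {1, 2, 3, 4}
F1 = generate_sigma_algebra(omega, [{1}])
F2 = generate_sigma_algebra(omega, [{1}, {2}])
print(len(F1), sorted(sorted(a) for a in F1))
print(len(F2), is_sigma_algebra(omega, F2))
```

Generating from $\{1\}$ on $\Omega = \{1, 2, 3, 4\}$ gives the four sets $\emptyset, \{1\}, \{2, 3, 4\}, \Omega$; adding $\{2\}$ as a generator gives 8 sets, one for each union of the atoms $\{1\}, \{2\}, \{3, 4\}$. For infinite $\Omega$ this iteration is not a practical construction; the Borel σ-algebra is defined abstractly as the smallest σ-algebra containing the open intervals.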

Definition: Measure

A measure $\mu$ on a measurable space $(\Omega, \mathcal{F})$ is a function $\mu: \mathcal{F} \to [0, \infty]$ satisfying:

  1. $\mu(\emptyset) = 0$ (the empty set has measure zero)
  2. Countable additivity: for pairwise disjoint sets $A_1, A_2, \ldots \in \mathcal{F}$, $\mu\left(\bigcup_{n=1}^\infty A_n\right) = \sum_{n=1}^\infty \mu(A_n)$

The triple $(\Omega, \mathcal{F}, \mu)$ is called a measure space. Countable additivity is essential: it is what makes limits of sets well-behaved (for example, $\mu(A_n) \to \mu\left(\bigcup_n A_n\right)$ for increasing sets $A_1 \subseteq A_2 \subseteq \cdots$).

Definition: Probability Measure

A probability measure $P$ is a measure on $(\Omega, \mathcal{F})$ with $P(\Omega) = 1$. The triple $(\Omega, \mathcal{F}, P)$ is a probability space. Events are elements of $\mathcal{F}$, and $P(A)$ is the probability of event $A$. A random variable $X: \Omega \to \mathbb{R}$ is a measurable function, and its distribution $P_X$ is the pushforward measure $P_X(B) = P(X^{-1}(B))$.
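For a finite sample space, the pushforward can be computed by summing $P$ over each preimage $X^{-1}(\{v\})$. A sketch in Python, assuming two fair dice as the sample space and their sum as $X$ (both are illustrative choices, not from the text):

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice, each with probability 1/36.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}


def X(w):
    """Random variable X: Omega -> R, the sum of the two dice."""
    return w[0] + w[1]


def pushforward(P, X):
    """Return P_X on singletons: P_X({v}) = P(X^{-1}({v})) = sum of P(w) over X(w) = v."""
    dist = defaultdict(Fraction)
    for w, p in P.items():
        dist[X(w)] += p
    return dict(dist)


def prob_of(dist, B):
    """P_X(B) for a finite set B of values, by additivity over singletons."""
    return sum((dist.get(v, Fraction(0)) for v in B), Fraction(0))


P_X = pushforward(P, X)
B = {2, 3, 12}
print(P_X[7])                                                   # probability the sum is 7
print(prob_of(P_X, B), sum(P[w] for w in omega if X(w) in B))   # same number both ways
```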

Definition: Lebesgue Measure

The Lebesgue measure $\lambda$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is the unique translation-invariant measure on the Borel σ-algebra satisfying $\lambda([a,b]) = b - a$. It extends to $\mathbb{R}^n$ with $\lambda([a_1,b_1] \times \cdots \times [a_n,b_n]) = \prod_{i=1}^n (b_i - a_i)$. The Lebesgue measure generalizes the notion of "length," "area," and "volume" to arbitrary measurable sets.

Definition: Measurable Function

A function $f: (\Omega_1, \mathcal{F}_1) \to (\Omega_2, \mathcal{F}_2)$ is measurable if $f^{-1}(B) \in \mathcal{F}_1$ for every $B \in \mathcal{F}_2$. For real-valued functions (with the Borel σ-algebra on $\mathbb{R}$), it suffices that $f^{-1}((a, \infty)) \in \mathcal{F}_1$ for all $a \in \mathbb{R}$. Measurability guarantees that every level set $\{f > a\}$ has a measure, which is exactly what the Lebesgue integral needs.

Definition: Lebesgue Integral

For a non-negative simple function $s = \sum_{i=1}^n a_i \mathbf{1}_{A_i}$ (where the $A_i$ are measurable and disjoint and $a_i \geq 0$), the Lebesgue integral is $\int s \, d\mu = \sum_{i=1}^n a_i \mu(A_i)$. For a general non-negative measurable function $f$:

$$\int f \, d\mu = \sup\left\{\int s \, d\mu : 0 \leq s \leq f, \, s \text{ simple}\right\}$$

For general measurable $f$, write $f = f^+ - f^-$ where $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$, and define $\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu$ whenever at least one of the two integrals on the right is finite.
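The supremum over simple functions can be made concrete with the standard staircase $s_n = \lfloor 2^n f \rfloor / 2^n$, which slices the range of $f$ rather than its domain. This Python sketch assumes $f(x) = x^2$ on $[0,1]$ with Lebesgue measure, where each slice $\{a \leq f < b\}$ is the interval $[\sqrt{a}, \sqrt{b})$ and so has an exact measure; for other functions the level-set measures would have to be supplied some other way.

```python
import math


def level_set_measure(a, b):
    """Lebesgue measure of {x in [0, 1] : a <= x**2 < b} = [sqrt(a), sqrt(b)), for 0 <= a <= b <= 1."""
    return math.sqrt(b) - math.sqrt(a)


def simple_integral(n):
    """Integral of s_n = floor(2**n * f) / 2**n for f(x) = x**2 on [0, 1].

    s_n takes the value k / 2**n on the slice {k / 2**n <= f < (k + 1) / 2**n};
    the point x = 1 (where f = 1) has measure zero and is skipped.
    """
    step = 2.0 ** -n
    return sum(k * step * level_set_measure(k * step, (k + 1) * step)
               for k in range(2 ** n))


for n in (1, 4, 8, 12):
    print(n, simple_integral(n), 1 / 3)
```

The integrals increase with $n$ and stay within $2^{-n}$ of $\int_0^1 x^2 \, dx = 1/3$, because $0 \leq f - s_n < 2^{-n}$.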

Definition: Lᵖ Space

For $1 \leq p < \infty$, $L^p(\Omega, \mathcal{F}, \mu)$ is the space of measurable functions $f$ with $\|f\|_p = \left(\int |f|^p \, d\mu\right)^{1/p} < \infty$. $L^\infty$ consists of essentially bounded functions with $\|f\|_\infty = \operatorname{ess\,sup} |f|$. After identifying functions that agree $\mu$-a.e., these are Banach spaces; $L^2$ is additionally a Hilbert space with inner product $\langle f, g \rangle = \int f\bar{g} \, d\mu$.

Definition: Absolute Continuity

A measure $\nu$ is absolutely continuous with respect to $\mu$ (written $\nu \ll \mu$) if $\mu(A) = 0 \implies \nu(A) = 0$ for all $A \in \mathcal{F}$. The Radon-Nikodym theorem states that if $\nu \ll \mu$ and both are σ-finite, there exists a non-negative measurable function $\frac{d\nu}{d\mu}$ (the Radon-Nikodym derivative, unique up to $\mu$-a.e. equality) such that $\nu(A) = \int_A \frac{d\nu}{d\mu} \, d\mu$ for all $A \in \mathcal{F}$.

Definition: Product Measure

Given two σ-finite measure spaces $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$, the product measure $\mu \times \nu$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ is the unique measure satisfying $(\mu \times \nu)(A \times B) = \mu(A)\nu(B)$ for measurable rectangles. The product σ-algebra $\mathcal{A} \otimes \mathcal{B}$ is generated by sets of the form $A \times B$ with $A \in \mathcal{A}$ and $B \in \mathcal{B}$.


Key Formulas

Monotone Convergence Theorem (MCT)

$$0 \leq f_1 \leq f_2 \leq \cdots, \quad f_n \to f \implies \lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu$$

Here,

  • $f_n$ = Sequence of non-negative measurable functions, monotonically increasing
  • $f$ = Pointwise limit of $f_n$
  • $\mu$ = Measure on the measurable space

Fatou's Lemma

$$\int \liminf_{n \to \infty} f_n \, d\mu \leq \liminf_{n \to \infty} \int f_n \, d\mu$$

Here,

  • $f_n$ = Sequence of non-negative measurable functions
  • $\liminf$ = Limit inferior: $\liminf_n a_n = \lim_{n \to \infty} \inf_{k \geq n} a_k$, the limit of the tail infima

Dominated Convergence Theorem (DCT)

$$f_n \to f \text{ a.e.}, \quad |f_n| \leq g \text{ with } \int g \, d\mu < \infty \implies \lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu$$

Here,

  • $f_n$ = Sequence of measurable functions converging a.e. to $f$
  • $g$ = Integrable dominating function: $|f_n| \leq g$ a.e.

Fubini's Theorem

$$\int_{X \times Y} f(x,y) \, d(\mu \times \nu) = \int_X \left(\int_Y f(x,y) \, d\nu(y)\right) d\mu(x) = \int_Y \left(\int_X f(x,y) \, d\mu(x)\right) d\nu(y)$$

Here,

  • $f$ = Integrable function on the product space $X \times Y$
  • $\mu, \nu$ = Measures on $X$ and $Y$ respectively
  • $\mu \times \nu$ = Product measure on $X \times Y$

Change of Variables Formula

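$$\int_{T(U)} f(y) \, dy = \int_U f(T(x)) \left|\det J_T(x)\right| \, dx$$

Here,

  • $T$ = $C^1$ diffeomorphism from an open set $U \subseteq \mathbb{R}^n$ onto $T(U)$, with Lebesgue measure on both sides
  • $\det J_T(x)$ = Jacobian determinant of $T$ at $x$ (in one dimension, $T'(x) = dy/dx$)
  • For an abstract measurable map $T: (X, \mu) \to (Y, \nu)$ with $\nu = \mu \circ T^{-1}$ the pushforward measure, no Jacobian appears: $\int_Y f \, d\nu = \int_X f(T(x)) \, d\mu(x)$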
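A one-dimensional numerical check of the Jacobian factor, as a Python sketch: it assumes $T(x) = x^2$, which maps $(0,1)$ onto itself with $T'(x) = 2x$, and $f = \cos$; both are illustrative choices. The helper `simpson` is a plain composite Simpson rule written for this example.

```python
import math


def simpson(f, a, b, m=2000):
    """Composite Simpson's rule on [a, b] with an even number m of subintervals."""
    if m % 2:
        raise ValueError("m must be even")
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def T(x):
    """Illustrative C^1 diffeomorphism of (0, 1) onto (0, 1)."""
    return x ** 2


def T_prime(x):
    """Its derivative, the one-dimensional Jacobian."""
    return 2 * x


lhs = simpson(math.cos, 0.0, 1.0)                                    # integral over T(U) = (0, 1)
rhs = simpson(lambda x: math.cos(T(x)) * abs(T_prime(x)), 0.0, 1.0)  # pulled back to U = (0, 1)
no_jacobian = simpson(lambda x: math.cos(T(x)), 0.0, 1.0)            # Jacobian wrongly omitted
print(lhs, rhs, math.sin(1.0), no_jacobian)
```

Both sides agree with $\sin 1 \approx 0.8415$, while dropping the Jacobian gives $\int_0^1 \cos(x^2)\,dx \approx 0.9045$, a visibly wrong answer.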

Radon-Nikodym Theorem

$$\nu(A) = \int_A \frac{d\nu}{d\mu} \, d\mu \quad \text{for all } A \in \mathcal{F}$$

Here,

  • $\nu$ = σ-finite measure absolutely continuous w.r.t. $\mu$
  • $d\nu/d\mu$ = Radon-Nikodym derivative (density function)

Markov's Inequality

$$P(X \geq a) \leq \frac{E[X]}{a}$$

Here,

  • $X$ = Non-negative random variable
  • $a$ = Positive constant
  • $E[X]$ = Expected value (Lebesgue integral of $X$ w.r.t. $P$)

Chebyshev's Inequality

$$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$$

Here,

  • $X$ = Random variable with mean $\mu$ and finite variance $\sigma^2 > 0$
  • $k$ = Number of standard deviations ($k > 0$)
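Both inequalities can be sanity-checked by simulation. The Python sketch below assumes an exponential distribution with rate 1 (mean 1, variance 1), for which the exact tails are $P(X \geq a) = e^{-a}$ and, for $k \geq 1$, $P(|X - 1| \geq k) = e^{-(1+k)}$; the seed and sample size are arbitrary.

```python
import math
import random

rng = random.Random(0)                                      # fixed seed for reproducibility
samples = [rng.expovariate(1.0) for _ in range(100_000)]   # Exp(1): mean 1, variance 1
mean, sigma = 1.0, 1.0


def empirical_prob(event):
    """Fraction of samples for which event(x) is True (Monte Carlo estimate of P)."""
    return sum(1 for x in samples if event(x)) / len(samples)


markov = {a: (empirical_prob(lambda x: x >= a), mean / a) for a in (1.5, 2.0, 4.0)}
chebyshev = {k: (empirical_prob(lambda x: abs(x - mean) >= k * sigma), 1 / k ** 2)
             for k in (1.5, 2.0, 3.0)}

for a, (p, bound) in markov.items():
    print(f"Markov    a={a}: P(X >= a) ~ {p:.4f} <= {bound:.4f}")
for k, (p, bound) in chebyshev.items():
    print(f"Chebyshev k={k}: P(|X - mu| >= k sigma) ~ {p:.4f} <= {bound:.4f}")
```

The bounds hold but are loose for this distribution, which is typical: Markov and Chebyshev use only the first one or two moments.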

Important Theorems

Theorem: Lebesgue's Dominated Convergence Theorem

If $\{f_n\}$ is a sequence of measurable functions converging pointwise a.e. to $f$, and there exists an integrable function $g$ (i.e., $\int g \, d\mu < \infty$) such that $|f_n| \leq g$ a.e. for all $n$, then $f$ is integrable and $\lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu$.

This theorem justifies interchange of limit and integral under the domination condition, and is the workhorse of measure theory. The domination condition prevents "escape of mass" to infinity or to singularities.

Theorem: Fubini's Theorem

Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be σ-finite measure spaces. If $f$ is integrable on the product space $(X \times Y, \mathcal{A} \otimes \mathcal{B}, \mu \times \nu)$, then:

  1. For a.e. $x \in X$, $f(x, \cdot)$ is integrable on $Y$
  2. The function $x \mapsto \int_Y f(x,y) \, d\nu(y)$ (defined a.e.) is integrable on $X$
  3. $\int_{X \times Y} f \, d(\mu \times \nu) = \int_X \left(\int_Y f(x,y) \, d\nu(y)\right) d\mu(x) = \int_Y \left(\int_X f(x,y) \, d\mu(x)\right) d\nu(y)$

Fubini's theorem allows evaluation of multi-dimensional integrals as iterated one-dimensional integrals, in either order, provided $f$ is integrable on the product space.

Theorem: Hahn Decomposition Theorem

Let $\mu$ be a signed measure on $(\Omega, \mathcal{F})$. There exists a measurable set $P$ (the positive set) such that $\mu(A) \geq 0$ for all measurable $A \subseteq P$, and its complement $N = P^c$ (the negative set) satisfies $\mu(A) \leq 0$ for all measurable $A \subseteq N$. The decomposition is essentially unique: if $(P', N')$ is another Hahn decomposition, then every measurable subset of $P \triangle P'$ has $\mu$-measure zero.

Theorem: Vitali Convergence Theorem

Let $\{f_n\}$ be a sequence of integrable functions on a finite measure space $(\Omega, \mathcal{F}, \mu)$ with $\mu(\Omega) < \infty$, and let $f$ be measurable and finite a.e. Then $f \in L^1$ and $f_n \to f$ in $L^1$ (i.e., $\int |f_n - f| \, d\mu \to 0$) if and only if:

  1. $f_n \to f$ in measure (this holds, for example, if $f_n \to f$ a.e.)
  2. $\{f_n\}$ is uniformly integrable: $\lim_{M \to \infty} \sup_n \int_{|f_n| > M} |f_n| \, d\mu = 0$

On an infinite measure space, a tightness condition must be added: for every $\epsilon > 0$ there is a set $E$ with $\mu(E) < \infty$ such that $\sup_n \int_{E^c} |f_n| \, d\mu < \epsilon$.

Theorem: Carathéodory's Extension Theorem

If $\mu_0$ is a premeasure on an algebra $\mathcal{A}$ on $\Omega$, then $\mu_0$ extends to a measure $\mu$ on the σ-algebra $\sigma(\mathcal{A})$ generated by $\mathcal{A}$. If $\mu_0$ is σ-finite, the extension is unique. This theorem justifies the construction of Lebesgue measure from the premeasure on intervals.


Worked Examples

📝Lebesgue Measure of the Cantor Set

Problem: Show that the Lebesgue measure of the Cantor set is 0.

Solution:

Step 1: The Cantor set $C$ is constructed by repeatedly removing open middle thirds from $[0,1]$.

Step 2: At stage $n$, we remove $2^{n-1}$ intervals, each of length $3^{-n}$.

Step 3: Total length removed:

$$L = \sum_{n=1}^{\infty} 2^{n-1} \cdot 3^{-n} = \frac{1}{3} \sum_{n=0}^{\infty} \left(\frac{2}{3}\right)^n = \frac{1}{3} \cdot \frac{1}{1 - 2/3} = \frac{1}{3} \cdot 3 = 1$$

Step 4: The removed intervals are pairwise disjoint, so by countable additivity their union has measure $L = 1$. Since $[0,1]$ has Lebesgue measure 1, the remaining set $C$ has measure:

$$\lambda(C) = 1 - 1 = 0$$

Result: The Cantor set is uncountable (it has cardinality $2^{\aleph_0}$) but has Lebesgue measure 0. This demonstrates that measure and cardinality are fundamentally different notions of "size": a null set need not be countable.
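The construction can be replayed exactly with rational arithmetic. This Python sketch (helper names are illustrative) builds the intervals left after a finite number of stages and confirms that their total length is $(2/3)^N$, which tends to 0, consistent with Step 3.

```python
from fractions import Fraction


def cantor_intervals(stages):
    """Closed intervals left after `stages` rounds of deleting open middle thirds."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(stages):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))   # keep the left third
            next_intervals.append((b - third, b))   # keep the right third
        intervals = next_intervals
    return intervals


def removed_length(stages):
    """Sum over n = 1..stages of 2**(n-1) removed intervals of length 3**(-n)."""
    return sum((Fraction(2 ** (n - 1), 3 ** n) for n in range(1, stages + 1)), Fraction(0))


for stages in (1, 2, 5, 10):
    kept = cantor_intervals(stages)
    remaining = sum((b - a for a, b in kept), Fraction(0))
    print(stages, len(kept), float(remaining), float(removed_length(stages)))
```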

📝Applying the Monotone Convergence Theorem

Problem: Compute $\lim_{n \to \infty} \int_0^\infty \left(1 + \frac{x}{n}\right)^{-n} e^{-x/2} \, dx$.

Solution:

Step 1: Identify the pointwise limit. For each fixed $x \geq 0$:

$$\lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^{-n} = e^{-x}$$

So $f_n(x) = \left(1 + \frac{x}{n}\right)^{-n} e^{-x/2} \to e^{-x} \cdot e^{-x/2} = e^{-3x/2}$.

Step 2: Check monotonicity. For fixed $x \geq 0$, $\left(1 + \frac{x}{n}\right)^n$ is increasing in $n$, so its reciprocal $\left(1 + \frac{x}{n}\right)^{-n}$ is decreasing in $n$. Therefore $f_n(x)$ decreases to $e^{-3x/2}$, and the MCT cannot be applied to $f_n$ directly.

Step 3: Apply the MCT to $g_n = f_1 - f_n$, which satisfies $0 \leq g_1 \leq g_2 \leq \cdots$ and $g_n \to f_1 - e^{-3x/2}$. Since $f_1(x) = \frac{e^{-x/2}}{1+x} \leq e^{-x/2}$ is integrable, we may subtract the finite number $\int_0^\infty f_1 \, dx$ from both sides of $\int g_n \, dx \to \int (f_1 - e^{-3x/2}) \, dx$ to obtain:

$$\lim_{n \to \infty} \int_0^\infty f_n(x) \, dx = \int_0^\infty \lim_{n \to \infty} f_n(x) \, dx = \int_0^\infty e^{-3x/2} \, dx$$

(Equivalently, the DCT applies with dominating function $g = f_1$.)

Step 4: Compute the integral:

$$\int_0^\infty e^{-3x/2} \, dx = \left[-\frac{2}{3} e^{-3x/2}\right]_0^\infty = 0 + \frac{2}{3} = \frac{2}{3}$$

Result: $\lim_{n \to \infty} \int_0^\infty \left(1 + \frac{x}{n}\right)^{-n} e^{-x/2} \, dx = \frac{2}{3}$

The MCT, applied to the increasing sequence $f_1 - f_n$, allowed us to swap limit and integral; the integrability of $f_1$ is what makes this decreasing version work.
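A numerical check of this example in Python, as a sketch: it truncates the integral at $x = 60$ (the neglected tail is below $\int_{60}^\infty e^{-x/2}\,dx = 2e^{-30}$ because $f_n \leq f_1 \leq e^{-x/2}$) and uses a hand-written composite Simpson rule. The grid size is an arbitrary choice.

```python
import math


def simpson(f, a, b, m=20_000):
    """Composite Simpson's rule on [a, b] with an even number m of subintervals."""
    if m % 2:
        raise ValueError("m must be even")
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def f(n, x):
    """f_n(x) = (1 + x/n)^(-n) * exp(-x/2), evaluated via log1p for accuracy at large n."""
    return math.exp(-n * math.log1p(x / n) - x / 2)


def integral(n, upper=60.0):
    """Approximate the integral of f_n over [0, inf) by its integral over [0, upper]."""
    return simpson(lambda x: f(n, x), 0.0, upper)


values = {n: integral(n) for n in (1, 2, 5, 10, 100, 1000, 10_000)}
for n, v in values.items():
    print(f"n = {n:>6}: {v:.6f}")
print("limit 2/3 =", 2 / 3)
```

The printed integrals decrease toward $2/3$, matching the monotone decrease of $f_n$ established in Step 2.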

📝Applying Fatou's Lemma

Problem: Let $f_n = n \cdot \mathbf{1}_{(0, 1/n]}$ on $([0,1], \mathcal{B}, \lambda)$. Compute $\liminf \int f_n \, d\lambda$ and $\int \liminf f_n \, d\lambda$.

Solution:

Step 1: Compute $\int f_n \, d\lambda = n \cdot \lambda((0, 1/n]) = n \cdot \frac{1}{n} = 1$ for all $n$.

So $\liminf_{n \to \infty} \int f_n \, d\lambda = 1$.

Step 2: Find the pointwise limit. For any $x > 0$, there exists $N$ such that $x > 1/N$, so $f_n(x) = 0$ for all $n \geq N$. Thus $\lim_{n \to \infty} f_n(x) = 0$ for all $x > 0$.

Also $f_n(0) = 0$ for all $n$. So $\liminf f_n = 0$ pointwise.

Step 3: Compute $\int \liminf f_n \, d\lambda = \int 0 \, d\lambda = 0$.

Step 4: Fatou's Lemma gives:

$$\int \liminf f_n \, d\lambda = 0 \leq 1 = \liminf \int f_n \, d\lambda$$

Result: Fatou's inequality is strict here: $0 < 1$. Each $f_n$ spikes higher over a shrinking interval near 0 while keeping integral 1, so its mass concentrates at a single point and vanishes in the pointwise limit. The integral of the limit (0) is therefore strictly less than the limit of the integrals (1). This is why Fatou's lemma gives an inequality, not an equality.

📝Applying the Dominated Convergence Theorem: A Case Where It Fails

Problem: Compute $\lim_{n \to \infty} \int_0^1 \frac{n x^{n-1}}{1 + x^n} \, dx$.

Solution:

Step 1: Let $f_n(x) = \frac{nx^{n-1}}{1+x^n}$. Find the pointwise limit.

For $x \in [0, 1)$: $n x^{n-1} \to 0$ and $x^n \to 0$, so $f_n(x) \to \frac{0}{1+0} = 0$.

For $x = 1$: $f_n(1) = \frac{n}{2} \to \infty$.

Since $\{1\}$ has measure 0, $f_n \to 0$ a.e.

Step 2: Compute the integral directly. Note that

$$f_n(x) = \frac{d}{dx} \ln(1 + x^n)$$

so

$$\int_0^1 f_n(x) \, dx = [\ln(1+x^n)]_0^1 = \ln 2 - \ln 1 = \ln 2$$

Step 3: The integral is $\ln 2$ for all $n$, so the limit is $\ln 2$.

Step 4: Check the DCT hypotheses. The obvious bound $f_n(x) \leq n x^{n-1} \leq n$ grows with $n$ and is useless. In fact, no integrable dominating function exists: if one did, the DCT would give $\int_0^1 f_n \, dx \to \int_0^1 0 \, dx = 0$, contradicting $\int_0^1 f_n \, dx = \ln 2$. The mass of $f_n$ concentrates in a shrinking neighborhood of $x = 1$, where $f_n$ blows up.

Result: $\lim_{n \to \infty} \int_0^1 f_n \, dx = \ln 2$, even though $f_n \to 0$ a.e. This demonstrates that pointwise convergence alone does not guarantee convergence of integrals; the domination condition is needed.
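The constant value $\ln 2$ can be confirmed numerically. This Python sketch applies a hand-written composite Simpson rule to $f_n$ for a few values of $n$ (chosen arbitrarily) and also prints $f_n(0.9)$ and $f_n(1)$ to show the pointwise behavior from Step 1.

```python
import math


def simpson(f, a, b, m=20_000):
    """Composite Simpson's rule on [a, b] with an even number m of subintervals."""
    if m % 2:
        raise ValueError("m must be even")
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def f(n, x):
    """f_n(x) = n x^(n-1) / (1 + x^n) on [0, 1]."""
    return n * x ** (n - 1) / (1 + x ** n)


integrals = {n: simpson(lambda x: f(n, x), 0.0, 1.0) for n in (1, 5, 50, 200)}
for n, value in integrals.items():
    print(f"n={n:>3}: integral ~ {value:.8f}, f_n(0.9) = {f(n, 0.9):.2e}, f_n(1) = {f(n, 1.0)}")
print("ln 2 =", math.log(2))
```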


Practice Problems

📝Problem 1: Show That the Rationals Have Lebesgue Measure Zero

Problem: Prove that $\lambda(\mathbb{Q} \cap [0,1]) = 0$.

Solution:

Step 1: $\mathbb{Q} \cap [0,1]$ is countable. Enumerate it as $\{q_1, q_2, q_3, \ldots\}$.

Step 2: For any $\epsilon > 0$, cover each $q_n$ with an open interval of length $\epsilon/2^n$ centered at $q_n$:

$$U = \bigcup_{n=1}^{\infty} \left(q_n - \frac{\epsilon}{2^{n+1}}, q_n + \frac{\epsilon}{2^{n+1}}\right)$$

Step 3: By countable subadditivity of Lebesgue measure:

$$\lambda(U) \leq \sum_{n=1}^{\infty} \frac{\epsilon}{2^n} = \epsilon$$

Step 4: Since $\mathbb{Q} \cap [0,1] \subseteq U$ and $\lambda(U) \leq \epsilon$ for every $\epsilon > 0$, monotonicity gives:

$$\lambda(\mathbb{Q} \cap [0,1]) = 0$$

Result: The rationals in $[0,1]$ have Lebesgue measure zero. Lebesgue integration therefore "ignores" the rationals: modifying a function on the rationals does not change its Lebesgue integral.
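The covering argument can be run on a finite list of rationals. This Python sketch (helper names are illustrative) enumerates the distinct rationals in $[0,1]$ with denominator at most 20, places an interval of length $\epsilon/2^n$ around the $n$-th one, and measures the union exactly by merging overlapping intervals. Only finitely many rationals are covered, so this illustrates Steps 2 and 3 rather than proving the result.

```python
from fractions import Fraction


def rationals_in_unit_interval(max_denominator):
    """Distinct rationals p/q in [0, 1] with q <= max_denominator, listed by increasing q."""
    points, seen = [], set()
    for q in range(1, max_denominator + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                points.append(r)
    return points


def cover(points, eps):
    """Open interval of length eps / 2**n around the n-th point (n = 1, 2, ...).

    Returns (sum of interval lengths, exact Lebesgue measure of their union).
    """
    intervals = sorted((q - eps / 2 ** (n + 1), q + eps / 2 ** (n + 1))
                       for n, q in enumerate(points, start=1))
    total = sum((b - a for a, b in intervals), Fraction(0))
    union = Fraction(0)
    cur_a, cur_b = intervals[0]
    for a, b in intervals[1:]:
        if a <= cur_b:                 # overlaps the current block: extend it
            cur_b = max(cur_b, b)
        else:                          # gap: close the current block, start a new one
            union += cur_b - cur_a
            cur_a, cur_b = a, b
    union += cur_b - cur_a
    return total, union


eps = Fraction(1, 10)
points = rationals_in_unit_interval(20)
total, union = cover(points, eps)
print(len(points), float(total), float(union), float(eps))
```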

📝Problem 2: Verify Countable Additivity

Problem: Let $\mu$ be the counting measure on $(\mathbb{N}, 2^{\mathbb{N}})$, defined by $\mu(A) = |A|$ (the number of elements of $A$, possibly $\infty$). Verify that $\mu$ is a measure.

Solution:

Step 1: Check $\mu(\emptyset) = 0$. The counting measure of the empty set is 0 (no elements to count). ✓

Step 2: Check countable additivity. Let $A_1, A_2, \ldots$ be pairwise disjoint subsets of $\mathbb{N}$.

$$\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \left|\bigcup_{n=1}^{\infty} A_n\right|$$

Since the $A_n$ are disjoint, each element of $\bigcup A_n$ belongs to exactly one $A_n$:

$$\left|\bigcup_{n=1}^{\infty} A_n\right| = \sum_{n=1}^{\infty} |A_n| = \sum_{n=1}^{\infty} \mu(A_n)$$

Step 3: Both sides are either finite and equal, or both are infinite. ✓

Result: The counting measure is a valid measure. Note that it is σ-finite ($\mathbb{N} = \bigcup_n \{n\}$ and each $\{n\}$ has measure 1), and the counting measure is used to define $\ell^p$ spaces.

📝Problem 3: Lebesgue Integral vs. Riemann Integral

Problem: Compute $\int_0^1 \mathbf{1}_{\mathbb{Q}}(x) \, d\lambda$ (the Lebesgue integral of the Dirichlet function).

Solution:

Step 1: The Dirichlet function is $f(x) = \mathbf{1}_{\mathbb{Q}}(x) = \begin{cases} 1 & x \in \mathbb{Q} \\ 0 & x \notin \mathbb{Q} \end{cases}$

Step 2: We showed in Problem 1 that $\lambda(\mathbb{Q} \cap [0,1]) = 0$.

Step 3: Since $f = 0$ almost everywhere (a.e.) with respect to Lebesgue measure:

$$\int_0^1 \mathbf{1}_{\mathbb{Q}} \, d\lambda = \int_0^1 0 \, d\lambda = 0$$

Step 4: Contrast with the Riemann integral: every subinterval of $[0,1]$ contains both rationals and irrationals, so every upper Darboux sum equals 1 and every lower Darboux sum equals 0. Since the upper and lower integrals differ, the Riemann integral of $\mathbf{1}_{\mathbb{Q}}$ does not exist.

Result: The Lebesgue integral equals 0, while the Riemann integral does not exist. This demonstrates the power of Lebesgue integration: it can integrate highly discontinuous functions that Riemann integration cannot handle.

📝Problem 4: Applying Fubini's Theorem

Problem: Compute $\int_0^1 \int_0^1 \frac{x - y}{(x + y)^3} \, dx \, dy$ and $\int_0^1 \int_0^1 \frac{x - y}{(x + y)^3} \, dy \, dx$. Do they give the same answer?

Solution:

Step 1: Note that $f(x,y) = \frac{x-y}{(x+y)^3}$ takes both signs, so Tonelli's theorem does not apply and Fubini's theorem requires checking integrability.

Step 2: Compute the iterated integral $\int_0^1 \left(\int_0^1 \frac{x-y}{(x+y)^3} \, dx\right) dy$.

Inner integral (fix $y > 0$): substitute $u = x + y$, $du = dx$, so that $x - y = u - 2y$:

$$\int_y^{1+y} \frac{u - 2y}{u^3} \, du = \int_y^{1+y} (u^{-2} - 2y u^{-3}) \, du = \left[-\frac{1}{u} + \frac{y}{u^2}\right]_y^{1+y}$$

$$= \left(-\frac{1}{1+y} + \frac{y}{(1+y)^2}\right) - \left(-\frac{1}{y} + \frac{1}{y}\right) = \frac{-(1+y) + y}{(1+y)^2} - 0 = \frac{-1}{(1+y)^2}$$

Step 3: Outer integral: $\int_0^1 \frac{-1}{(1+y)^2} \, dy = \left[\frac{1}{1+y}\right]_0^1 = \frac{1}{2} - 1 = -\frac{1}{2}$

Step 4: By antisymmetry ($f(x,y) = -f(y,x)$), the other order gives the negative:

$$\int_0^1 \int_0^1 \frac{x-y}{(x+y)^3} \, dy \, dx = \frac{1}{2}$$

Result: The two iterated integrals give different values ($-1/2$ vs $1/2$). Fubini's theorem does NOT apply because $\iint |f(x,y)| \, dx \, dy = \infty$: in polar coordinates $|f|$ is of order $r^{-2}$ near the origin, and $\int_0 r^{-2} \cdot r \, dr$ diverges, so the function is not integrable on the product space. This demonstrates the importance of checking the integrability condition in Fubini's theorem.
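The two closed-form inner integrals can be spot-checked numerically. This Python sketch compares them with direct quadrature (a hand-written Simpson rule) at a few interior points, then integrates each closed form over $[0,1]$. It stays away from the singular corner $(0,0)$, which is exactly where integrability fails.

```python
def simpson(f, a, b, m=20_000):
    """Composite Simpson's rule on [a, b] with an even number m of subintervals."""
    if m % 2:
        raise ValueError("m must be even")
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def f(x, y):
    return (x - y) / (x + y) ** 3


def inner_dx(y):
    """Closed form of the integral of f(x, y) over x in [0, 1] (Step 2, valid for y > 0)."""
    return -1.0 / (1.0 + y) ** 2


def inner_dy(x):
    """Closed form of the integral of f(x, y) over y in [0, 1], from antisymmetry."""
    return 1.0 / (1.0 + x) ** 2


# Spot-check both closed forms against direct quadrature at a few interior points.
for t in (0.25, 0.5, 1.0):
    print(t, simpson(lambda x: f(x, t), 0.0, 1.0), inner_dx(t),
          simpson(lambda y: f(t, y), 0.0, 1.0), inner_dy(t))

order_dx_dy = simpson(inner_dx, 0.0, 1.0)   # integrate over x first, then y
order_dy_dx = simpson(inner_dy, 0.0, 1.0)   # integrate over y first, then x
print(order_dx_dy, order_dy_dx)
```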

📝Problem 5: Radon-Nikodym Derivative

Problem: Let $P$ be the uniform distribution on $[0,1]$ (Lebesgue measure) and $Q$ be the distribution with density $q(x) = 2x$. Find the Radon-Nikodym derivative $\frac{dQ}{dP}$.

Solution:

Step 1: Check absolute continuity. If $P(A) = \lambda(A) = 0$, then $Q(A) = \int_A 2x \, dx = 0$. So $Q \ll P$.

Step 2: By the Radon-Nikodym theorem, there exists $\frac{dQ}{dP}$ such that $Q(A) = \int_A \frac{dQ}{dP} \, dP$.

Step 3: Since $P$ is Lebesgue measure and $Q$ has density $2x$ with respect to Lebesgue measure:

$$Q(A) = \int_A 2x \, dx = \int_A 2x \, dP$$

Step 4: Therefore $\frac{dQ}{dP}(x) = 2x$ (unique up to $P$-a.e. equality).

Verification: $\int_0^1 2x \, dx = [x^2]_0^1 = 1 = Q([0,1])$

Result: The Radon-Nikodym derivative $\frac{dQ}{dP} = 2x$ is the likelihood ratio of $Q$ to $P$. Likelihood ratios drive hypothesis testing (the Neyman-Pearson lemma) and importance sampling. The Kullback-Leibler divergence can be expressed as $D_{KL}(Q \| P) = \int \log\left(\frac{dQ}{dP}\right) dQ$; here it equals $\int_0^1 2x \log(2x) \, dx = \log 2 - \frac{1}{2} \approx 0.193$.
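The Radon-Nikodym derivative is what makes importance sampling work: $E_Q[h] = E_P\left[h \cdot \frac{dQ}{dP}\right]$, so samples from $P$ can be reweighted to estimate quantities under $Q$. A Monte Carlo sketch in Python, assuming the $P$ and $Q$ of this problem; the seed and sample size are arbitrary, and the printed estimates carry sampling error of roughly $10^{-3}$.

```python
import math
import random

rng = random.Random(0)                          # fixed seed for reproducibility
N = 200_000
xs = [1.0 - rng.random() for _ in range(N)]     # draws from P = Uniform on (0, 1]; avoids log(0)


def dQ_dP(x):
    """Radon-Nikodym derivative of Q (density 2x) with respect to P (Lebesgue on [0, 1])."""
    return 2.0 * x


# E_Q[h] = E_P[h * dQ/dP]: reweight the P-samples by the likelihood ratio.
mean_Q = sum(x * dQ_dP(x) for x in xs) / N                   # exact value: 2/3
prob_Q_half = sum(dQ_dP(x) for x in xs if x <= 0.5) / N       # exact value: Q([0, 1/2]) = 1/4
kl_QP = sum(dQ_dP(x) * math.log(dQ_dP(x)) for x in xs) / N    # exact value: log 2 - 1/2
print(mean_Q, prob_Q_half, kl_QP, math.log(2) - 0.5)
```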

📝Problem 6: Borel-Cantelli Lemma

Problem: Let $A_n$ be events in a probability space with $\sum_{n=1}^{\infty} P(A_n) < \infty$. Show that $P(\limsup A_n) = 0$.

Solution:

Step 1: Define $B = \limsup A_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$ (the event that infinitely many $A_n$ occur).

Step 2: For each $N$, $\bigcup_{k=N}^{\infty} A_k$ is an event containing $B$.

Step 3: By countable subadditivity:

$$P\left(\bigcup_{k=N}^{\infty} A_k\right) \leq \sum_{k=N}^{\infty} P(A_k)$$

Step 4: Since $\sum P(A_n) < \infty$, the tail $\sum_{k=N}^{\infty} P(A_k) \to 0$ as $N \to \infty$.

Step 5: Therefore, for every $N$, $P(B) \leq P\left(\bigcup_{k=N}^{\infty} A_k\right) \leq \sum_{k=N}^{\infty} P(A_k)$. Letting $N \to \infty$ gives $P(B) = 0$.

Result: $P(\limsup A_n) = 0$. This means "almost surely, only finitely many $A_n$ occur." The Borel-Cantelli lemma is fundamental in probability theory and connects measure theory to the study of random events.
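A simulation can illustrate the lemma. This Python sketch assumes $P(A_n) = 1/n^2$ and, for convenience, makes the events independent (the lemma itself needs no independence). It truncates at $n = 1000$, so it only approximates the infinite sequence; the seed and trial count are arbitrary.

```python
import random

rng = random.Random(1)                      # fixed seed for reproducibility
TRIALS, N_EVENTS, CUTOFF = 2000, 1000, 100


def prob(n):
    """Assumed P(A_n) = 1 / n**2, a summable sequence."""
    return 1.0 / n ** 2


counts = []   # number of events A_1..A_N_EVENTS that occur in each trial
late = 0      # trials in which some A_n with n >= CUTOFF occurs
for _ in range(TRIALS):
    occurred = [n for n in range(1, N_EVENTS + 1) if rng.random() < prob(n)]  # independent draws
    counts.append(len(occurred))
    if any(n >= CUTOFF for n in occurred):
        late += 1

tail_bound = sum(prob(k) for k in range(CUTOFF, N_EVENTS + 1))   # Step 3 bound, truncated
print("average number of events per trial:", sum(counts) / TRIALS)
print("fraction of trials with an event at n >= 100:", late / TRIALS, "tail sum:", tail_bound)
```

On average only about $\sum 1/n^2 \approx 1.64$ events occur per trial, and the fraction of trials with any event beyond $n = 100$ is at most the tail-sum bound of Step 3, up to sampling error.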

📝Problem 7: Egorov's Theorem

Problem: State Egorov's theorem and explain its significance.

Solution:

Statement: Let $(\Omega, \mathcal{F}, \mu)$ be a finite measure space ($\mu(\Omega) < \infty$), and let $f_n \to f$ pointwise a.e., where the $f_n$ and $f$ are measurable and finite a.e. Then for every $\epsilon > 0$, there exists a measurable set $E \subseteq \Omega$ with $\mu(E^c) < \epsilon$ such that $f_n \to f$ uniformly on $E$.

Significance: Egorov's theorem bridges pointwise convergence and uniform convergence. It says that on a finite measure space, pointwise convergence is "almost uniform": we can find a set of arbitrarily small complement on which convergence is uniform.

Example: For $f_n = n \cdot \mathbf{1}_{(0,1/n]}$ on $[0,1]$, $f_n \to 0$ pointwise but not uniformly on $[0,1]$ (since $\sup f_n = n$). However, for any $\epsilon > 0$, take $E = [\epsilon/2, 1]$. Then $\mu(E^c) = \epsilon/2 < \epsilon$ and $f_n = 0$ on $E$ for $n > 2/\epsilon$, so $f_n \to 0$ uniformly on $E$.

Result: Egorov's theorem is important because it shows that pointwise convergence on finite measure spaces is "close to" uniform convergence, which is stronger and more useful for analysis.

📝Problem 8: Lebesgue Differentiation Theorem

Problem: State the Lebesgue differentiation theorem and explain its significance.

Solution:

Statement: For $f \in L^1_{\text{loc}}(\mathbb{R})$ (locally integrable), for almost every $x \in \mathbb{R}$:

$$\lim_{h \to 0} \frac{1}{h} \int_x^{x+h} f(t) \, dt = f(x)$$

The $n$-dimensional form: $\lim_{r \to 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} f(y) \, dy = f(x)$ a.e.

Significance: This theorem says that the average value of $f$ over small balls around $x$ converges to $f(x)$ for a.e. $x$. In particular, the indefinite integral $F(x) = \int_a^x f(t) \, dt$ is differentiable a.e. with $F'(x) = f(x)$, so differentiating the integral recovers the integrand.

Example: For $f = \mathbf{1}_{[0,1]}$, at $x = 0.5$ and any $0 < h \leq 0.5$:

$$\frac{1}{2h}\int_{0.5-h}^{0.5+h} \mathbf{1}_{[0,1]}(t)\,dt = \frac{2h}{2h} = 1 = f(0.5)$$

At $x = 0$ (a boundary point), for $0 < h \leq 1$: $\frac{1}{2h}\int_{-h}^{h} \mathbf{1}_{[0,1]}(t)\,dt = \frac{h}{2h} = \frac{1}{2} \neq f(0) = 1$

But $\{0\}$ (like the other boundary point $\{1\}$) has measure zero, so the theorem still holds a.e.

Result: The Lebesgue differentiation theorem is fundamental in real analysis—it connects the integral and the function value, and is used in the study of Hardy-Littlewood maximal functions, singular integrals, and the theory of differentiation.
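The example can be checked exactly with rational arithmetic. In this Python sketch, `ball_average` (a hypothetical helper) computes the symmetric average of $\mathbf{1}_{[0,1]}$ over $[x-h, x+h]$ as the length of its overlap with $[0,1]$ divided by $2h$.

```python
from fractions import Fraction


def ball_average(x, h):
    """(1 / 2h) * integral of 1_[0,1] over [x - h, x + h] = |[x-h, x+h] ∩ [0,1]| / (2h)."""
    overlap = max(Fraction(0), min(x + h, Fraction(1)) - max(x - h, Fraction(0)))
    return overlap / (2 * h)


for x in (Fraction(1, 2), Fraction(0), Fraction(1), Fraction(3, 2)):
    print(x, [ball_average(x, Fraction(1, 10 ** k)) for k in (1, 3, 6)])
```

Interior points give 1, the endpoints 0 and 1 give 1/2 for every small $h$, and points outside $[0,1]$ give 0, so the averages fail to recover $f$ only on the null set $\{0, 1\}$.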

📝Problem 9: Jensen's Inequality

Problem: State Jensen's inequality and apply it to show that $E[\log X] \leq \log E[X]$ for a positive random variable $X$.

Solution:

Statement: If $\phi$ is a convex function and $X$ is a random variable such that $X$ and $\phi(X)$ are integrable, then $\phi(E[X]) \leq E[\phi(X)]$.

Step 1: Let $\phi(x) = -\log x$, which is convex for $x > 0$.

Step 2: Apply Jensen's inequality:

$$\phi(E[X]) \leq E[\phi(X)]$$
$$-\log E[X] \leq E[-\log X]$$
$$\log E[X] \geq E[\log X]$$

Step 3: Therefore $E[\log X] \leq \log E[X]$.

Step 4: Exponentiating gives $\exp(E[\log X]) \leq E[X]$. This is the AM-GM inequality in expectation: the geometric mean is at most the arithmetic mean.

Example: If $X$ takes values 1 and 3 with equal probability:

  • $E[X] = 2$, $\log E[X] = \log 2 \approx 0.693$
  • $E[\log X] = \frac{1}{2}(\log 1 + \log 3) = \frac{1}{2}\log 3 \approx 0.549$
  • $0.549 \leq 0.693$

Result: Jensen's inequality is fundamental in information theory (it shows that KL divergence is non-negative) and in machine learning (variational inference lower bounds).
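The inequality is easy to spot-check numerically. This Python sketch evaluates both sides for the two-point example and for randomly generated discrete distributions (an arbitrary test family, not from the text).

```python
import math
import random


def jensen_sides(values, probs):
    """Return (E[log X], log E[X]) for a discrete positive random variable."""
    e_log = sum(p * math.log(v) for v, p in zip(values, probs))
    log_e = math.log(sum(p * v for v, p in zip(values, probs)))
    return e_log, log_e


# Two-point example from the text: X = 1 or 3 with probability 1/2 each.
print(jensen_sides([1, 3], [0.5, 0.5]))

# Random spot checks: E[log X] <= log E[X] every time (up to rounding).
rng = random.Random(0)
for _ in range(1000):
    k = rng.randint(1, 6)
    values = [rng.uniform(0.01, 10.0) for _ in range(k)]
    weights = [rng.random() + 1e-9 for _ in range(k)]
    total = sum(weights)
    probs = [w / total for w in weights]
    e_log, log_e = jensen_sides(values, probs)
    assert e_log <= log_e + 1e-12
```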

📝Problem 10: Completion of a Measure Space

Problem: Explain the completion of a measure space and why it matters.

Solution:

Definition: Given a measure space $(\Omega, \mathcal{F}, \mu)$, the completion $(\Omega, \bar{\mathcal{F}}, \bar{\mu})$ is defined by:

$$\bar{\mathcal{F}} = \{A \cup B : A \in \mathcal{F}, B \subseteq N \text{ for some } N \in \mathcal{F} \text{ with } \mu(N) = 0\}$$

and $\bar{\mu}(A \cup B) = \mu(A)$ for $B \subseteq N$ with $\mu(N) = 0$.

Step 1: The completion adds all subsets of $\mu$-null sets to $\mathcal{F}$. This ensures that subsets of measure-zero sets are measurable (with measure zero).

Step 2: The Lebesgue σ-algebra is the completion of the Borel σ-algebra $\mathcal{B}(\mathbb{R})$ with respect to Lebesgue measure. Every Borel set is Lebesgue measurable, but not conversely: the Lebesgue σ-algebra consists of all Borel sets together with all sets that differ from a Borel set by a subset of a Borel null set.

Step 3: Why it matters:

  • Without completion, subsets of null sets might not be measurable
  • With completion, the exceptional set in an "almost everywhere" statement (such as the Lebesgue differentiation theorem) is itself measurable with measure 0, not merely contained in a null set
  • In probability, completing the probability space ensures that every subset of a probability-zero event is itself an event, with probability 0

Step 4: Example: The Cantor set has Lebesgue measure 0. Any subset of the Cantor set has Lebesgue measure 0 in the completed measure space, even though it may not be a Borel set.

Result: Completion is the smallest extension of $(\mathcal{F}, \mu)$ in which every subset of a negligible set is measurable. This is important for rigorous statements about "almost everywhere" convergence and for working with completed probability spaces.

📝Problem 11: Product Measure Construction

Problem: Construct the product measure on $[0,1]^2$ from Lebesgue measure $\lambda$ on $[0,1]$.

Solution:

Step 1: Start with the product σ-algebra $\mathcal{B}([0,1]) \otimes \mathcal{B}([0,1]) = \mathcal{B}([0,1]^2)$.

Step 2: Define the product measure $\lambda \times \lambda$ on measurable rectangles: $(\lambda \times \lambda)(A \times B) = \lambda(A) \cdot \lambda(B)$ for $A, B \in \mathcal{B}([0,1])$. This determines a premeasure on the algebra of finite disjoint unions of rectangles.

Step 3: By Carathéodory's extension theorem, this extends to a measure on $\mathcal{B}([0,1]^2)$, and the extension is unique because $\lambda$ is finite on $[0,1]$.

Step 4: The resulting product measure on $[0,1]^2$ is the 2-dimensional Lebesgue measure $\lambda_2$ (restricted to Borel sets).

Step 5: Verification via Tonelli's theorem (Fubini for the non-negative function $\mathbf{1}_A$): For any $A \in \mathcal{B}([0,1]^2)$:

$$\lambda_2(A) = \int_{[0,1]} \lambda(\{y : (x,y) \in A\}) \, dx = \int_{[0,1]} \lambda(A_x) \, dx$$

where $A_x = \{y : (x,y) \in A\}$ is the $x$-section of $A$.

Result: The product of 1-dimensional Lebesgue measure with itself gives 2-dimensional Lebesgue measure. This generalizes: $\lambda_n = \lambda_1 \times \cdots \times \lambda_1$ ($n$ times) gives $n$-dimensional Lebesgue measure.
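Step 5 can be checked on a concrete set. This Python sketch assumes $A$ is the quarter disk $\{(x,y) \in [0,1]^2 : x^2 + y^2 \leq 1\}$, whose $x$-section is $[0, \sqrt{1-x^2}]$; it integrates the section lengths with a hand-written Simpson rule and, separately, estimates $(\lambda \times \lambda)(A)$ by sampling uniform points in the square. Both should approach $\pi/4$.

```python
import math
import random


def simpson(f, a, b, m=20_000):
    """Composite Simpson's rule on [a, b] with an even number m of subintervals."""
    if m % 2:
        raise ValueError("m must be even")
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def section_length(x):
    """lambda(A_x) for the quarter disk: the x-section is [0, sqrt(1 - x^2)]."""
    return math.sqrt(max(0.0, 1.0 - x * x))


area_by_sections = simpson(section_length, 0.0, 1.0)   # integral of section lengths

rng = random.Random(0)                                  # fixed seed for reproducibility
N = 200_000
hits = sum(1 for _ in range(N) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
area_by_sampling = hits / N                             # Monte Carlo estimate of (λ×λ)(A)

print(area_by_sections, area_by_sampling, math.pi / 4)
```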

📝Problem 12: Tonelli's Theorem

Problem: State Tonelli's theorem and explain how it differs from Fubini's theorem.

Solution:

Statement: Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be σ-finite measure spaces. If $f: X \times Y \to [0, \infty]$ is measurable (non-negative), then:

  1. The function $x \mapsto \int_Y f(x,y) \, d\nu(y)$ is measurable on $X$
  2. $\int_{X \times Y} f \, d(\mu \times \nu) = \int_X \left(\int_Y f(x,y) \, d\nu(y)\right) d\mu(x) = \int_Y \left(\int_X f(x,y) \, d\mu(x)\right) d\nu(y)$, where all three may equal $+\infty$

Key difference from Fubini: Tonelli requires $f \geq 0$ (non-negative) but does NOT require integrability. Fubini requires integrability but allows signed functions. For non-negative functions, you can always swap the order of integration (Tonelli). For signed functions, you must check integrability first (Fubini).

Example: For $f(x,y) = e^{-xy} \sin(x)$ on $[0,\infty) \times [0,\infty)$:

  • Tonelli applied to $|f|$ gives $\iint |f| = \int_0^\infty |\sin x| \left(\int_0^\infty e^{-xy} \, dy\right) dx = \int_0^\infty \frac{|\sin x|}{x} \, dx = \infty$, so Fubini does not apply on the whole quadrant
  • On $[0, A] \times [0, \infty)$, however, $\iint |f| = \int_0^A \frac{|\sin x|}{x} \, dx \leq A < \infty$, so Fubini applies there and the order of integration can be swapped; letting $A \to \infty$ afterwards is the standard route to $\int_0^\infty \frac{\sin x}{x} \, dx = \frac{\pi}{2}$

Result: Tonelli's theorem is often used as a stepping stone: first apply Tonelli to $|f|$ to verify integrability, then apply Fubini to $f$ to swap integration order.
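The example can be explored numerically. This Python sketch uses a hand-written Simpson rule to compare $\int_0^A \frac{\sin x}{x}\,dx$ with $\int_0^A \frac{|\sin x|}{x}\,dx$ for $A = 10\pi$ and $A = 100\pi$ (arbitrary cutoffs): the first settles near $\pi/2$, while the second keeps growing, consistent with $\iint |f| = \infty$ on the full quadrant.

```python
import math


def simpson(f, a, b, m):
    """Composite Simpson's rule on [a, b] with an even number m of subintervals."""
    if m % 2:
        raise ValueError("m must be even")
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def sinc(x):
    """sin(x) / x, extended continuously by 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def truncated(g, periods):
    """Integral of g over [0, periods * pi], with 2000 subintervals per period.

    This puts the kinks of |sin x| at multiples of pi on Simpson panel boundaries.
    """
    return simpson(g, 0.0, periods * math.pi, 2000 * periods)


signed = {p: truncated(sinc, p) for p in (10, 100)}
absolute = {p: truncated(lambda x: abs(sinc(x)), p) for p in (10, 100)}
print(signed, math.pi / 2)   # signed integrals approach pi/2
print(absolute)              # absolute integrals keep growing, roughly like (2/pi) log A
```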


Common Mistakes

Mistake | Correct Approach
--- | ---
Assuming pointwise convergence implies convergence of integrals | Verify the hypotheses of the dominated convergence theorem (DCT) or the monotone convergence theorem (MCT)
Confusing "almost everywhere" with "everywhere" | Sets of measure zero can be ignored in Lebesgue integration
Forgetting Fubini's theorem requires integrability | Check $\iint \lvert f \rvert \, d\mu \, d\nu < \infty$ (e.g. via Tonelli) before swapping integration order
Assuming the Riemann and Lebesgue integrals always agree | They agree for (proper) Riemann-integrable functions on bounded intervals, but Lebesgue handles more cases; conversely, an improper Riemann integral such as $\int_0^\infty \frac{\sin x}{x} \, dx$ can exist even though the function is not Lebesgue integrable
Confusing σ-algebra with topology | σ-algebras are closed under complement and countable union; topologies are closed under arbitrary union and finite intersection
Assuming every subset is measurable | There exist non-measurable sets (Vitali sets), whose construction uses the Axiom of Choice
Confusing finite, countable, and uncountable additivity | Measures must be countably additive (finite additivity is weaker and does not support limits), but they are not uncountably additive: $\lambda([0,1]) = 1$ although every singleton has measure 0

Connections to Machine Learning


Measure theory is the mathematical foundation of probability and statistics, which underpin much of ML: (1) Probability distributions are measures on sample spaces; by the Radon-Nikodym theorem, a distribution has a density with respect to Lebesgue measure exactly when it is absolutely continuous with respect to it. (2) Expectation is the Lebesgue integral $E[X] = \int X \, dP$, enabling rigorous treatment of expectations over infinite spaces. (3) Convergence theorems justify interchange of limits and integrals in EM algorithms, variational inference, and stochastic gradient descent analysis. (4) Hypothesis testing uses likelihood ratios $\frac{dQ}{dP}$ (Radon-Nikodym derivatives). (5) Bayesian inference updates prior measures to posterior measures via Bayes' rule; in measure-theoretic terms, the posterior has a density with respect to the prior that is proportional to the likelihood. (6) Generalization bounds use concentration inequalities (Markov, Chebyshev, Hoeffding) that are consequences of measure-theoretic integration.


Exam/Interview Questions

Q1: State the three convergence theorems and explain when each applies.

Answer: (1) Monotone Convergence Theorem: If $0 \leq f_1 \leq f_2 \leq \cdots$ with $f_n \to f$, then $\int f_n \to \int f$. Requires monotone increase. (2) Fatou's Lemma: For non-negative $f_n$, $\int \liminf f_n \leq \liminf \int f_n$. Always holds but gives an inequality. (3) Dominated Convergence Theorem: If $f_n \to f$ a.e. and $|f_n| \leq g$ with $g$ integrable, then $\int f_n \to \int f$. Requires an integrable bound. MCT is used for increasing sequences, DCT for general convergence with domination, and Fatou for lower bounds.


Q2: Why is the Lebesgue integral preferred over the Riemann integral in probability theory?

Answer: The Lebesgue integral handles (1) highly discontinuous functions (like indicator functions of rationals), (2) infinite-dimensional spaces (function spaces, sequence spaces), and (3) general measures (not just length on $\mathbb{R}$); and (4) it comes with convergence theorems (MCT, DCT) that justify the limit-interchange operations essential in probability (law of large numbers, central limit theorem). Riemann integration cannot handle the Dirichlet function and cannot be extended to general measure spaces.


Q3: What is a σ-algebra and why do we need it?

Answer: A σ-algebra $\mathcal{F}$ on $\Omega$ is a collection of subsets containing $\Omega$ and closed under complement and countable union. We need it because: (1) Not all subsets of $\mathbb{R}$ are Lebesgue measurable (Vitali sets), so a measure cannot in general be defined on every subset. (2) The complement axiom guarantees that $A^c$ is an event whenever $A$ is, so that $P(A^c) = 1 - P(A)$ makes sense. (3) Countable union closure enables countable additivity: $P(\bigcup A_n) = \sum P(A_n)$ for disjoint $A_n$, which is essential for taking limits of events. The Borel σ-algebra $\mathcal{B}(\mathbb{R})$ is generated by open sets.


Q4: Give an example where the Dominated Convergence Theorem does not apply and the conclusion fails.

Answer: Let $f_n = n \cdot \mathbf{1}_{(0, 1/n]}$ on $([0,1], \lambda)$. Then $f_n \to 0$ everywhere, but $\int f_n = 1 \to 1 \neq 0 = \int 0$. The DCT does not apply because there is no integrable dominating function: any $g$ with $|f_n| \leq g$ for all $n$ must satisfy $g \geq \sup_n f_n$, and $\sup_n f_n(x) = n$ for $x \in (1/(n+1), 1/n]$, so $\int g \geq \sum_{n=1}^\infty n\left(\frac{1}{n} - \frac{1}{n+1}\right) = \sum_{n=1}^\infty \frac{1}{n+1} = \infty$. This illustrates the necessity of the domination condition.
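The divergence of $\int \sup_n f_n$ can be computed exactly. This Python sketch (helper names are illustrative) sums $n \left(\frac{1}{n} - \frac{1}{n+1}\right)$ over $n \leq N$, which is the integral of $\sup_n f_n$ over $(1/(N+1), 1]$, and compares it with the harmonic number $H_{N+1} - 1$.

```python
from fractions import Fraction


def envelope_integral(N):
    """Integral over (1/(N+1), 1] of sup_n f_n, which equals n on (1/(n+1), 1/n]."""
    return sum((n * (Fraction(1, n) - Fraction(1, n + 1)) for n in range(1, N + 1)), Fraction(0))


def harmonic(N):
    """H_N = 1 + 1/2 + ... + 1/N as an exact fraction."""
    return sum((Fraction(1, k) for k in range(1, N + 1)), Fraction(0))


for N in (10, 100, 1000):
    print(N, float(envelope_integral(N)))
```

The values grow like $\ln N$ without bound, so no integrable $g$ can dominate every $f_n$.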


Q5: Explain the Radon-Nikodym theorem and its interpretation as a likelihood ratio.

Answer: If $Q \ll P$ ($Q$ is absolutely continuous w.r.t. $P$), then $\frac{dQ}{dP}$ exists such that $Q(A) = \int_A \frac{dQ}{dP} \, dP$. In probability: if $P$ and $Q$ are probability measures on the same space with densities $p$ and $q$ (with respect to a common reference measure) and $Q \ll P$, then $\frac{dQ}{dP} = \frac{q}{p}$ wherever $p > 0$, which is the likelihood ratio. This is fundamental in hypothesis testing (Neyman-Pearson lemma), importance sampling (reweighting samples), and KL divergence $D_{KL}(Q\|P) = E_Q\left[\log\frac{dQ}{dP}\right]$.


Quick Reference

Concept | Formula | Key Insight
--- | --- | ---
σ-Algebra | $\Omega \in \mathcal{F}$, closed under complement and countable union | Domain of measurable sets
Measure | $\mu(\emptyset) = 0$, countably additive | Assigns "size" to sets
Probability Measure | $P(\Omega) = 1$ | Normalized measure
Lebesgue Integral | $\int f \, d\mu = \sup\{\int s \, d\mu : 0 \leq s \leq f, s \text{ simple}\}$ | Integrates via range slicing
MCT | $0 \leq f_n \uparrow f \implies \int f_n \to \int f$ | Swap limit and integral for increasing sequences
Fatou's Lemma | $\int \liminf f_n \leq \liminf \int f_n$ for $f_n \geq 0$ | Lower bound on limit of integrals
DCT | $\lvert f_n \rvert \leq g \in L^1, f_n \to f \text{ a.e.} \implies \int f_n \to \int f$ | Swap limit with domination
Fubini's Theorem | $\int_{X \times Y} f = \int_X \int_Y f = \int_Y \int_X f$ for integrable $f$ | Swap order of integration
Radon-Nikodym | $\frac{d\nu}{d\mu}$ with $\nu(A) = \int_A \frac{d\nu}{d\mu} \, d\mu$ | Density of one measure w.r.t. another
Markov's Inequality | $P(X \geq a) \leq E[X]/a$ | Bound tail probabilities
Chebyshev's Inequality | $P(\lvert X - \mu \rvert \geq k\sigma) \leq 1/k^2$ | Bound deviations from mean
