Understand Banach spaces, Hilbert spaces, and their applications in kernel methods.
π Function Spacesπ Lesson 98 of 100π Free Course
Advertisement
Functional Analysis
βΉοΈ Why It Matters
Functional analysis is the study of infinite-dimensional vector spaces equipped with additional structure (norms, inner products, topologies). It provides the mathematical foundation for kernel methods in machine learning, where data is mapped to high-dimensional (possibly infinite-dimensional) feature spaces. The Reproducing Kernel Hilbert Space (RKHS) framework unifies SVMs, Gaussian processes, and many regularization schemes. Understanding functional analysis reveals why kernel tricks work, how operators act on function spaces, and what guarantees exist for convergence of iterative methods in infinite dimensions.
Core Definitions
DfNormed Vector Space
A normed vector space is a pair (V,β₯β β₯) where V is a vector space over R or C and β₯β β₯:Vβ[0,β) is a norm satisfying:
β₯xβ₯=0βΊx=0 (definiteness)
β₯Ξ±xβ₯=β£Ξ±β£β₯xβ₯ for scalar Ξ± (absolute homogeneity)
β₯x+yβ₯β€β₯xβ₯+β₯yβ₯ (triangle inequality)
Examples include βp spaces with the p-norm, C[a,b] with the supremum norm, and Lp spaces with the Lebesgue p-norm.
DfBanach Space
A Banach space is a normed vector space (V,β₯β β₯) that is complete: every Cauchy sequence {xnβ} in V converges to a limit xβV. Completeness ensures that infinite processes (limits, series, iterative algorithms) produce well-defined results within the space. The spaces βp (1β€pβ€β) and Lp (1β€pβ€β) are Banach spaces.
A linear operator T:VβW between vector spaces satisfies T(Ξ±x+Ξ²y)=Ξ±T(x)+Ξ²T(y). When V and W are normed spaces, T is bounded if β₯Tβ₯=supβ₯xβ₯=1ββ₯Txβ₯<β. Bounded linear operators are continuous. The set of bounded linear operators from X to Y is denoted B(X,Y).
The spectrum Ο(T) of a bounded linear operator T is the set of Ξ»βC such that TβΞ»I is not invertible. For self-adjoint operators on a Hilbert space, the spectrum is real and consists of three parts: point spectrum (eigenvalues), continuous spectrum, and residual spectrum. The spectral theorem describes the structure of self-adjoint operators via their spectra.
A linear operator T:XβY between Banach spaces is compact if it maps bounded sets to precompact sets (sets whose closure is compact). Equivalently, for every bounded sequence {xnβ}, {Txnβ} has a convergent subsequence. Compact operators are "almost finite-rank" and have well-behaved spectra (eigenvalues accumulate only at 0). The identity operator on an infinite-dimensional space is bounded but NOT compact.
If T:XβY is a surjective bounded linear operator between Banach spaces X and Y, then T is an open map (it maps open sets to open sets). This implies that if T is bijective, its inverse Tβ1 is also bounded. The open mapping theorem is one of the three fundamental theorems of functional analysis (along with Hahn-Banach and uniform boundedness).
ThHahn-Banach Theorem
Let X be a normed space, MβX a subspace, and f:MβR a bounded linear functional. Then f extends to a bounded linear functional f~β:XβR with β₯f~ββ₯=β₯fβ₯. This guarantees the existence of "enough" continuous linear functionals to separate points. It is used to prove the existence of support functionals, dual bases, and the Riesz representation theorem.
Let H be a RKHS, K its reproducing kernel, and S={(xiβ,yiβ)}i=1nβ a training set. For any loss function L and regularization parameter Ξ»>0, any minimizer of n1ββi=1nβL(yiβ,f(xiβ))+Ξ»β₯fβ₯H2β has the form fβ(β )=βi=1nβΞ±iβK(xiβ,β ). This reduces infinite-dimensional optimization to a finite-dimensional problem in the coefficients Ξ±iβ.
Let X be a Banach space, Y a normed space, and {TΞ±β}Ξ±βAβ a family of bounded linear operators from X to Y. If supΞ±ββ₯TΞ±βxβ₯<β for each xβX (pointwise bounded), then supΞ±ββ₯TΞ±ββ₯<β (uniformly bounded). This theorem is essential for proving convergence of Fourier series and for the spectral theory of unbounded operators.
Worked Examples
πVerifying Cauchy-Schwarz Inequality
Problem: Verify the Cauchy-Schwarz inequality for x=(1,2,3) and y=(4,5,6) in R3.
Equality condition: Cauchy-Schwarz holds with equality iff x and y are linearly dependent. Here 32<32.833 (strict inequality), so x and y are linearly independent.
Result: The polynomial kernel of degree 2 with c=1 computes an inner product in a 6-dimensional feature space without explicitly computing Ο(x) or Ο(y).
For general d and input dimension n, the explicit feature space has dimension (dn+dβ), which grows exponentially. The kernel trick avoids this exponential cost.
πSpectral Decomposition of a Compact Operator
Problem: Find the spectral decomposition of the integral operator T:L2[0,Ο]βL2[0,Ο] defined by Tf(x)=β«0Οβsin(x)sin(y)f(y)dy.
Solution:
Step 1: Identify the operator. T is a rank-1 operator since the kernel K(x,y)=sin(x)sin(y) is separable.
Step 4: Since 2Οnββ as nββ, we have β₯Tβ₯=β.
Result: The differentiation operator is unbounded on C[0,1] with the supremum norm. This demonstrates that differentiation is a fundamentally different operation from integration (which is bounded). In quantum mechanics, the momentum operator p^β=βiβdxdβ is an unbounded self-adjoint operator.
Problem: Given data {(x1β,y1β),(x2β,y2β),(x3β,y3β)}={(0,1),(1,3),(2,5)}, find the function fβ that minimizes βi=13β(yiββf(xiβ))2+Ξ»β₯fβ₯H2β using the RBF kernel K(x,y)=eβΞ³(xβy)2 with Ξ³=1 and Ξ»=0.1.
Solution:
Step 1: By the Representer Theorem, fβ(β )=βi=13βΞ±iβK(xiβ,β ).
Step 2: Define the kernel matrix Kijβ=K(xiβ,xjβ)=eβ(xiββxjβ)2:
Step 4: Solve the 3Γ3 system (e.g., via Gaussian elimination):
Ξ±β(0.24,1.93,3.63)
Step 5: The solution is fβ(x)=0.24β eβx2+1.93β eβ(xβ1)2+3.63β eβ(xβ2)2.
Result: The function interpolates the data while remaining smooth (controlled by Ξ»). The representer theorem reduced an infinite-dimensional problem to solving a 3Γ3 linear system.
πProblem 4: Orthonormal Basis in LΒ²[-Ο, Ο]
Problem: Verify that the Fourier basis {einx/2Οβ}nβZβ is orthonormal in L2[βΟ,Ο].
Result: The Fourier basis is orthonormal. By completeness, any fβL2[βΟ,Ο] can be written as f(x)=βnβZβf^β(n)einx with f^β(n)=2Ο1ββ«βΟΟβf(x)eβinxdx.
πProblem 5: Bounding Generalization Error via RKHS Norm
Problem: For a function fβ in an RKHS H with β₯fββ₯Hββ€B, derive the generalization bound using the representer theorem.
Solution:
Step 1: By the representer theorem, fβ(β )=βi=1nβΞ±iβK(xiβ,β ).
Result: The generalization bound depends on β₯fββ₯Hβ (model complexity), Ξ» (regularization), and n (sample size). Larger RKHS norm or smaller Ξ» increases overfitting risk.
Result: Bessel's inequality follows from the non-negativity of the norm. Equality holds iff {enβ} is a complete orthonormal basis (Parseval's identity).
πProblem 7: Riesz Representation in LΒ²
Problem: Find the Riesz representative of the linear functional f(x)=β«01βx(t)β tdt on L2[0,1].
Step 4: The operator norm of f is β₯fβ₯=β₯gβ₯=1/3β=1/3β.
Result: The Riesz representative is g(t)=t. The functional f measures the "first moment" of x, and its Riesz representative is the function g(t)=t against which x is integrated.
πProblem 8: Compactness of Integral Operators
Problem: Show that the integral operator T:L2[0,1]βL2[0,1] defined by Tf(x)=β«01βK(x,y)f(y)dy with continuous kernel K is compact.
Solution:
Step 1: Let {fnβ} be a bounded sequence in L2[0,1] with β₯fnββ₯2ββ€M.
Step 2: The functions gnβ(x)=Tfnβ(x)=β«01βK(x,y)fnβ(y)dy satisfy:
Step 4: By the ArzelΓ -Ascoli theorem, every subsequence of {gnβ} has a uniformly convergent subsequence, hence a convergent subsequence in L2[0,1].
Result:T maps bounded sets to precompact sets, so T is compact. This is a fundamental result: integral operators with continuous kernels are compact, and their spectral theory is well-understood via the Mercer theorem.
πProblem 9: Hahn-Banach Extension
Problem: Extend the linear functional f(x)=x1β+x2β defined on the subspace M={(x1β,x2β,0):x1β,x2ββR}βR3 (with β2 norm) to all of R3 with the same norm.
Solution:
Step 1: The norm of f on M: For β₯xβ₯=1 with xβM:
Step 4: Another extension: f~β(x)=x1β+x2β+Ξ±x3β where β£Ξ±β£β€1.
For Ξ±=1: f~β(x)=x1β+x2β+x3β. Norm: β₯f~ββ₯=3β>2β β
For Ξ±=0: β₯f~ββ₯=2β β
Result: The extension f~β(x)=x1β+x2β (with coefficient 0 for x3β) preserves the norm. Hahn-Banach guarantees such extensions exist, but they may not be unique.
Problem: Let Tnβ:β2ββ2 be defined by Tnβ(x1β,x2β,β¦)=(xnβ,0,0,β¦). Is {Tnβ} pointwise bounded? Is it uniformly bounded?
Solution:
Step 1: Compute β₯Tnβxβ₯=β£xnββ£ for x=(x1β,x2β,β¦)ββ2.
Step 2: For fixed xββ2, ββ£xnββ£2<β, so β£xnββ£β0 as nββ. Therefore supnββ₯Tnβxβ₯=supnββ£xnββ£<β.
The family {Tnβ} is pointwise bounded.
Step 3: By the Uniform Boundedness Principle (Banach-Steinhaus), since β2 is a Banach space and {Tnβ} is pointwise bounded, we have supnββ₯Tnββ₯<β.
Step 4: Compute β₯Tnββ₯=supβ₯xβ₯=1ββ₯Tnβxβ₯=supβ₯xβ₯=1ββ£xnββ£=1 (achieved by x=enβ).
So supnββ₯Tnββ₯=1<β β
Result: The family is both pointwise bounded and uniformly bounded, with β₯Tnββ₯=1 for all n. The UBP guarantees this: pointwise boundedness on a Banach space implies uniform boundedness.
πProblem 11: Dual Space of lΒΉ
Problem: Show that the dual space of β1 is isometrically isomorphic to ββ.
Solution:
Step 1: Define the map Ξ¦:βββ(β1)β by Ξ¦(y)(x)=βn=1ββxnβynβ for xββ1, yβββ.
Step 3:Ξ¦ is linear and bounded: β₯Ξ¦(y)β₯=supβ₯xβ₯=1ββ£βxnβynββ£β€β₯yβ₯ββ.
Step 4: For the reverse, given fβ(β1)β, define ynβ=f(enβ) where enβ is the n-th standard basis vector. Then y=(ynβ)βββ with β₯yβ₯βββ€β₯fβ₯, and f(x)=βxnβynβ=Ξ¦(y)(x).
Step 5: Isometry: β₯Ξ¦(y)β₯=β₯yβ₯ββ (achieved by x=enβ where β£ynββ£ is maximized).
Result:(β1)ββ ββ isometrically. This is a fundamental duality result. Note that the dual of ββ is NOT β1 (it is β1's bidual, which is larger).
Common Mistakes
Mistake
Correct Approach
Assuming all normed spaces are complete
Q with absolute value is normed but not complete; completion gives R
Confusing bounded and unbounded operators
Differentiation is unbounded; integration is bounded
All compact operators are bounded, but not conversely (e.g., identity on infinite-dim space is bounded but not compact)
Ignoring the regularization parameter Ξ»
Without regularization (Ξ»=0), the problem may be ill-posed if the kernel matrix is singular
Connections to Machine Learning
βΉοΈ Connections to Machine Learning
Functional analysis provides the theoretical backbone for several ML paradigms: (1) Kernel methods (SVMs, kernel PCA, Gaussian processes) operate in RKHS, where the representer theorem guarantees solutions are finite linear combinations of kernel evaluations. (2) Regularization theory interprets weight decay as RKHS norm minimization, with the norm controlling function complexity. (3) Neural tangent kernels view infinite-width neural networks as kernel machines in a specific RKHS. (4) Functional PCA uses spectral decomposition of covariance operators for dimensionality reduction in function spaces. (5) Optimal transport spaces (Wasserstein distances) are studied as metric spaces with functional-analytic structure. (6) Reinforcement learning value functions live in Banach spaces, and convergence of temporal difference learning follows from properties of contractions on Banach spaces.
Exam/Interview Questions
Q1: State and explain the Cauchy-Schwarz inequality. Why is it important?
Q2: What is the difference between a Banach space and a Hilbert space?
Answer: A Banach space has only a norm (distance structure), while a Hilbert space has an inner product (angle structure). Every Hilbert space is a Banach space, but not conversely. The key difference: Hilbert spaces satisfy the parallelogram law β₯x+yβ₯2+β₯xβyβ₯2=2(β₯xβ₯2+β₯yβ₯2), which fails for general Banach spaces (e.g., β1, ββ). The inner product enables orthogonal projections, which are fundamental to many ML algorithms.
Q3: Explain the representer theorem and its significance for kernel methods.
Answer: Any minimizer of βL(yiβ,f(xiβ))+Ξ»β₯fβ₯H2β in an RKHS has the form fβ=βΞ±iβK(xiβ,β ). This reduces infinite-dimensional optimization to finding n coefficients Ξ±iβ, where n is the number of training points. It explains why kernel methods work: despite operating in potentially infinite-dimensional spaces, the solution is always a finite combination of kernel evaluations at training points.
Q4: Give an example of an unbounded operator and explain why boundedness matters.
Answer: The differentiation operator D:C1[0,1]βC[0,1] defined by Df=fβ² is unbounded: β₯D(sin(2Οnx))β₯ββ=2Οn while β₯sin(2Οnx)β₯ββ=1. Boundedness matters because bounded operators are continuous (preserve limits), and the spectral theorem applies to bounded self-adjoint operators. Unbounded operators require careful domain restrictions and are studied in quantum mechanics.
Answer: PCA finds the eigenvectors of the covariance matrix Ξ£=n1βXTX. In the infinite-dimensional setting, the covariance operator T:L2βL2 defined by (Tf)(x)=β«K(x,y)f(y)dy is a compact self-adjoint operator. The spectral theorem guarantees an orthonormal eigenbasis {enβ} with eigenvalues Ξ»nββ0. The top eigenfunctions capture the directions of maximum variance, generalizing PCA to function spaces (functional PCA).