← Math|98 of 100
Advanced Topics

Functional Analysis

Understand Banach spaces, Hilbert spaces, and their applications in kernel methods.

πŸ“‚ Function SpacesπŸ“– Lesson 98 of 100πŸŽ“ Free Course

Advertisement

Functional Analysis

ℹ️ Why It Matters

Functional analysis is the study of infinite-dimensional vector spaces equipped with additional structure (norms, inner products, topologies). It provides the mathematical foundation for kernel methods in machine learning, where data is mapped to high-dimensional (possibly infinite-dimensional) feature spaces. The Reproducing Kernel Hilbert Space (RKHS) framework unifies SVMs, Gaussian processes, and many regularization schemes. Understanding functional analysis reveals why kernel tricks work, how operators act on function spaces, and what guarantees exist for convergence of iterative methods in infinite dimensions.


Core Definitions

DfNormed Vector Space

A normed vector space is a pair (V,βˆ₯β‹…βˆ₯)(V, \|\cdot\|) where VV is a vector space over R\mathbb{R} or C\mathbb{C} and βˆ₯β‹…βˆ₯:Vβ†’[0,∞)\|\cdot\|: V \to [0, \infty) is a norm satisfying:

  1. βˆ₯xβˆ₯=0β€…β€ŠβŸΊβ€…β€Šx=0\|x\| = 0 \iff x = 0 (definiteness)
  2. βˆ₯Ξ±xβˆ₯=∣α∣βˆ₯xβˆ₯\|\alpha x\| = |\alpha|\|x\| for scalar Ξ±\alpha (absolute homogeneity)
  3. βˆ₯x+yβˆ₯≀βˆ₯xβˆ₯+βˆ₯yβˆ₯\|x + y\| \leq \|x\| + \|y\| (triangle inequality)

Examples include β„“p\ell^p spaces with the pp-norm, C[a,b]C[a,b] with the supremum norm, and LpL^p spaces with the Lebesgue pp-norm.

DfBanach Space

A Banach space is a normed vector space (V,βˆ₯β‹…βˆ₯)(V, \|\cdot\|) that is complete: every Cauchy sequence {xn}\{x_n\} in VV converges to a limit x∈Vx \in V. Completeness ensures that infinite processes (limits, series, iterative algorithms) produce well-defined results within the space. The spaces β„“p\ell^p (1≀pβ‰€βˆž1 \leq p \leq \infty) and LpL^p (1≀pβ‰€βˆž1 \leq p \leq \infty) are Banach spaces.

DfInner Product Space

An inner product space is a vector space VV equipped with an inner product βŸ¨β‹…,β‹…βŸ©:VΓ—Vβ†’C\langle \cdot, \cdot \rangle: V \times V \to \mathbb{C} (or R\mathbb{R}) satisfying:

  1. ⟨x,x⟩β‰₯0\langle x, x \rangle \geq 0 with equality iff x=0x = 0 (positive definiteness)
  2. ⟨x,y⟩=⟨y,xβŸ©β€Ύ\langle x, y \rangle = \overline{\langle y, x \rangle} (conjugate symmetry)
  3. ⟨αx+βy,z⟩=α⟨x,z⟩+β⟨y,z⟩\langle \alpha x + \beta y, z \rangle = \alpha\langle x, z \rangle + \beta\langle y, z \rangle (linearity in first argument)

The inner product induces a norm via βˆ₯xβˆ₯=⟨x,x⟩\|x\| = \sqrt{\langle x, x \rangle}.

DfHilbert Space

A Hilbert space is a complete inner product space. The norm is induced by the inner product: βˆ₯xβˆ₯=⟨x,x⟩\|x\| = \sqrt{\langle x, x \rangle}. Every Hilbert space has an orthonormal basis (possibly uncountable), and elements can be represented as generalized Fourier series in this basis. The key examples are β„“2\ell^2 (square-summable sequences) and L2L^2 (square-integrable functions).

DfLinear Operator

A linear operator T:Vβ†’WT: V \to W between vector spaces satisfies T(Ξ±x+Ξ²y)=Ξ±T(x)+Ξ²T(y)T(\alpha x + \beta y) = \alpha T(x) + \beta T(y). When VV and WW are normed spaces, TT is bounded if βˆ₯Tβˆ₯=sup⁑βˆ₯xβˆ₯=1βˆ₯Txβˆ₯<∞\|T\| = \sup_{\|x\|=1} \|Tx\| < \infty. Bounded linear operators are continuous. The set of bounded linear operators from XX to YY is denoted B(X,Y)\mathcal{B}(X, Y).

DfAdjoint Operator

For a bounded linear operator T:Hβ†’HT: H \to H on a Hilbert space, the adjoint Tβˆ—:Hβ†’HT^*: H \to H is the unique operator satisfying ⟨Tx,y⟩=⟨x,Tβˆ—y⟩\langle Tx, y \rangle = \langle x, T^*y \rangle for all x,y∈Hx, y \in H. If T=Tβˆ—T = T^*, the operator is self-adjoint (Hermitian). Self-adjoint operators have real eigenvalues and orthogonal eigenspaces. The adjoint generalizes the matrix transpose to infinite dimensions.

DfSpectrum

The spectrum Οƒ(T)\sigma(T) of a bounded linear operator TT is the set of λ∈C\lambda \in \mathbb{C} such that Tβˆ’Ξ»IT - \lambda I is not invertible. For self-adjoint operators on a Hilbert space, the spectrum is real and consists of three parts: point spectrum (eigenvalues), continuous spectrum, and residual spectrum. The spectral theorem describes the structure of self-adjoint operators via their spectra.

DfReproducing Kernel Hilbert Space (RKHS)

An RKHS H\mathcal{H} on a set XX is a Hilbert space of functions f:Xβ†’Rf: X \to \mathbb{R} such that for every x∈Xx \in X, the point evaluation functional f↦f(x)f \mapsto f(x) is bounded. The Riesz representation theorem guarantees a unique kernel K:XΓ—Xβ†’RK: X \times X \to \mathbb{R} such that f(x)=⟨f,K(x,β‹…)⟩Hf(x) = \langle f, K(x, \cdot) \rangle_{\mathcal{H}} for all f∈Hf \in \mathcal{H} and x∈Xx \in X. The kernel is positive definite and reproduces function values via inner products.

DfCompact Operator

A linear operator T:X→YT: X \to Y between Banach spaces is compact if it maps bounded sets to precompact sets (sets whose closure is compact). Equivalently, for every bounded sequence {xn}\{x_n\}, {Txn}\{Tx_n\} has a convergent subsequence. Compact operators are "almost finite-rank" and have well-behaved spectra (eigenvalues accumulate only at 0). The identity operator on an infinite-dimensional space is bounded but NOT compact.


Key Formulas

Cauchy-Schwarz Inequality

∣⟨x,yβŸ©βˆ£β‰€βˆ₯xβˆ₯β‹…βˆ₯yβˆ₯|\langle x, y \rangle| \leq \|x\| \cdot \|y\|

Here,

  • x,yx, y=Elements of an inner product space
  • langlex,yanglelangle x, y angle=Inner product of x and y
  • ∣x∣|x|=Norm induced by the inner product: |x| = sqrt(<x,x>)

Parallelogram Law

βˆ₯x+yβˆ₯2+βˆ₯xβˆ’yβˆ₯2=2(βˆ₯xβˆ₯2+βˆ₯yβˆ₯2)\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2)

Here,

  • x,yx, y=Elements of an inner product space

Orthogonal Projection

projU(x)=βˆ‘i=1n⟨x,ei⟩ei\text{proj}_U(x) = \sum_{i=1}^{n} \langle x, e_i \rangle e_i

Here,

  • UU=Closed subspace with orthonormal basis {e_i}
  • xx=Element to project
  • eie_i=Orthonormal basis vectors of U

Riesz Representation Theorem

f(x)=⟨f,K(x,β‹…)⟩Hf(x) = \langle f, K(x, \cdot) \rangle_{\mathcal{H}}

Here,

  • ff=Function in the RKHS
  • K(x,cdot)K(x, cdot)=Reproducing kernel evaluated at x, viewed as a function in H
  • mathcalHmathcal{H}=Reproducing Kernel Hilbert Space

Kernel Trick

K(x,y)=βŸ¨Ο•(x),Ο•(y)⟩HK(x, y) = \langle \phi(x), \phi(y) \rangle_{\mathcal{H}}

Here,

  • K(x,y)K(x, y)=Kernel function computing similarity between x and y
  • phi(x)phi(x)=Feature map from input space to Hilbert space
  • mathcalHmathcal{H}=Feature Hilbert space (possibly infinite-dimensional)

Spectral Decomposition (Compact Self-Adjoint)

T=βˆ‘n=1∞λnβŸ¨β‹…,en⟩enT = \sum_{n=1}^{\infty} \lambda_n \langle \cdot, e_n \rangle e_n

Here,

  • TT=Compact self-adjoint operator on Hilbert space
  • lambdanlambda_n=Eigenvalues (real, tending to zero)
  • ene_n=Orthonormal eigenvectors

Residual Sum of Squares with RKHS Penalty

min⁑f∈H1nβˆ‘i=1nL(yi,f(xi))+Ξ»βˆ₯fβˆ₯H2\min_{f \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^{n} L(y_i, f(x_i)) + \lambda\|f\|_{\mathcal{H}}^2

Here,

  • LL=Loss function
  • ff=Function in the RKHS
  • lambdalambda=Regularization parameter
  • ∣f∣mathcalH2|f|_{mathcal{H}}^2=RKHS norm squared (complexity penalty)

HΓΆlder's Inequality

βˆ₯fgβˆ₯1≀βˆ₯fβˆ₯pβˆ₯gβˆ₯q,1p+1q=1\|fg\|_1 \leq \|f\|_p \|g\|_q, \quad \frac{1}{p} + \frac{1}{q} = 1

Here,

  • ff=Function in L^p
  • gg=Function in L^q
  • p,qp, q=Conjugate exponents: 1/p + 1/q = 1

Important Theorems

ThRiesz Representation Theorem (Hilbert Space)

Let HH be a Hilbert space and f:Hβ†’Rf: H \to \mathbb{R} a bounded linear functional. Then there exists a unique y∈Hy \in H such that f(x)=⟨x,y⟩f(x) = \langle x, y \rangle for all x∈Hx \in H, and βˆ₯fβˆ₯=βˆ₯yβˆ₯\|f\| = \|y\|. This theorem establishes that every bounded linear functional on a Hilbert space is represented by inner product with a fixed element. It is the foundation for the representer theorem in RKHS.

ThOpen Mapping Theorem

If T:Xβ†’YT: X \to Y is a surjective bounded linear operator between Banach spaces XX and YY, then TT is an open map (it maps open sets to open sets). This implies that if TT is bijective, its inverse Tβˆ’1T^{-1} is also bounded. The open mapping theorem is one of the three fundamental theorems of functional analysis (along with Hahn-Banach and uniform boundedness).

ThHahn-Banach Theorem

Let XX be a normed space, MβŠ†XM \subseteq X a subspace, and f:Mβ†’Rf: M \to \mathbb{R} a bounded linear functional. Then ff extends to a bounded linear functional f~:Xβ†’R\tilde{f}: X \to \mathbb{R} with βˆ₯f~βˆ₯=βˆ₯fβˆ₯\|\tilde{f}\| = \|f\|. This guarantees the existence of "enough" continuous linear functionals to separate points. It is used to prove the existence of support functionals, dual bases, and the Riesz representation theorem.

ThSpectral Theorem (Compact Self-Adjoint Operators)

Let T:Hβ†’HT: H \to H be a compact self-adjoint operator on a Hilbert space. Then there exists an orthonormal basis {en}\{e_n\} of eigenvectors with corresponding real eigenvalues Ξ»n\lambda_n such that Ten=Ξ»nenTe_n = \lambda_n e_n. Moreover, TT has the representation Tx=βˆ‘nΞ»n⟨x,en⟩enTx = \sum_n \lambda_n \langle x, e_n \rangle e_n, where Ξ»nβ†’0\lambda_n \to 0 if HH is infinite-dimensional. This generalizes the finite-dimensional spectral decomposition to compact operators.

ThRepresenter Theorem

Let H\mathcal{H} be a RKHS, KK its reproducing kernel, and S={(xi,yi)}i=1nS = \{(x_i, y_i)\}_{i=1}^n a training set. For any loss function LL and regularization parameter Ξ»>0\lambda > 0, any minimizer of 1nβˆ‘i=1nL(yi,f(xi))+Ξ»βˆ₯fβˆ₯H2\frac{1}{n}\sum_{i=1}^n L(y_i, f(x_i)) + \lambda\|f\|_{\mathcal{H}}^2 has the form fβˆ—(β‹…)=βˆ‘i=1nΞ±iK(xi,β‹…)f^*(\cdot) = \sum_{i=1}^n \alpha_i K(x_i, \cdot). This reduces infinite-dimensional optimization to a finite-dimensional problem in the coefficients Ξ±i\alpha_i.

ThUniform Boundedness Principle (Banach-Steinhaus)

Let XX be a Banach space, YY a normed space, and {TΞ±}α∈A\{T_\alpha\}_{\alpha \in A} a family of bounded linear operators from XX to YY. If sup⁑αβˆ₯TΞ±xβˆ₯<∞\sup_\alpha \|T_\alpha x\| < \infty for each x∈Xx \in X (pointwise bounded), then sup⁑αβˆ₯TΞ±βˆ₯<∞\sup_\alpha \|T_\alpha\| < \infty (uniformly bounded). This theorem is essential for proving convergence of Fourier series and for the spectral theory of unbounded operators.


Worked Examples

πŸ“Verifying Cauchy-Schwarz Inequality

Problem: Verify the Cauchy-Schwarz inequality for x=(1,2,3)\mathbf{x} = (1, 2, 3) and y=(4,5,6)\mathbf{y} = (4, 5, 6) in R3\mathbb{R}^3.

Solution:

Step 1: Compute the inner product.

⟨x,y⟩=1β‹…4+2β‹…5+3β‹…6=4+10+18=32\langle \mathbf{x}, \mathbf{y} \rangle = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 4 + 10 + 18 = 32

Step 2: Compute the norms.

βˆ₯xβˆ₯=12+22+32=1+4+9=14β‰ˆ3.742\|\mathbf{x}\| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{1 + 4 + 9} = \sqrt{14} \approx 3.742
βˆ₯yβˆ₯=42+52+62=16+25+36=77β‰ˆ8.775\|\mathbf{y}\| = \sqrt{4^2 + 5^2 + 6^2} = \sqrt{16 + 25 + 36} = \sqrt{77} \approx 8.775

Step 3: Verify the inequality.

∣⟨x,y⟩∣=∣32∣=32|\langle \mathbf{x}, \mathbf{y} \rangle| = |32| = 32
βˆ₯xβˆ₯β‹…βˆ₯yβˆ₯=14β‹…77=1078β‰ˆ32.833\|\mathbf{x}\| \cdot \|\mathbf{y}\| = \sqrt{14} \cdot \sqrt{77} = \sqrt{1078} \approx 32.833

Step 4: Check: 32≀32.83332 \leq 32.833 βœ“

Equality condition: Cauchy-Schwarz holds with equality iff x\mathbf{x} and y\mathbf{y} are linearly dependent. Here 32<32.83332 < 32.833 (strict inequality), so x\mathbf{x} and y\mathbf{y} are linearly independent.

πŸ“Orthogonal Projection in LΒ²[0,1]

Problem: Project f(x)=xf(x) = x onto the subspace spanned by {1,x2}\{1, x^2\} in L2[0,1]L^2[0,1] with inner product ⟨f,g⟩=∫01f(x)g(x)dx\langle f, g \rangle = \int_0^1 f(x)g(x)dx.

Solution:

Step 1: First orthogonalize the basis using Gram-Schmidt.

Let e0=1e_0 = 1 (already normalized: βˆ₯1βˆ₯2=∫011 dx=1\|1\|^2 = \int_0^1 1 \, dx = 1).

Let v1=x2βˆ’βŸ¨x2,e0⟩e0=x2βˆ’βˆ«01x2dxβ‹…1=x2βˆ’1/3v_1 = x^2 - \langle x^2, e_0 \rangle e_0 = x^2 - \int_0^1 x^2 dx \cdot 1 = x^2 - 1/3.

Normalize: βˆ₯v1βˆ₯2=∫01(x2βˆ’1/3)2dx=∫01(x4βˆ’2x2/3+1/9)dx=1/5βˆ’2/9+1/9=1/5βˆ’1/9=4/45\|v_1\|^2 = \int_0^1 (x^2 - 1/3)^2 dx = \int_0^1 (x^4 - 2x^2/3 + 1/9)dx = 1/5 - 2/9 + 1/9 = 1/5 - 1/9 = 4/45.

e1=x2βˆ’1/34/45=452(x2βˆ’1/3)e_1 = \frac{x^2 - 1/3}{\sqrt{4/45}} = \frac{\sqrt{45}}{2}(x^2 - 1/3)

Step 2: Project f(x)=xf(x) = x onto span{e0,e1}\text{span}\{e_0, e_1\}:

proj(x)=⟨x,e0⟩e0+⟨x,e1⟩e1\text{proj}(x) = \langle x, e_0 \rangle e_0 + \langle x, e_1 \rangle e_1
⟨x,1⟩=∫01x dx=1/2\langle x, 1 \rangle = \int_0^1 x \, dx = 1/2⟨x,x2βˆ’1/3⟩=∫01x(x2βˆ’1/3)dx=∫01(x3βˆ’x/3)dx=1/4βˆ’1/6=1/12\langle x, x^2 - 1/3 \rangle = \int_0^1 x(x^2 - 1/3)dx = \int_0^1 (x^3 - x/3)dx = 1/4 - 1/6 = 1/12
proj(x)=12β‹…1+1/124/45(x2βˆ’1/3)=12+4548(x2βˆ’1/3)=12+1516(x2βˆ’1/3)\text{proj}(x) = \frac{1}{2} \cdot 1 + \frac{1/12}{4/45}(x^2 - 1/3) = \frac{1}{2} + \frac{45}{48}(x^2 - 1/3) = \frac{1}{2} + \frac{15}{16}(x^2 - 1/3)
=12+1516x2βˆ’1548=1516x2+12βˆ’516=1516x2+316= \frac{1}{2} + \frac{15}{16}x^2 - \frac{15}{48} = \frac{15}{16}x^2 + \frac{1}{2} - \frac{5}{16} = \frac{15}{16}x^2 + \frac{3}{16}

Result: proj(x)=1516x2+316\text{proj}(x) = \frac{15}{16}x^2 + \frac{3}{16}

Verification: ⟨xβˆ’proj(x),1⟩=∫01(xβˆ’15x2/16βˆ’3/16)dx=1/2βˆ’15/32βˆ’3/16=0\langle x - \text{proj}(x), 1 \rangle = \int_0^1 (x - 15x^2/16 - 3/16)dx = 1/2 - 15/32 - 3/16 = 0 βœ“

πŸ“Kernel Trick Computation

Problem: Compute the kernel K(x,y)=(xTy+c)dK(x, y) = (x^T y + c)^d for d=2d = 2, c=1c = 1, x=(x1,x2)x = (x_1, x_2) and y=(y1,y2)y = (y_1, y_2), and show it equals an inner product in a feature space.

Solution:

Step 1: Expand the kernel.

K(x,y)=(x1y1+x2y2+1)2K(x, y) = (x_1 y_1 + x_2 y_2 + 1)^2
=(x1y1)2+(x2y2)2+1+2x1y1x2y2+2x1y1+2x2y2= (x_1 y_1)^2 + (x_2 y_2)^2 + 1 + 2x_1 y_1 x_2 y_2 + 2x_1 y_1 + 2x_2 y_2

Step 2: Define the feature map Ο•:R2β†’R6\phi: \mathbb{R}^2 \to \mathbb{R}^6:

Ο•(x)=(x12,x22,1,2x1x2,2x1,2x2)\phi(x) = (x_1^2, x_2^2, 1, \sqrt{2}x_1 x_2, \sqrt{2}x_1, \sqrt{2}x_2)

Step 3: Compute the inner product in feature space:

βŸ¨Ο•(x),Ο•(y)⟩=x12y12+x22y22+1+2x1x2y1y2+2x1y1+2x2y2=K(x,y)\langle \phi(x), \phi(y) \rangle = x_1^2 y_1^2 + x_2^2 y_2^2 + 1 + 2x_1 x_2 y_1 y_2 + 2x_1 y_1 + 2x_2 y_2 = K(x, y)

Result: The polynomial kernel of degree 2 with c=1c=1 computes an inner product in a 6-dimensional feature space without explicitly computing Ο•(x)\phi(x) or Ο•(y)\phi(y).

For general dd and input dimension nn, the explicit feature space has dimension (n+dd)\binom{n+d}{d}, which grows exponentially. The kernel trick avoids this exponential cost.

πŸ“Spectral Decomposition of a Compact Operator

Problem: Find the spectral decomposition of the integral operator T:L2[0,Ο€]β†’L2[0,Ο€]T: L^2[0, \pi] \to L^2[0, \pi] defined by Tf(x)=∫0Ο€sin⁑(x)sin⁑(y)f(y)dyTf(x) = \int_0^\pi \sin(x)\sin(y) f(y) dy.

Solution:

Step 1: Identify the operator. TT is a rank-1 operator since the kernel K(x,y)=sin⁑(x)sin⁑(y)K(x,y) = \sin(x)\sin(y) is separable.

Step 2: Write Tf(x)=sin⁑(x)∫0Ο€sin⁑(y)f(y)dy=sin⁑(x)⟨f,sin⁑⟩Tf(x) = \sin(x) \int_0^\pi \sin(y) f(y) dy = \sin(x) \langle f, \sin \rangle.

Step 3: Find eigenvalues and eigenvectors.

If f=sin⁑f = \sin, then T(sin⁑)(x)=sin⁑(x)⟨sin⁑,sin⁑⟩=sin⁑(x)β‹…Ο€2T(\sin)(x) = \sin(x) \langle \sin, \sin \rangle = \sin(x) \cdot \frac{\pi}{2}.

So T(sin⁑)=Ο€2sin⁑T(\sin) = \frac{\pi}{2} \sin, meaning Ξ»1=Ο€2\lambda_1 = \frac{\pi}{2} is an eigenvalue with eigenvector e1=2Ο€sin⁑(x)e_1 = \sqrt{\frac{2}{\pi}} \sin(x) (normalized).

Step 4: For any ff orthogonal to sin⁑\sin (i.e., ⟨f,sin⁑⟩=0\langle f, \sin \rangle = 0):

Tf(x)=sin⁑(x)β‹…0=0Tf(x) = \sin(x) \cdot 0 = 0

So Tf=0Tf = 0 for all fβŠ₯sin⁑f \perp \sin. This means all other eigenvalues are Ξ»n=0\lambda_n = 0 for nβ‰₯2n \geq 2.

Step 5: Spectral decomposition:

Tf=Ο€2⟨f,2/Ο€sin⁑⟩2/Ο€sin⁑=⟨f,sin⁑⟩sin⁑Tf = \frac{\pi}{2} \langle f, \sqrt{2/\pi}\sin \rangle \sqrt{2/\pi}\sin = \langle f, \sin \rangle \sin

The operator is compact (rank 1), self-adjoint, and has one non-zero eigenvalue Ξ»1=Ο€/2\lambda_1 = \pi/2.


Practice Problems

πŸ“Problem 1: Completeness of lΒ² Space

Problem: Show that β„“2={(xn)n=1∞:βˆ‘n=1∞∣xn∣2<∞}\ell^2 = \{(x_n)_{n=1}^\infty : \sum_{n=1}^\infty |x_n|^2 < \infty\} is a Hilbert space.

Solution:

Step 1: Define the inner product ⟨x,y⟩=βˆ‘n=1∞xnynβ€Ύ\langle x, y \rangle = \sum_{n=1}^\infty x_n \overline{y_n}. This converges absolutely by Cauchy-Schwarz on partial sums.

Step 2: Verify inner product axioms:

  • ⟨x,x⟩=βˆ‘βˆ£xn∣2β‰₯0\langle x, x \rangle = \sum |x_n|^2 \geq 0 with equality iff x=0x = 0 βœ“
  • ⟨x,y⟩=⟨y,xβŸ©β€Ύ\langle x, y \rangle = \overline{\langle y, x \rangle} βœ“
  • Linearity in first argument βœ“

Step 3: Show completeness. Let {x(k)}\{x^{(k)}\} be a Cauchy sequence in β„“2\ell^2.

For each fixed nn, ∣xn(k)βˆ’xn(m)βˆ£β‰€βˆ₯x(k)βˆ’x(m)βˆ₯|x_n^{(k)} - x_n^{(m)}| \leq \|x^{(k)} - x^{(m)}\|, so {xn(k)}k=1∞\{x_n^{(k)}\}_{k=1}^\infty is Cauchy in R\mathbb{R}, hence converges to some xnx_n.

Step 4: Show x=(xn)βˆˆβ„“2x = (x_n) \in \ell^2 and x(k)β†’xx^{(k)} \to x.

Given Ο΅>0\epsilon > 0, choose KK such that βˆ₯x(k)βˆ’x(m)βˆ₯<Ο΅\|x^{(k)} - x^{(m)}\| < \epsilon for k,mβ‰₯Kk, m \geq K. For any finite NN:

βˆ‘n=1N∣xn(k)βˆ’xn∣2=lim⁑mβ†’βˆžβˆ‘n=1N∣xn(k)βˆ’xn(m)∣2≀ϡ2\sum_{n=1}^N |x_n^{(k)} - x_n|^2 = \lim_{m \to \infty} \sum_{n=1}^N |x_n^{(k)} - x_n^{(m)}|^2 \leq \epsilon^2

Taking Nβ†’βˆžN \to \infty: βˆ₯x(k)βˆ’xβˆ₯≀ϡ\|x^{(k)} - x\| \leq \epsilon, so x(k)β†’xx^{(k)} \to x in β„“2\ell^2.

Result: β„“2\ell^2 is complete, hence a Hilbert space.

πŸ“Problem 2: Boundedness of an Operator

Problem: Show that the differentiation operator T:C[0,1]β†’C[0,1]T: C[0,1] \to C[0,1] defined by Tf=fβ€²Tf = f' is not bounded with the supremum norm.

Solution:

Step 1: Consider the family of functions fn(x)=sin⁑(2Ο€nx)f_n(x) = \sin(2\pi n x).

Step 2: Compute norms.

βˆ₯fnβˆ₯∞=sup⁑x∈[0,1]∣sin⁑(2Ο€nx)∣=1\|f_n\|_\infty = \sup_{x \in [0,1]} |\sin(2\pi n x)| = 1
βˆ₯Tfnβˆ₯∞=βˆ₯fnβ€²βˆ₯∞=sup⁑x∈[0,1]∣2Ο€ncos⁑(2Ο€nx)∣=2Ο€n\|Tf_n\|_\infty = \|f_n'\|_\infty = \sup_{x \in [0,1]} |2\pi n \cos(2\pi n x)| = 2\pi n

Step 3: Compute the operator norm.

βˆ₯Tβˆ₯=sup⁑βˆ₯fβˆ₯∞=1βˆ₯Tfβˆ₯∞β‰₯βˆ₯Tfnβˆ₯∞βˆ₯fnβˆ₯∞=2Ο€n\|T\| = \sup_{\|f\|_\infty = 1} \|Tf\|_\infty \geq \frac{\|Tf_n\|_\infty}{\|f_n\|_\infty} = 2\pi n

Step 4: Since 2Ο€nβ†’βˆž2\pi n \to \infty as nβ†’βˆžn \to \infty, we have βˆ₯Tβˆ₯=∞\|T\| = \infty.

Result: The differentiation operator is unbounded on C[0,1]C[0,1] with the supremum norm. This demonstrates that differentiation is a fundamentally different operation from integration (which is bounded). In quantum mechanics, the momentum operator p^=βˆ’iℏddx\hat{p} = -i\hbar \frac{d}{dx} is an unbounded self-adjoint operator.

πŸ“Problem 3: RKHS Representer Theorem Application

Problem: Given data {(x1,y1),(x2,y2),(x3,y3)}={(0,1),(1,3),(2,5)}\{(x_1, y_1), (x_2, y_2), (x_3, y_3)\} = \{(0, 1), (1, 3), (2, 5)\}, find the function fβˆ—f^* that minimizes βˆ‘i=13(yiβˆ’f(xi))2+Ξ»βˆ₯fβˆ₯H2\sum_{i=1}^3 (y_i - f(x_i))^2 + \lambda\|f\|_{\mathcal{H}}^2 using the RBF kernel K(x,y)=eβˆ’Ξ³(xβˆ’y)2K(x, y) = e^{-\gamma(x-y)^2} with Ξ³=1\gamma = 1 and Ξ»=0.1\lambda = 0.1.

Solution:

Step 1: By the Representer Theorem, fβˆ—(β‹…)=βˆ‘i=13Ξ±iK(xi,β‹…)f^*(\cdot) = \sum_{i=1}^3 \alpha_i K(x_i, \cdot).

Step 2: Define the kernel matrix Kij=K(xi,xj)=eβˆ’(xiβˆ’xj)2K_{ij} = K(x_i, x_j) = e^{-(x_i - x_j)^2}:

K=(1eβˆ’1eβˆ’4eβˆ’11eβˆ’1eβˆ’4eβˆ’11)β‰ˆ(10.3680.0180.36810.3680.0180.3681)K = \begin{pmatrix} 1 & e^{-1} & e^{-4} \\ e^{-1} & 1 & e^{-1} \\ e^{-4} & e^{-1} & 1 \end{pmatrix} \approx \begin{pmatrix} 1 & 0.368 & 0.018 \\ 0.368 & 1 & 0.368 \\ 0.018 & 0.368 & 1 \end{pmatrix}

Step 3: The optimal Ξ±\boldsymbol{\alpha} satisfies (K+Ξ»I)Ξ±=y(K + \lambda I)\boldsymbol{\alpha} = \mathbf{y}:

(1.10.3680.0180.3681.10.3680.0180.3681.1)Ξ±=(135)\begin{pmatrix} 1.1 & 0.368 & 0.018 \\ 0.368 & 1.1 & 0.368 \\ 0.018 & 0.368 & 1.1 \end{pmatrix} \boldsymbol{\alpha} = \begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix}

Step 4: Solve the 3Γ—33 \times 3 system (e.g., via Gaussian elimination):

Ξ±β‰ˆ(0.24,1.93,3.63)\boldsymbol{\alpha} \approx (0.24, 1.93, 3.63)

Step 5: The solution is fβˆ—(x)=0.24β‹…eβˆ’x2+1.93β‹…eβˆ’(xβˆ’1)2+3.63β‹…eβˆ’(xβˆ’2)2f^*(x) = 0.24 \cdot e^{-x^2} + 1.93 \cdot e^{-(x-1)^2} + 3.63 \cdot e^{-(x-2)^2}.

Result: The function interpolates the data while remaining smooth (controlled by Ξ»\lambda). The representer theorem reduced an infinite-dimensional problem to solving a 3Γ—33 \times 3 linear system.

πŸ“Problem 4: Orthonormal Basis in LΒ²[-Ο€, Ο€]

Problem: Verify that the Fourier basis {einx/2Ο€}n∈Z\{e^{inx}/\sqrt{2\pi}\}_{n \in \mathbb{Z}} is orthonormal in L2[βˆ’Ο€,Ο€]L^2[-\pi, \pi].

Solution:

Step 1: The inner product in L2[βˆ’Ο€,Ο€]L^2[-\pi, \pi] is ⟨f,g⟩=βˆ«βˆ’Ο€Ο€f(x)g(x)β€Ύdx\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\overline{g(x)} dx.

Step 2: Compute ⟨einx,eimx⟩\langle e^{inx}, e^{imx} \rangle:

βˆ«βˆ’Ο€Ο€einxeimxβ€Ύdx=βˆ«βˆ’Ο€Ο€ei(nβˆ’m)xdx\int_{-\pi}^{\pi} e^{inx} \overline{e^{imx}} dx = \int_{-\pi}^{\pi} e^{i(n-m)x} dx

Step 3: Evaluate the integral.

If n=mn = m: βˆ«βˆ’Ο€Ο€e0dx=2Ο€\int_{-\pi}^{\pi} e^{0} dx = 2\pi

If nβ‰ mn \neq m: βˆ«βˆ’Ο€Ο€ei(nβˆ’m)xdx=[ei(nβˆ’m)xi(nβˆ’m)]βˆ’Ο€Ο€=ei(nβˆ’m)Ο€βˆ’eβˆ’i(nβˆ’m)Ο€i(nβˆ’m)=2isin⁑((nβˆ’m)Ο€)i(nβˆ’m)=0\int_{-\pi}^{\pi} e^{i(n-m)x} dx = \left[\frac{e^{i(n-m)x}}{i(n-m)}\right]_{-\pi}^{\pi} = \frac{e^{i(n-m)\pi} - e^{-i(n-m)\pi}}{i(n-m)} = \frac{2i\sin((n-m)\pi)}{i(n-m)} = 0

Step 4: Therefore:

⟨einx2Ο€,eimx2Ο€βŸ©=12Ο€βŸ¨einx,eimx⟩={1ifΒ n=m0ifΒ nβ‰ m\langle \frac{e^{inx}}{\sqrt{2\pi}}, \frac{e^{imx}}{\sqrt{2\pi}} \rangle = \frac{1}{2\pi}\langle e^{inx}, e^{imx} \rangle = \begin{cases} 1 & \text{if } n = m \\ 0 & \text{if } n \neq m \end{cases}

Result: The Fourier basis is orthonormal. By completeness, any f∈L2[βˆ’Ο€,Ο€]f \in L^2[-\pi, \pi] can be written as f(x)=βˆ‘n∈Zf^(n)einxf(x) = \sum_{n \in \mathbb{Z}} \hat{f}(n) e^{inx} with f^(n)=12Ο€βˆ«βˆ’Ο€Ο€f(x)eβˆ’inxdx\hat{f}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) e^{-inx} dx.

πŸ“Problem 5: Bounding Generalization Error via RKHS Norm

Problem: For a function fβˆ—f^* in an RKHS H\mathcal{H} with βˆ₯fβˆ—βˆ₯H≀B\|f^*\|_{\mathcal{H}} \leq B, derive the generalization bound using the representer theorem.

Solution:

Step 1: By the representer theorem, fβˆ—(β‹…)=βˆ‘i=1nΞ±iK(xi,β‹…)f^*(\cdot) = \sum_{i=1}^n \alpha_i K(x_i, \cdot).

Step 2: The RKHS norm constraint gives:

βˆ₯fβˆ—βˆ₯H2=βˆ‘i,jΞ±iΞ±jK(xi,xj)=Ξ±TKα≀B2\|f^*\|_{\mathcal{H}}^2 = \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) = \boldsymbol{\alpha}^T K \boldsymbol{\alpha} \leq B^2

Step 3: For the squared loss, the representer theorem solution gives:

Ξ±=(K+Ξ»I)βˆ’1y\boldsymbol{\alpha} = (K + \lambda I)^{-1} \mathbf{y}

Step 4: The generalization error (expected risk minus empirical risk) can be bounded using the covering number of the RKHS ball:

RademacherΒ complexity≀Btr(K)/nΞ»+ΞΊ\text{Rademacher complexity} \leq \frac{B\sqrt{\text{tr}(K)/n}}{\lambda + \kappa}

where ΞΊ\kappa is the minimum eigenvalue of KK.

Step 5: By standard generalization theory:

R(fβˆ—)≀R^n(fβˆ—)+O(BΞΊnn(Ξ»+ΞΊn))R(f^*) \leq \hat{R}_n(f^*) + O\left(\frac{B\sqrt{\kappa_n}}{n(\lambda + \kappa_n)}\right)

where ΞΊn\kappa_n captures the effective dimension.

Result: The generalization bound depends on βˆ₯fβˆ—βˆ₯H\|f^*\|_{\mathcal{H}} (model complexity), Ξ»\lambda (regularization), and nn (sample size). Larger RKHS norm or smaller Ξ»\lambda increases overfitting risk.

πŸ“Problem 6: Bessel's Inequality

Problem: Prove Bessel's inequality for an orthonormal sequence {en}\{e_n\} in a Hilbert space HH: βˆ‘n=1∞∣⟨x,en⟩∣2≀βˆ₯xβˆ₯2\sum_{n=1}^{\infty} |\langle x, e_n \rangle|^2 \leq \|x\|^2.

Solution:

Step 1: Let SN=βˆ‘n=1N⟨x,en⟩enS_N = \sum_{n=1}^{N} \langle x, e_n \rangle e_n be the partial projection onto span{e1,…,eN}\text{span}\{e_1, \ldots, e_N\}.

Step 2: Compute the norm of the residual xβˆ’SNx - S_N:

βˆ₯xβˆ’SNβˆ₯2=βˆ₯xβˆ₯2βˆ’2Re⟨x,SN⟩+βˆ₯SNβˆ₯2\|x - S_N\|^2 = \|x\|^2 - 2\text{Re}\langle x, S_N \rangle + \|S_N\|^2

Step 3: Since {en}\{e_n\} is orthonormal:

⟨x,SN⟩=βˆ‘n=1N⟨x,enβŸ©β€ΎβŸ¨x,en⟩=βˆ‘n=1N∣⟨x,en⟩∣2\langle x, S_N \rangle = \sum_{n=1}^{N} \overline{\langle x, e_n \rangle} \langle x, e_n \rangle = \sum_{n=1}^{N} |\langle x, e_n \rangle|^2
βˆ₯SNβˆ₯2=βˆ‘n=1N∣⟨x,en⟩∣2\|S_N\|^2 = \sum_{n=1}^{N} |\langle x, e_n \rangle|^2

Step 4: Therefore:

βˆ₯xβˆ’SNβˆ₯2=βˆ₯xβˆ₯2βˆ’βˆ‘n=1N∣⟨x,en⟩∣2β‰₯0\|x - S_N\|^2 = \|x\|^2 - \sum_{n=1}^{N} |\langle x, e_n \rangle|^2 \geq 0

Step 5: Since this holds for all NN:

βˆ‘n=1∞∣⟨x,en⟩∣2≀βˆ₯xβˆ₯2\sum_{n=1}^{\infty} |\langle x, e_n \rangle|^2 \leq \|x\|^2

Result: Bessel's inequality follows from the non-negativity of the norm. Equality holds iff {en}\{e_n\} is a complete orthonormal basis (Parseval's identity).

πŸ“Problem 7: Riesz Representation in LΒ²

Problem: Find the Riesz representative of the linear functional f(x)=∫01x(t)β‹…t dtf(x) = \int_0^1 x(t) \cdot t \, dt on L2[0,1]L^2[0,1].

Solution:

Step 1: By the Riesz representation theorem, there exists g∈L2[0,1]g \in L^2[0,1] such that f(x)=⟨x,g⟩=∫01x(t)g(t) dtf(x) = \langle x, g \rangle = \int_0^1 x(t) g(t) \, dt for all x∈L2[0,1]x \in L^2[0,1].

Step 2: Comparing f(x)=∫01x(t)β‹…t dtf(x) = \int_0^1 x(t) \cdot t \, dt with ⟨x,g⟩=∫01x(t)g(t) dt\langle x, g \rangle = \int_0^1 x(t) g(t) \, dt:

We identify g(t)=tg(t) = t.

Step 3: Verify: βˆ₯gβˆ₯2=∫01t2 dt=1/3\|g\|^2 = \int_0^1 t^2 \, dt = 1/3.

Step 4: The operator norm of ff is βˆ₯fβˆ₯=βˆ₯gβˆ₯=1/3=1/3\|f\| = \|g\| = \sqrt{1/3} = 1/\sqrt{3}.

Result: The Riesz representative is g(t)=tg(t) = t. The functional ff measures the "first moment" of xx, and its Riesz representative is the function g(t)=tg(t) = t against which xx is integrated.

πŸ“Problem 8: Compactness of Integral Operators

Problem: Show that the integral operator T:L2[0,1]β†’L2[0,1]T: L^2[0,1] \to L^2[0,1] defined by Tf(x)=∫01K(x,y)f(y)dyTf(x) = \int_0^1 K(x,y)f(y)dy with continuous kernel KK is compact.

Solution:

Step 1: Let {fn}\{f_n\} be a bounded sequence in L2[0,1]L^2[0,1] with βˆ₯fnβˆ₯2≀M\|f_n\|_2 \leq M.

Step 2: The functions gn(x)=Tfn(x)=∫01K(x,y)fn(y)dyg_n(x) = Tf_n(x) = \int_0^1 K(x,y)f_n(y)dy satisfy:

∣gn(x)βˆ£β‰€(∫01∣K(x,y)∣2dy)1/2βˆ₯fnβˆ₯2≀Mβˆ₯Kβˆ₯L2([0,1]2)|g_n(x)| \leq \left(\int_0^1 |K(x,y)|^2 dy\right)^{1/2} \|f_n\|_2 \leq M\|K\|_{L^2([0,1]^2)}

So {gn}\{g_n\} is uniformly bounded.

Step 3: Since KK is continuous on the compact set [0,1]2[0,1]^2, it is uniformly continuous. For ∣x1βˆ’x2∣<Ξ΄|x_1 - x_2| < \delta:

∣gn(x1)βˆ’gn(x2)βˆ£β‰€βˆ«01∣K(x1,y)βˆ’K(x2,y)∣∣fn(y)∣dy|g_n(x_1) - g_n(x_2)| \leq \int_0^1 |K(x_1,y) - K(x_2,y)| |f_n(y)| dy
≀(∫01∣K(x1,y)βˆ’K(x2,y)∣2dy)1/2Mβ†’0\leq \left(\int_0^1 |K(x_1,y) - K(x_2,y)|^2 dy\right)^{1/2} M \to 0

So {gn}\{g_n\} is equicontinuous.

Step 4: By the ArzelΓ -Ascoli theorem, every subsequence of {gn}\{g_n\} has a uniformly convergent subsequence, hence a convergent subsequence in L2[0,1]L^2[0,1].

Result: TT maps bounded sets to precompact sets, so TT is compact. This is a fundamental result: integral operators with continuous kernels are compact, and their spectral theory is well-understood via the Mercer theorem.

πŸ“Problem 9: Hahn-Banach Extension

Problem: Extend the linear functional f(x)=x1+x2f(x) = x_1 + x_2 defined on the subspace M={(x1,x2,0):x1,x2∈R}βŠ‚R3M = \{(x_1, x_2, 0) : x_1, x_2 \in \mathbb{R}\} \subset \mathbb{R}^3 (with β„“2\ell^2 norm) to all of R3\mathbb{R}^3 with the same norm.

Solution:

Step 1: The norm of ff on MM: For βˆ₯xβˆ₯=1\|x\| = 1 with x∈Mx \in M:

∣f(x)∣=∣x1+x2βˆ£β‰€2x12+x22=2|f(x)| = |x_1 + x_2| \leq \sqrt{2}\sqrt{x_1^2 + x_2^2} = \sqrt{2}

Equality when x1=x2=1/2x_1 = x_2 = 1/\sqrt{2}, so βˆ₯fβˆ₯=2\|f\| = \sqrt{2}.

Step 2: By Hahn-Banach, extend ff to f~\tilde{f} on R3\mathbb{R}^3 with βˆ₯f~βˆ₯=2\|\tilde{f}\| = \sqrt{2}.

Step 3: One extension: f~(x)=x1+x2+0β‹…x3=x1+x2\tilde{f}(x) = x_1 + x_2 + 0 \cdot x_3 = x_1 + x_2 (just ignore x3x_3).

Check norm: ∣f~(x)∣=∣x1+x2βˆ£β‰€2βˆ₯xβˆ₯|\tilde{f}(x)| = |x_1 + x_2| \leq \sqrt{2}\|x\| βœ“

Step 4: Another extension: f~(x)=x1+x2+Ξ±x3\tilde{f}(x) = x_1 + x_2 + \alpha x_3 where βˆ£Ξ±βˆ£β‰€1|\alpha| \leq 1.

For Ξ±=1\alpha = 1: f~(x)=x1+x2+x3\tilde{f}(x) = x_1 + x_2 + x_3. Norm: βˆ₯f~βˆ₯=3>2\|\tilde{f}\| = \sqrt{3} > \sqrt{2} βœ—

For Ξ±=0\alpha = 0: βˆ₯f~βˆ₯=2\|\tilde{f}\| = \sqrt{2} βœ“

Result: The extension f~(x)=x1+x2\tilde{f}(x) = x_1 + x_2 (with coefficient 0 for x3x_3) preserves the norm. Hahn-Banach guarantees such extensions exist, but they may not be unique.

πŸ“Problem 10: Uniform Boundedness Principle Application

Problem: Let Tn:β„“2β†’β„“2T_n: \ell^2 \to \ell^2 be defined by Tn(x1,x2,…)=(xn,0,0,…)T_n(x_1, x_2, \ldots) = (x_n, 0, 0, \ldots). Is {Tn}\{T_n\} pointwise bounded? Is it uniformly bounded?

Solution:

Step 1: Compute βˆ₯Tnxβˆ₯=∣xn∣\|T_n x\| = |x_n| for x=(x1,x2,…)βˆˆβ„“2x = (x_1, x_2, \ldots) \in \ell^2.

Step 2: For fixed xβˆˆβ„“2x \in \ell^2, βˆ‘βˆ£xn∣2<∞\sum |x_n|^2 < \infty, so ∣xnβˆ£β†’0|x_n| \to 0 as nβ†’βˆžn \to \infty. Therefore sup⁑nβˆ₯Tnxβˆ₯=sup⁑n∣xn∣<∞\sup_n \|T_n x\| = \sup_n |x_n| < \infty.

The family {Tn}\{T_n\} is pointwise bounded.

Step 3: By the Uniform Boundedness Principle (Banach-Steinhaus), since β„“2\ell^2 is a Banach space and {Tn}\{T_n\} is pointwise bounded, we have sup⁑nβˆ₯Tnβˆ₯<∞\sup_n \|T_n\| < \infty.

Step 4: Compute βˆ₯Tnβˆ₯=sup⁑βˆ₯xβˆ₯=1βˆ₯Tnxβˆ₯=sup⁑βˆ₯xβˆ₯=1∣xn∣=1\|T_n\| = \sup_{\|x\|=1} \|T_n x\| = \sup_{\|x\|=1} |x_n| = 1 (achieved by x=enx = e_n).

So sup⁑nβˆ₯Tnβˆ₯=1<∞\sup_n \|T_n\| = 1 < \infty βœ“

Result: The family is both pointwise bounded and uniformly bounded, with βˆ₯Tnβˆ₯=1\|T_n\| = 1 for all nn. The UBP guarantees this: pointwise boundedness on a Banach space implies uniform boundedness.

πŸ“Problem 11: Dual Space of lΒΉ

Problem: Show that the dual space of β„“1\ell^1 is isometrically isomorphic to β„“βˆž\ell^\infty.

Solution:

Step 1: Define the map Ξ¦:β„“βˆžβ†’(β„“1)βˆ—\Phi: \ell^\infty \to (\ell^1)^* by Ξ¦(y)(x)=βˆ‘n=1∞xnyn\Phi(y)(x) = \sum_{n=1}^{\infty} x_n y_n for xβˆˆβ„“1x \in \ell^1, yβˆˆβ„“βˆžy \in \ell^\infty.

Step 2: Check well-definedness: βˆ£βˆ‘xnynβˆ£β‰€βˆ₯yβˆ₯βˆžβˆ‘βˆ£xn∣=βˆ₯yβˆ₯∞βˆ₯xβˆ₯1<∞|\sum x_n y_n| \leq \|y\|_\infty \sum |x_n| = \|y\|_\infty \|x\|_1 < \infty.

Step 3: Ξ¦\Phi is linear and bounded: βˆ₯Ξ¦(y)βˆ₯=sup⁑βˆ₯xβˆ₯=1βˆ£βˆ‘xnynβˆ£β‰€βˆ₯yβˆ₯∞\|\Phi(y)\| = \sup_{\|x\|=1} |\sum x_n y_n| \leq \|y\|_\infty.

Step 4: For the reverse, given f∈(β„“1)βˆ—f \in (\ell^1)^*, define yn=f(en)y_n = f(e_n) where ene_n is the nn-th standard basis vector. Then y=(yn)βˆˆβ„“βˆžy = (y_n) \in \ell^\infty with βˆ₯yβˆ₯βˆžβ‰€βˆ₯fβˆ₯\|y\|_\infty \leq \|f\|, and f(x)=βˆ‘xnyn=Ξ¦(y)(x)f(x) = \sum x_n y_n = \Phi(y)(x).

Step 5: Isometry: βˆ₯Ξ¦(y)βˆ₯=βˆ₯yβˆ₯∞\|\Phi(y)\| = \|y\|_\infty (achieved by x=enx = e_n where ∣yn∣|y_n| is maximized).

Result: (β„“1)βˆ—β‰…β„“βˆž(\ell^1)^* \cong \ell^\infty isometrically. This is a fundamental duality result. Note that the dual of β„“βˆž\ell^\infty is NOT β„“1\ell^1 (it is β„“1\ell^1's bidual, which is larger).


Common Mistakes

MistakeCorrect Approach
Assuming all normed spaces are completeQ\mathbb{Q} with absolute value is normed but not complete; completion gives R\mathbb{R}
Confusing bounded and unbounded operatorsDifferentiation is unbounded; integration is bounded
Assuming the kernel trick requires explicit Ο•\phiThe kernel computes βŸ¨Ο•(x),Ο•(y)⟩\langle \phi(x), \phi(y) \rangle without ever computing Ο•\phi
Forgetting that RKHS elements are functionsAn RKHS is a space of functions, not vectors in Rn\mathbb{R}^n
Using the wrong inner product for L2L^2L2L^2 uses ⟨f,g⟩=∫fgˉ dΞΌ\langle f, g \rangle = \int f\bar{g} \, d\mu, not pointwise product
Confusing compact and bounded operatorsAll compact operators are bounded, but not conversely (e.g., identity on infinite-dim space is bounded but not compact)
Ignoring the regularization parameter Ξ»\lambdaWithout regularization (Ξ»=0\lambda = 0), the problem may be ill-posed if the kernel matrix is singular

Connections to Machine Learning

ℹ️ Connections to Machine Learning

Functional analysis provides the theoretical backbone for several ML paradigms: (1) Kernel methods (SVMs, kernel PCA, Gaussian processes) operate in RKHS, where the representer theorem guarantees solutions are finite linear combinations of kernel evaluations. (2) Regularization theory interprets weight decay as RKHS norm minimization, with the norm controlling function complexity. (3) Neural tangent kernels view infinite-width neural networks as kernel machines in a specific RKHS. (4) Functional PCA uses spectral decomposition of covariance operators for dimensionality reduction in function spaces. (5) Optimal transport spaces (Wasserstein distances) are studied as metric spaces with functional-analytic structure. (6) Reinforcement learning value functions live in Banach spaces, and convergence of temporal difference learning follows from properties of contractions on Banach spaces.


Exam/Interview Questions

Q1: State and explain the Cauchy-Schwarz inequality. Why is it important?

Answer: ∣⟨x,yβŸ©βˆ£β‰€βˆ₯xβˆ₯β‹…βˆ₯yβˆ₯|\langle x, y \rangle| \leq \|x\| \cdot \|y\|, with equality iff xx and yy are linearly dependent. It provides an upper bound on the inner product in terms of norms, guarantees that the angle between vectors is well-defined (cos⁑θ=⟨x,y⟩/(βˆ₯xβˆ₯βˆ₯yβˆ₯)\cos\theta = \langle x,y\rangle/(\|x\|\|y\|)), and is used to prove the triangle inequality for norms induced by inner products.


Q2: What is the difference between a Banach space and a Hilbert space?

Answer: A Banach space has only a norm (distance structure), while a Hilbert space has an inner product (angle structure). Every Hilbert space is a Banach space, but not conversely. The key difference: Hilbert spaces satisfy the parallelogram law βˆ₯x+yβˆ₯2+βˆ₯xβˆ’yβˆ₯2=2(βˆ₯xβˆ₯2+βˆ₯yβˆ₯2)\|x+y\|^2 + \|x-y\|^2 = 2(\|x\|^2+\|y\|^2), which fails for general Banach spaces (e.g., β„“1\ell^1, β„“βˆž\ell^\infty). The inner product enables orthogonal projections, which are fundamental to many ML algorithms.


Q3: Explain the representer theorem and its significance for kernel methods.

Answer: Any minimizer of βˆ‘L(yi,f(xi))+Ξ»βˆ₯fβˆ₯H2\sum L(y_i, f(x_i)) + \lambda\|f\|_{\mathcal{H}}^2 in an RKHS has the form fβˆ—=βˆ‘Ξ±iK(xi,β‹…)f^* = \sum \alpha_i K(x_i, \cdot). This reduces infinite-dimensional optimization to finding nn coefficients Ξ±i\alpha_i, where nn is the number of training points. It explains why kernel methods work: despite operating in potentially infinite-dimensional spaces, the solution is always a finite combination of kernel evaluations at training points.


Q4: Give an example of an unbounded operator and explain why boundedness matters.

Answer: The differentiation operator D:C1[0,1]β†’C[0,1]D: C^1[0,1] \to C[0,1] defined by Df=fβ€²Df = f' is unbounded: βˆ₯D(sin⁑(2Ο€nx))βˆ₯∞=2Ο€n\|D(\sin(2\pi nx))\|_\infty = 2\pi n while βˆ₯sin⁑(2Ο€nx)βˆ₯∞=1\|\sin(2\pi nx)\|_\infty = 1. Boundedness matters because bounded operators are continuous (preserve limits), and the spectral theorem applies to bounded self-adjoint operators. Unbounded operators require careful domain restrictions and are studied in quantum mechanics.


Q5: Why is L2L^2 a Hilbert space but L1L^1 is not?

Answer: L2L^2 has an inner product ⟨f,g⟩=∫fgΛ‰\langle f, g \rangle = \int f\bar{g} that induces the L2L^2 norm, and L2L^2 is complete. L1L^1 has a norm βˆ₯fβˆ₯1=∫∣f∣\|f\|_1 = \int |f| but no inner product that induces this norm (it fails the parallelogram law). While L1L^1 is a Banach space (complete), it is not a Hilbert space. This is why Fourier analysis works naturally in L2L^2 (orthonormal bases exist) but not in L1L^1.


Q6: How does the spectral theorem relate to PCA?

Answer: PCA finds the eigenvectors of the covariance matrix Ξ£=1nXTX\Sigma = \frac{1}{n}X^TX. In the infinite-dimensional setting, the covariance operator T:L2β†’L2T: L^2 \to L^2 defined by (Tf)(x)=∫K(x,y)f(y)dy(Tf)(x) = \int K(x,y)f(y)dy is a compact self-adjoint operator. The spectral theorem guarantees an orthonormal eigenbasis {en}\{e_n\} with eigenvalues Ξ»nβ†’0\lambda_n \to 0. The top eigenfunctions capture the directions of maximum variance, generalizing PCA to function spaces (functional PCA).


Quick Reference

ConceptDefinition/FormulaKey Property
Normβˆ₯xβˆ₯β‰₯0\|x\| \geq 0, βˆ₯Ξ±xβˆ₯=βˆ₯Ξ±βˆ₯βˆ₯xβˆ₯\|\alpha x\| = \|\alpha\|\|x\|, βˆ₯x+yβˆ₯≀βˆ₯xβˆ₯+βˆ₯yβˆ₯\|x+y\| \leq \|x\|+\|y\|Distance from origin
Inner Product⟨x,y⟩\langle x, y \rangle with conjugate symmetry, linearityEnables geometry (angles)
Cauchy-Schwarz∣⟨x,yβŸ©βˆ£β‰€βˆ₯xβˆ₯βˆ₯yβˆ₯|\langle x, y \rangle| \leq \|x\|\|y\|Bound on inner product
Hilbert SpaceComplete inner product spaceOrthonormal bases exist
Banach SpaceComplete normed spaceConvergence guaranteed
Bounded Operatorβˆ₯Tβˆ₯=sup⁑βˆ₯xβˆ₯=1βˆ₯Txβˆ₯<∞\|T\| = \sup_{\|x\|=1}\|Tx\| < \inftyContinuous
Adjoint⟨Tx,y⟩=⟨x,Tβˆ—y⟩\langle Tx, y \rangle = \langle x, T^*y \rangleGeneralizes transpose
RKHSHilbert space of functions with bounded point evaluationf(x)=⟨f,K(x,β‹…)⟩f(x) = \langle f, K(x,\cdot) \rangle
Kernel TrickK(x,y)=βŸ¨Ο•(x),Ο•(y)⟩K(x,y) = \langle \phi(x), \phi(y) \rangleImplicit high-dim computation
Representer Theoremfβˆ—=βˆ‘Ξ±iK(xi,β‹…)f^* = \sum \alpha_i K(x_i, \cdot)Infinite-dim β†’ finite-dim
Compact OperatorMaps bounded sets to precompact sets"Almost finite-rank"
Spectral TheoremT=βˆ‘Ξ»nβŸ¨β‹…,en⟩enT = \sum \lambda_n \langle \cdot, e_n \rangle e_nDiagonalizes compact self-adjoint operators

Cross-References

Lesson Progress98 / 100