Functional Analysis

ℹ️ Why It Matters

Functional analysis is the study of infinite-dimensional vector spaces equipped with additional structure (norms, inner products, topologies). It provides the mathematical foundation for kernel methods in machine learning, where data is mapped to high-dimensional (possibly infinite-dimensional) feature spaces. The Reproducing Kernel Hilbert Space (RKHS) framework unifies SVMs, Gaussian processes, and many regularization schemes. Understanding functional analysis reveals why the kernel trick works, how operators act on function spaces, and what guarantees exist for the convergence of iterative methods in infinite dimensions.

Core Definitions

Normed Vector Space

A normed vector space is a pair $(V, \|\cdot\|)$ where $V$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$ and $\|\cdot\|: V \to [0, \infty)$ is a norm satisfying:

$\|x\| = 0 \iff x = 0$ (definiteness)
$\|\alpha x\| = |\alpha|\|x\|$ for scalar $\alpha$ (absolute homogeneity)
$\|x + y\| \leq \|x\| + \|y\|$ (triangle inequality)

Examples include $\ell^p$ spaces ($1 \leq p \leq \infty$) with the $p$-norm, $C[a,b]$ with the supremum norm, and $L^p$ spaces with the Lebesgue $p$-norm.

Banach Space

A Banach space is a normed vector space $(V, \|\cdot\|)$ that is complete: every Cauchy sequence $\{x_n\}$ in $V$ converges to a limit $x \in V$ . Completeness ensures that infinite processes (limits, series, iterative algorithms) produce well-defined results within the space. The spaces $\ell^p$ ( $1 \leq p \leq \infty$ ) and $L^p$ ( $1 \leq p \leq \infty$ ) are Banach spaces.

Inner Product Space

An inner product space is a vector space $V$ equipped with an inner product $\langle \cdot, \cdot \rangle: V \times V \to \mathbb{C}$ (or $\mathbb{R}$ ) satisfying:

$\langle x, x \rangle \geq 0$ with equality iff $x = 0$ (positive definiteness)
$\langle x, y \rangle = \overline{\langle y, x \rangle}$ (conjugate symmetry)
$\langle \alpha x + \beta y, z \rangle = \alpha\langle x, z \rangle + \beta\langle y, z \rangle$ (linearity in first argument)

The inner product induces a norm via $\|x\| = \sqrt{\langle x, x \rangle}$ .

Hilbert Space

A Hilbert space is a complete inner product space. The norm is induced by the inner product: $\|x\| = \sqrt{\langle x, x \rangle}$ . Every Hilbert space has an orthonormal basis (possibly uncountable), and elements can be represented as generalized Fourier series in this basis. The key examples are $\ell^2$ (square-summable sequences) and $L^2$ (square-integrable functions).

Linear Operator

A linear operator $T: V \to W$ between vector spaces satisfies $T(\alpha x + \beta y) = \alpha T(x) + \beta T(y)$. When $V$ and $W$ are normed spaces, $T$ is bounded if $\|T\| = \sup_{\|x\|=1} \|Tx\| < \infty$, equivalently $\|Tx\| \leq \|T\|\|x\|$ for all $x \in V$. A linear operator is bounded if and only if it is continuous. The set of bounded linear operators from $V$ to $W$ is denoted $\mathcal{B}(V, W)$.
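For a matrix $A$ acting on $\mathbb{R}^n$ with the Euclidean norm, the operator norm $\sup_{\|x\|=1}\|Ax\|$ equals the square root of the largest eigenvalue of $A^T A$ (the largest singular value of $A$). The sketch below, assuming dense matrices stored as lists of rows, estimates it by power iteration on $A^T A$; the helper names are illustrative and not taken from any library.

```python
import math
import random

def mat_vec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def norm2(x):
    return math.sqrt(sum(v * v for v in x))

def operator_norm(A, iters=500, seed=0):
    """Estimate ||A|| = sup_{||x||=1} ||Ax|| (Euclidean norms) by power
    iteration on A^T A, whose largest eigenvalue equals ||A||^2."""
    rng = random.Random(seed)
    At = transpose(A)
    x = [rng.gauss(0.0, 1.0) for _ in range(len(A[0]))]
    n0 = norm2(x)
    x = [v / n0 for v in x]
    for _ in range(iters):
        y = mat_vec(At, mat_vec(A, x))  # apply A^T A
        ny = norm2(y)
        if ny == 0.0:  # x lies in the null space of A
            return 0.0
        x = [v / ny for v in y]
    return norm2(mat_vec(A, x))  # ||Ax|| at the approximate maximizer x
```

For example, `operator_norm([[3.0, 0.0], [0.0, 1.0]])` returns approximately 3, the largest factor by which that diagonal matrix stretches a unit vector.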

Adjoint Operator

For a bounded linear operator $T: H \to H$ on a Hilbert space, the adjoint $T^*: H \to H$ is the unique operator satisfying $\langle Tx, y \rangle = \langle x, T^*y \rangle$ for all $x, y \in H$. If $T = T^*$, the operator is self-adjoint (Hermitian). Self-adjoint operators have real eigenvalues, and eigenvectors belonging to distinct eigenvalues are orthogonal. The adjoint generalizes the conjugate transpose of a matrix (the transpose, for real matrices) to infinite dimensions.

Spectrum

The spectrum $\sigma(T)$ of a bounded linear operator $T$ is the set of $\lambda \in \mathbb{C}$ such that $T - \lambda I$ is not invertible as a bounded operator. In general the spectrum splits into three disjoint parts: the point spectrum (eigenvalues), the continuous spectrum, and the residual spectrum. For self-adjoint operators on a Hilbert space, the spectrum is real and the residual spectrum is empty. The spectral theorem describes the structure of self-adjoint operators via their spectra.

Reproducing Kernel Hilbert Space (RKHS)

An RKHS $\mathcal{H}$ on a set $X$ is a Hilbert space of functions $f: X \to \mathbb{R}$ such that for every $x \in X$, the point evaluation functional $f \mapsto f(x)$ is bounded. Applying the Riesz representation theorem to each evaluation functional yields a unique kernel $K: X \times X \to \mathbb{R}$ with $K(x, \cdot) \in \mathcal{H}$ and $f(x) = \langle f, K(x, \cdot) \rangle_{\mathcal{H}}$ for all $f \in \mathcal{H}$ and $x \in X$ (the reproducing property). The kernel is symmetric and positive definite in the kernel sense: every Gram matrix $[K(x_i, x_j)]_{i,j}$ is positive semidefinite.

Compact Operator

A linear operator $T: X \to Y$ between Banach spaces is compact if it maps bounded sets to precompact sets (sets whose closure is compact). Equivalently, for every bounded sequence $\{x_n\}$ , $\{Tx_n\}$ has a convergent subsequence. Compact operators are "almost finite-rank" and have well-behaved spectra (eigenvalues accumulate only at 0). The identity operator on an infinite-dimensional space is bounded but NOT compact.

Key Formulas

Cauchy-Schwarz Inequality

|\langle x, y \rangle| \leq \|x\| \cdot \|y\|

Here,

$x, y$ =Elements of an inner product space
$\langle x, y \rangle$ =Inner product of $x$ and $y$
$\|x\|$ =Norm induced by the inner product: $\|x\| = \sqrt{\langle x, x \rangle}$

Parallelogram Law

\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2)

Here,

$x, y$ =Elements of an inner product space
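The parallelogram law gives a quick numerical test of whether a norm can come from an inner product. The sketch below, assuming finite vectors and the $\ell^p$ norms with $p \in \{1, 2, \infty\}$ (helper names are illustrative), computes the difference between the two sides; on the standard basis vectors $(1, 0)$ and $(0, 1)$ it vanishes for $p = 2$ and is nonzero for $p = 1$ and $p = \infty$.

```python
def lp_norm(x, p):
    """l^p norm of a finite vector; p = float("inf") gives the max norm."""
    if p == float("inf"):
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def parallelogram_gap(x, y, p):
    """Left side minus right side of the parallelogram law in the l^p norm.
    The gap is zero for all x, y exactly when the norm comes from an inner product."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    lhs = lp_norm(s, p) ** 2 + lp_norm(d, p) ** 2
    rhs = 2 * (lp_norm(x, p) ** 2 + lp_norm(y, p) ** 2)
    return lhs - rhs

e1, e2 = [1.0, 0.0], [0.0, 1.0]
for p in (1, 2, float("inf")):
    print(p, parallelogram_gap(e1, e2, p))
```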

Orthogonal Projection

\text{proj}_U(x) = \sum_{i=1}^{n} \langle x, e_i \rangle e_i

Here,

$U$ =Finite-dimensional subspace with orthonormal basis $\{e_1, \ldots, e_n\}$
$x$ =Element to project
$e_i$ =Orthonormal basis vectors of $U$

Reproducing Property (Riesz Representation in an RKHS)

f(x) = \langle f, K(x, \cdot) \rangle_{\mathcal{H}}

Here,

$f$ =Function in the RKHS
$K(x, \cdot)$ =Reproducing kernel evaluated at $x$, viewed as a function in $\mathcal{H}$
$\mathcal{H}$ =Reproducing Kernel Hilbert Space

Kernel Trick

K(x, y) = \langle \phi(x), \phi(y) \rangle_{\mathcal{H}}

Here,

$K(x, y)$ =Kernel function computing similarity between x and y
$\phi(x)$ =Feature map from input space to Hilbert space
$\mathcal{H}$ =Feature Hilbert space (possibly infinite-dimensional)

Spectral Decomposition (Compact Self-Adjoint)

T = \sum_{n=1}^{\infty} \lambda_n \langle \cdot, e_n \rangle e_n

Here,

$T$ =Compact self-adjoint operator on Hilbert space
$\lambda_n$ =Eigenvalues (real, tending to zero)
$e_n$ =Orthonormal eigenvectors

Regularized Empirical Risk with RKHS Penalty

\min_{f \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^{n} L(y_i, f(x_i)) + \lambda\|f\|_{\mathcal{H}}^2

Here,

$L$ =Loss function
$f$ =Function in the RKHS
$\lambda$ =Regularization parameter
$\|f\|_{\mathcal{H}}^2$ =RKHS norm squared (complexity penalty)

Hölder's Inequality

\|fg\|_1 \leq \|f\|_p \|g\|_q, \quad \frac{1}{p} + \frac{1}{q} = 1

Here,

$f$ =Function in $L^p$
$g$ =Function in $L^q$
$p, q$ =Conjugate exponents: $1 \leq p, q \leq \infty$ with $1/p + 1/q = 1$ (where $1/\infty = 0$)

Important Theorems

Riesz Representation Theorem (Hilbert Space)

Let $H$ be a Hilbert space and $f: H \to \mathbb{R}$ a bounded linear functional. Then there exists a unique $y \in H$ such that $f(x) = \langle x, y \rangle$ for all $x \in H$, and $\|f\| = \|y\|$. In words, every bounded linear functional on a Hilbert space is the inner product with a fixed element. Applied to the point-evaluation functionals of an RKHS, it produces the reproducing kernel, which in turn underlies the representer theorem.

Open Mapping Theorem

If $T: X \to Y$ is a surjective bounded linear operator between Banach spaces $X$ and $Y$ , then $T$ is an open map (it maps open sets to open sets). This implies that if $T$ is bijective, its inverse $T^{-1}$ is also bounded. The open mapping theorem is one of the three fundamental theorems of functional analysis (along with Hahn-Banach and uniform boundedness).

Hahn-Banach Theorem

Let $X$ be a normed space, $M \subseteq X$ a subspace, and $f: M \to \mathbb{R}$ a bounded linear functional. Then $f$ extends to a bounded linear functional $\tilde{f}: X \to \mathbb{R}$ with $\|\tilde{f}\| = \|f\|$. This guarantees that there are "enough" continuous linear functionals to separate the points of $X$. It is used to prove the existence of supporting functionals, separating hyperplane theorems, and the isometric embedding of $X$ into its bidual.

Spectral Theorem (Compact Self-Adjoint Operators)

Let $T: H \to H$ be a compact self-adjoint operator on a Hilbert space. Then $H$ has an orthonormal basis of eigenvectors of $T$ with real eigenvalues. The nonzero eigenvalues $\lambda_n$ are at most countably many, each of finite multiplicity, and $Tx = \sum_n \lambda_n \langle x, e_n \rangle e_n$, where $Te_n = \lambda_n e_n$ and the sum runs over the eigenvectors with nonzero eigenvalue. If there are infinitely many nonzero eigenvalues, then $\lambda_n \to 0$. This generalizes the finite-dimensional spectral decomposition to compact operators.

Representer Theorem

Let $\mathcal{H}$ be an RKHS, $K$ its reproducing kernel, and $S = \{(x_i, y_i)\}_{i=1}^n$ a training set. For any loss function $L$ and regularization parameter $\lambda > 0$, any minimizer of $\frac{1}{n}\sum_{i=1}^n L(y_i, f(x_i)) + \lambda\|f\|_{\mathcal{H}}^2$ has the form $f^*(\cdot) = \sum_{i=1}^n \alpha_i K(x_i, \cdot)$. This reduces infinite-dimensional optimization to a finite-dimensional problem in the coefficients $\alpha_i$.

Uniform Boundedness Principle (Banach-Steinhaus)

Let $X$ be a Banach space, $Y$ a normed space, and $\{T_\alpha\}_{\alpha \in A}$ a family of bounded linear operators from $X$ to $Y$. If $\sup_\alpha \|T_\alpha x\| < \infty$ for each $x \in X$ (pointwise bounded), then $\sup_\alpha \|T_\alpha\| < \infty$ (uniformly bounded). A classic application shows that there exist continuous periodic functions whose Fourier series diverge at a point; the theorem also implies that a pointwise limit of bounded linear operators on a Banach space is itself bounded.

Worked Examples

📝Verifying Cauchy-Schwarz Inequality

Problem: Verify the Cauchy-Schwarz inequality for $\mathbf{x} = (1, 2, 3)$ and $\mathbf{y} = (4, 5, 6)$ in $\mathbb{R}^3$ .

Solution:

Step 1: Compute the inner product.

\langle \mathbf{x}, \mathbf{y} \rangle = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 4 + 10 + 18 = 32

Step 2: Compute the norms.

\|\mathbf{x}\| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{1 + 4 + 9} = \sqrt{14} \approx 3.742

\|\mathbf{y}\| = \sqrt{4^2 + 5^2 + 6^2} = \sqrt{16 + 25 + 36} = \sqrt{77} \approx 8.775

Step 3: Verify the inequality.

|\langle \mathbf{x}, \mathbf{y} \rangle| = |32| = 32

\|\mathbf{x}\| \cdot \|\mathbf{y}\| = \sqrt{14} \cdot \sqrt{77} = \sqrt{1078} \approx 32.833

Step 4: Check: $32 \leq 32.833$ ✓

Equality condition: Cauchy-Schwarz holds with equality iff $\mathbf{x}$ and $\mathbf{y}$ are linearly dependent. Here $32 < 32.833$ (strict inequality), so $\mathbf{x}$ and $\mathbf{y}$ are linearly independent.
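The arithmetic in Steps 1 to 4 can be reproduced in a few lines of Python. The second pair, $2\mathbf{x}$, is an added illustration of the equality case for linearly dependent vectors.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

x = [1, 2, 3]
y = [4, 5, 6]
print(dot(x, y), norm(x) * norm(y))    # 32 versus sqrt(14) * sqrt(77) = sqrt(1078)

# Equality case: y2 = 2x is linearly dependent on x.
y2 = [2 * a for a in x]
print(dot(x, y2), norm(x) * norm(y2))  # both equal 28 up to rounding
```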

📝Orthogonal Projection in L²[0,1]

Problem: Project $f(x) = x$ onto the subspace spanned by $\{1, x^2\}$ in $L^2[0,1]$ with inner product $\langle f, g \rangle = \int_0^1 f(x)g(x)dx$ .

Solution:

Step 1: First orthogonalize the basis using Gram-Schmidt.

Let $e_0 = 1$ (already normalized: $\|1\|^2 = \int_0^1 1 \, dx = 1$ ).

Let $v_1 = x^2 - \langle x^2, e_0 \rangle e_0 = x^2 - \int_0^1 x^2 dx \cdot 1 = x^2 - 1/3$ .

Normalize: $\|v_1\|^2 = \int_0^1 (x^2 - 1/3)^2 dx = \int_0^1 (x^4 - 2x^2/3 + 1/9)dx = 1/5 - 2/9 + 1/9 = 1/5 - 1/9 = 4/45$ .

e_1 = \frac{x^2 - 1/3}{\sqrt{4/45}} = \frac{\sqrt{45}}{2}(x^2 - 1/3)

Step 2: Project $f(x) = x$ onto $\text{span}\{e_0, e_1\}$ :

\text{proj}(x) = \langle x, e_0 \rangle e_0 + \langle x, e_1 \rangle e_1

\langle x, 1 \rangle = \int_0^1 x \, dx = 1/2

\langle x, x^2 - 1/3 \rangle = \int_0^1 x(x^2 - 1/3)dx = \int_0^1 (x^3 - x/3)dx = 1/4 - 1/6 = 1/12

\text{proj}(x) = \frac{1}{2} \cdot 1 + \frac{1/12}{4/45}(x^2 - 1/3) = \frac{1}{2} + \frac{45}{48}(x^2 - 1/3) = \frac{1}{2} + \frac{15}{16}(x^2 - 1/3)

= \frac{1}{2} + \frac{15}{16}x^2 - \frac{15}{48} = \frac{15}{16}x^2 + \frac{1}{2} - \frac{5}{16} = \frac{15}{16}x^2 + \frac{3}{16}

Result: $\text{proj}(x) = \frac{15}{16}x^2 + \frac{3}{16}$

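Verification: $\langle x - \text{proj}(x), 1 \rangle = \int_0^1 (x - 15x^2/16 - 3/16)dx = 1/2 - 5/16 - 3/16 = 0$ ✓ and $\langle x - \text{proj}(x), x^2 \rangle = \int_0^1 (x^3 - 15x^4/16 - 3x^2/16)dx = 1/4 - 3/16 - 1/16 = 0$ ✓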
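The same projection can be computed exactly with rational arithmetic. The sketch below assumes polynomials are stored as coefficient lists `[c0, c1, c2, ...]` and uses $\int_0^1 x^{i+j}\,dx = 1/(i+j+1)$ for the inner product. It runs Gram-Schmidt without normalizing (dividing by $\langle v, v \rangle$ instead), which avoids the factor $\sqrt{45}/2$ but yields the same projection. All function names are illustrative.

```python
from fractions import Fraction as F

def inner(p, q):
    """<p, q> = integral_0^1 p(x) q(x) dx for coefficient lists (c_k multiplies x^k)."""
    return sum(a * b / (i + j + 1) for i, a in enumerate(p) for j, b in enumerate(q))

def add(p, q, s=1):
    """Return p + s*q, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p = p + [F(0)] * (n - len(p))
    q = q + [F(0)] * (n - len(q))
    return [a + s * b for a, b in zip(p, q)]

def scale(c, p):
    return [c * a for a in p]

def project(f, basis):
    """Orthogonal projection of f onto span(basis) in L^2[0,1]."""
    ortho = []
    for b in basis:  # Gram-Schmidt without normalization
        v = b
        for u in ortho:
            v = add(v, scale(inner(b, u) / inner(u, u), u), s=-1)
        ortho.append(v)
    result = [F(0)]
    for u in ortho:  # proj(f) = sum_k <f, u_k> / <u_k, u_k> * u_k
        result = add(result, scale(inner(f, u) / inner(u, u), u))
    return result

one = [F(1)]
x = [F(0), F(1)]
x2 = [F(0), F(0), F(1)]

p = project(x, [one, x2])
residual = add(x, p, s=-1)
print(p)                                          # coefficients of 3/16 + 0*x + 15/16*x^2
print(inner(residual, one), inner(residual, x2))  # both 0: the residual is orthogonal to U
```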

📝Kernel Trick Computation

Problem: Compute the kernel $K(x, y) = (x^T y + c)^d$ for $d = 2$ , $c = 1$ , $x = (x_1, x_2)$ and $y = (y_1, y_2)$ , and show it equals an inner product in a feature space.

Solution:

Step 1: Expand the kernel.

K(x, y) = (x_1 y_1 + x_2 y_2 + 1)^2

= (x_1 y_1)^2 + (x_2 y_2)^2 + 1 + 2x_1 y_1 x_2 y_2 + 2x_1 y_1 + 2x_2 y_2

Step 2: Define the feature map $\phi: \mathbb{R}^2 \to \mathbb{R}^6$ :

\phi(x) = (x_1^2, x_2^2, 1, \sqrt{2}x_1 x_2, \sqrt{2}x_1, \sqrt{2}x_2)

Step 3: Compute the inner product in feature space:

\langle \phi(x), \phi(y) \rangle = x_1^2 y_1^2 + x_2^2 y_2^2 + 1 + 2x_1 x_2 y_1 y_2 + 2x_1 y_1 + 2x_2 y_2 = K(x, y)

Result: The polynomial kernel of degree 2 with $c=1$ computes an inner product in a 6-dimensional feature space without explicitly computing $\phi(x)$ or $\phi(y)$ .

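For input dimension $n$ and degree $d$, the explicit feature space has dimension $\binom{n+d}{d}$ (the number of monomials of degree at most $d$ in $n$ variables). This grows like $n^d/d!$ for fixed $d$ and quickly becomes impractical, while evaluating $K(x, y)$ directly costs only $O(n)$ operations. The kernel trick avoids this cost.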
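A short numerical check of Steps 1 to 3, assuming $c = 1$, $d = 2$ and two-dimensional inputs as in the problem: the kernel evaluated directly agrees with the inner product of the explicit six-dimensional features from Step 2. The function names are illustrative.

```python
import math
import random

def poly_kernel(x, y, c=1.0, d=2):
    """K(x, y) = (x^T y + c)^d, evaluated directly in the input space."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** d

def phi(x):
    """Explicit feature map from Step 2 (valid only for c = 1, d = 2, 2-D inputs)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [x1 * x1, x2 * x2, 1.0, r2 * x1 * x2, r2 * x1, r2 * x2]

rng = random.Random(0)
for _ in range(1000):
    x = [rng.uniform(-3.0, 3.0) for _ in range(2)]
    y = [rng.uniform(-3.0, 3.0) for _ in range(2)]
    explicit = sum(a * b for a, b in zip(phi(x), phi(y)))
    assert math.isclose(poly_kernel(x, y), explicit, rel_tol=1e-9, abs_tol=1e-9)

# Feature-space dimension binom(n + d, d), versus the O(n) cost of evaluating K.
for n, d in [(2, 2), (100, 3), (1000, 3)]:
    print(n, d, math.comb(n + d, d))
```

For $n = 100$ and $d = 3$ the explicit feature space already has $\binom{103}{3} = 176{,}851$ coordinates, while evaluating $K(x, y)$ needs about 100 multiplications.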

📝Spectral Decomposition of a Compact Operator

Problem: Find the spectral decomposition of the integral operator $T: L^2[0, \pi] \to L^2[0, \pi]$ defined by $Tf(x) = \int_0^\pi \sin(x)\sin(y) f(y) dy$ .

Solution:

Step 1: Identify the operator. $T$ is a rank-1 operator since the kernel $K(x,y) = \sin(x)\sin(y)$ is separable.

Step 2: Write $Tf(x) = \sin(x) \int_0^\pi \sin(y) f(y) dy = \sin(x) \langle f, \sin \rangle$ .

Step 3: Find eigenvalues and eigenvectors.

If $f = \sin$ , then $T(\sin)(x) = \sin(x) \langle \sin, \sin \rangle = \sin(x) \cdot \frac{\pi}{2}$ .

So $T(\sin) = \frac{\pi}{2} \sin$ , meaning $\lambda_1 = \frac{\pi}{2}$ is an eigenvalue with eigenvector $e_1 = \sqrt{\frac{2}{\pi}} \sin(x)$ (normalized).

Step 4: For any $f$ orthogonal to $\sin$ (i.e., $\langle f, \sin \rangle = 0$ ):

Tf(x) = \sin(x) \cdot 0 = 0

So $Tf = 0$ for all $f \perp \sin$ . This means all other eigenvalues are $\lambda_n = 0$ for $n \geq 2$ .

Step 5: Spectral decomposition:

Tf = \frac{\pi}{2} \langle f, \sqrt{2/\pi}\sin \rangle \sqrt{2/\pi}\sin = \langle f, \sin \rangle \sin

The operator is compact (rank 1), self-adjoint, and has one non-zero eigenvalue $\lambda_1 = \pi/2$ .
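The eigenvalue can be checked numerically by discretizing the integral. The sketch below uses the composite midpoint rule (an arbitrary choice; any accurate quadrature would do) to apply $T$ to $\sin$ and to $\sin(2y)$, a function orthogonal to $\sin$ on $[0, \pi]$. `apply_T` is an illustrative name.

```python
import math

def apply_T(f, x, n=2000):
    """(Tf)(x) = sin(x) * integral_0^pi sin(y) f(y) dy, with the integral
    approximated by the composite midpoint rule on n subintervals."""
    h = math.pi / n
    integral = h * sum(math.sin((k + 0.5) * h) * f((k + 0.5) * h) for k in range(n))
    return math.sin(x) * integral

# Eigenvector check: (T sin)(x) / sin(x) should be close to pi / 2 for every x.
for x in (0.3, 1.0, 2.5):
    print(x, apply_T(math.sin, x) / math.sin(x), math.pi / 2)

# sin(2y) is orthogonal to sin on [0, pi], so T should send it to (nearly) 0.
print(apply_T(lambda y: math.sin(2 * y), 1.0))
```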

Practice Problems

📝Problem 1: Completeness of l² Space

Problem: Show that $\ell^2 = \{(x_n)_{n=1}^\infty : \sum_{n=1}^\infty |x_n|^2 < \infty\}$ is a Hilbert space.

Solution:

Step 1: Define the inner product $\langle x, y \rangle = \sum_{n=1}^\infty x_n \overline{y_n}$ . This converges absolutely by Cauchy-Schwarz on partial sums.

Step 2: Verify inner product axioms:

$\langle x, x \rangle = \sum |x_n|^2 \geq 0$ with equality iff $x = 0$ ✓
$\langle x, y \rangle = \overline{\langle y, x \rangle}$ ✓
Linearity in first argument ✓

Step 3: Show completeness. Let $\{x^{(k)}\}$ be a Cauchy sequence in $\ell^2$ .

For each fixed $n$, $|x_n^{(k)} - x_n^{(m)}| \leq \|x^{(k)} - x^{(m)}\|$, so $\{x_n^{(k)}\}_{k=1}^\infty$ is Cauchy in $\mathbb{R}$ (or $\mathbb{C}$), hence converges to some $x_n$.

Step 4: Show $x = (x_n) \in \ell^2$ and $x^{(k)} \to x$ .

Given $\epsilon > 0$ , choose $K$ such that $\|x^{(k)} - x^{(m)}\| < \epsilon$ for $k, m \geq K$ . For any finite $N$ :

\sum_{n=1}^N |x_n^{(k)} - x_n|^2 = \lim_{m \to \infty} \sum_{n=1}^N |x_n^{(k)} - x_n^{(m)}|^2 \leq \epsilon^2

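Taking $N \to \infty$: $\|x^{(k)} - x\| \leq \epsilon$ for all $k \geq K$. In particular $x^{(K)} - x \in \ell^2$, so $x = x^{(K)} - (x^{(K)} - x) \in \ell^2$ because $\ell^2$ is a vector space, and $x^{(k)} \to x$ in $\ell^2$.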

Result: $\ell^2$ is complete, hence a Hilbert space.

📝Problem 2: Boundedness of an Operator

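Problem: Show that the differentiation operator $Tf = f'$, defined on the subspace $C^1[0,1] \subset C[0,1]$ of continuously differentiable functions and taking values in $C[0,1]$, is not bounded with respect to the supremum norm.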

Solution:

Step 1: Consider the family of functions $f_n(x) = \sin(2\pi n x)$ .

Step 2: Compute norms.

\|f_n\|_\infty = \sup_{x \in [0,1]} |\sin(2\pi n x)| = 1

\|Tf_n\|_\infty = \|f_n'\|_\infty = \sup_{x \in [0,1]} |2\pi n \cos(2\pi n x)| = 2\pi n

Step 3: Compute the operator norm.

\|T\| = \sup_{\|f\|_\infty = 1} \|Tf\|_\infty \geq \frac{\|Tf_n\|_\infty}{\|f_n\|_\infty} = 2\pi n

Step 4: Since $2\pi n \to \infty$ as $n \to \infty$ , we have $\|T\| = \infty$ .

Result: The differentiation operator is unbounded on $C[0,1]$ with the supremum norm. This demonstrates that differentiation is a fundamentally different operation from integration (which is bounded). In quantum mechanics, the momentum operator $\hat{p} = -i\hbar \frac{d}{dx}$ is an unbounded self-adjoint operator.

📝Problem 3: RKHS Representer Theorem Application

Problem: Given data $\{(x_1, y_1), (x_2, y_2), (x_3, y_3)\} = \{(0, 1), (1, 3), (2, 5)\}$ , find the function $f^*$ that minimizes $\sum_{i=1}^3 (y_i - f(x_i))^2 + \lambda\|f\|_{\mathcal{H}}^2$ using the RBF kernel $K(x, y) = e^{-\gamma(x-y)^2}$ with $\gamma = 1$ and $\lambda = 0.1$ .

Solution:

Step 1: By the Representer Theorem, $f^*(\cdot) = \sum_{i=1}^3 \alpha_i K(x_i, \cdot)$ .

Step 2: Define the kernel matrix $K_{ij} = K(x_i, x_j) = e^{-(x_i - x_j)^2}$ :

K = \begin{pmatrix} 1 & e^{-1} & e^{-4} \\ e^{-1} & 1 & e^{-1} \\ e^{-4} & e^{-1} & 1 \end{pmatrix} \approx \begin{pmatrix} 1 & 0.368 & 0.018 \\ 0.368 & 1 & 0.368 \\ 0.018 & 0.368 & 1 \end{pmatrix}

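Step 3: Substituting $f^*$ into the objective gives $\|\mathbf{y} - K\boldsymbol{\alpha}\|^2 + \lambda\boldsymbol{\alpha}^T K \boldsymbol{\alpha}$, whose gradient $2K((K + \lambda I)\boldsymbol{\alpha} - \mathbf{y})$ vanishes when $(K + \lambda I)\boldsymbol{\alpha} = \mathbf{y}$: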

\begin{pmatrix} 1.1 & 0.368 & 0.018 \\ 0.368 & 1.1 & 0.368 \\ 0.018 & 0.368 & 1.1 \end{pmatrix} \boldsymbol{\alpha} = \begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix}

Step 4: Solve the $3 \times 3$ system (e.g., via Gaussian elimination):

\boldsymbol{\alpha} \approx (0.44, 1.20, 4.14)

Step 5: The solution is $f^*(x) = 0.44 \cdot e^{-x^2} + 1.20 \cdot e^{-(x-1)^2} + 4.14 \cdot e^{-(x-2)^2}$.

Result: Because $K\boldsymbol{\alpha} = \mathbf{y} - \lambda\boldsymbol{\alpha}$, the fitted values are $f^*(x_i) \approx (0.96, 2.88, 4.59)$: with $\lambda > 0$ the function does not interpolate the data exactly but trades data fit against the RKHS norm (smoothness). The representer theorem reduced an infinite-dimensional problem to solving a $3 \times 3$ linear system.
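Steps 2 to 5 can be reproduced with a small linear solve. This is a sketch assuming the data, the kernel, $\gamma = 1$ and $\lambda = 0.1$ from the problem; `solve` is a plain Gaussian-elimination helper written for illustration (in practice a library solver, for instance one based on a Cholesky factorization, would normally be used).

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

xs, ys = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]
gamma, lam = 1.0, 0.1

def rbf(a, b):
    """Gaussian (RBF) kernel K(a, b) = exp(-gamma (a - b)^2)."""
    return math.exp(-gamma * (a - b) ** 2)

K = [[rbf(a, b) for b in xs] for a in xs]
A = [[K[i][j] + (lam if i == j else 0.0) for j in range(3)] for i in range(3)]
alpha = solve(A, ys)  # (K + lambda I) alpha = y

def f_star(x):
    """Representer-theorem solution f*(x) = sum_i alpha_i K(x_i, x)."""
    return sum(a * rbf(xi, x) for a, xi in zip(alpha, xs))

print([round(a, 2) for a in alpha])
print([round(f_star(x), 3) for x in xs])  # fitted values, equal to y_i - lam * alpha_i
```

The printed coefficients should round to $(0.44, 1.20, 4.14)$, and the fitted values satisfy $f^*(x_i) = y_i - \lambda\alpha_i$, which is why they fall slightly short of the targets.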

📝Problem 4: Orthonormal Basis in L²[-π, π]

Problem: Verify that the Fourier basis $\{e^{inx}/\sqrt{2\pi}\}_{n \in \mathbb{Z}}$ is orthonormal in $L^2[-\pi, \pi]$ .

Solution:

Step 1: The inner product in $L^2[-\pi, \pi]$ is $\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\overline{g(x)} dx$ .

Step 2: Compute $\langle e^{inx}, e^{imx} \rangle$ :

\int_{-\pi}^{\pi} e^{inx} \overline{e^{imx}} dx = \int_{-\pi}^{\pi} e^{i(n-m)x} dx

Step 3: Evaluate the integral.

If $n = m$ : $\int_{-\pi}^{\pi} e^{0} dx = 2\pi$

If $n \neq m$ : $\int_{-\pi}^{\pi} e^{i(n-m)x} dx = \left[\frac{e^{i(n-m)x}}{i(n-m)}\right]_{-\pi}^{\pi} = \frac{e^{i(n-m)\pi} - e^{-i(n-m)\pi}}{i(n-m)} = \frac{2i\sin((n-m)\pi)}{i(n-m)} = 0$

Step 4: Therefore:

\langle \frac{e^{inx}}{\sqrt{2\pi}}, \frac{e^{imx}}{\sqrt{2\pi}} \rangle = \frac{1}{2\pi}\langle e^{inx}, e^{imx} \rangle = \begin{cases} 1 & \text{if } n = m \\ 0 & \text{if } n \neq m \end{cases}

Result: The Fourier basis is orthonormal. It is also complete (a standard result not proved here), so any $f \in L^2[-\pi, \pi]$ can be written as $f(x) = \sum_{n \in \mathbb{Z}} \hat{f}(n) e^{inx}$, with convergence in the $L^2$ norm and $\hat{f}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) e^{-inx} dx$.
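A numerical check of the orthonormality relations, assuming the midpoint rule on a uniform grid (which integrates $e^{ikx}$ over a full period exactly, up to rounding, for every nonzero integer $k$ smaller in magnitude than the number of grid points) and the modes $-3 \leq n \leq 3$, chosen for illustration.

```python
import cmath
import math

def inner(f, g, n=4096):
    """<f, g> = integral_{-pi}^{pi} f(x) conj(g(x)) dx, via the midpoint rule."""
    h = 2 * math.pi / n
    total = 0j
    for k in range(n):
        x = -math.pi + (k + 0.5) * h
        total += f(x) * g(x).conjugate()
    return total * h

def e(n):
    """Normalized Fourier mode x -> e^{inx} / sqrt(2 pi)."""
    return lambda x: cmath.exp(1j * n * x) / math.sqrt(2 * math.pi)

modes = list(range(-3, 4))
gram = [[inner(e(n), e(m)) for m in modes] for n in modes]
max_err = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
              for i in range(len(modes)) for j in range(len(modes)))
print(max_err)  # distance of the Gram matrix from the identity
```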

📝Problem 5: Bounding Generalization Error via RKHS Norm

Problem: For a function $f^*$ in an RKHS $\mathcal{H}$ with $\|f^*\|_{\mathcal{H}} \leq B$, such as a regularized solution given by the representer theorem, derive a generalization bound.

Solution:

Step 1: By the representer theorem, $f^*(\cdot) = \sum_{i=1}^n \alpha_i K(x_i, \cdot)$ .

Step 2: The RKHS norm constraint gives:

\|f^*\|_{\mathcal{H}}^2 = \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) = \boldsymbol{\alpha}^T K \boldsymbol{\alpha} \leq B^2

Step 3: For the squared-loss objective $\sum_i (y_i - f(x_i))^2 + \lambda\|f\|_{\mathcal{H}}^2$ (without the $1/n$ factor, as in Problem 3), the coefficients are:

\boldsymbol{\alpha} = (K + \lambda I)^{-1} \mathbf{y}

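Step 4: The generalization gap (expected risk minus empirical risk) can be controlled through the empirical Rademacher complexity of the RKHS ball $\mathcal{F}_B = \{f \in \mathcal{H} : \|f\|_{\mathcal{H}} \leq B\}$. By the reproducing property and Cauchy-Schwarz, $\sup_{f \in \mathcal{F}_B} \sum_i \sigma_i f(x_i) = B\sqrt{\boldsymbol{\sigma}^T K \boldsymbol{\sigma}}$, and Jensen's inequality then gives:

\hat{\mathcal{R}}_S(\mathcal{F}_B) = \frac{1}{n}\mathbb{E}_{\boldsymbol{\sigma}}\left[\sup_{f \in \mathcal{F}_B} \sum_{i=1}^n \sigma_i f(x_i)\right] \leq \frac{B\sqrt{\text{tr}(K)}}{n}

where the $\sigma_i$ are independent uniform $\pm 1$ signs and $K$ is the kernel matrix on the training inputs. If $K(x, x) \leq \kappa^2$ for all $x$, this is at most $B\kappa/\sqrt{n}$.

Step 5: Assume the loss takes values in $[0, 1]$ and is $\rho$-Lipschitz in its second argument. Standard Rademacher bounds then give, with probability at least $1 - \delta$ over the sample, simultaneously for all $f \in \mathcal{F}_B$ (in particular for $f^*$):

R(f) \leq \hat{R}_n(f) + \frac{2\rho B\sqrt{\text{tr}(K)}}{n} + 3\sqrt{\frac{\log(2/\delta)}{2n}}

where $R$ is the expected risk and $\hat{R}_n$ the empirical risk.

Result: The bound depends on $\|f^*\|_{\mathcal{H}}$ through $B$ (model complexity), on the kernel through $\text{tr}(K)$, and on the sample size $n$; for a bounded kernel it decays as $O(1/\sqrt{n})$. Smaller $\lambda$ permits solutions with larger RKHS norm, which loosens the bound and increases overfitting risk.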
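For a small sample, the Rademacher complexity of the RKHS ball can be computed exactly and compared with the trace bound $B\sqrt{\text{tr}(K)}/n$. The sketch relies on the identity $\sup_{\|f\|_{\mathcal{H}} \leq B} \sum_i \sigma_i f(x_i) = B\sqrt{\boldsymbol{\sigma}^T K \boldsymbol{\sigma}}$, which follows from the reproducing property and Cauchy-Schwarz, and enumerates all $2^n$ sign vectors. The Gaussian kernel on the inputs $0, 1, 2$ and $B = 1$ are illustrative assumptions.

```python
import itertools
import math

def rademacher_rkhs_ball(K, B):
    """Exact empirical Rademacher complexity of {f : ||f||_H <= B} on the sample,
    (B / n) * E_sigma sqrt(sigma^T K sigma), by enumerating all 2^n sign vectors."""
    n = len(K)
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        q = sum(sigma[i] * K[i][j] * sigma[j] for i in range(n) for j in range(n))
        total += math.sqrt(max(q, 0.0))  # clip tiny negative values caused by rounding
    return B * total / (n * 2 ** n)

def trace_bound(K, B):
    """Upper bound B * sqrt(tr K) / n, obtained from Jensen's inequality."""
    n = len(K)
    return B * math.sqrt(sum(K[i][i] for i in range(n))) / n

xs = [0.0, 1.0, 2.0]
K = [[math.exp(-(a - b) ** 2) for b in xs] for a in xs]
B = 1.0
print(rademacher_rkhs_ball(K, B), trace_bound(K, B))
```

Jensen's inequality, $\mathbb{E}\sqrt{Z} \leq \sqrt{\mathbb{E}Z}$, explains why the exact value never exceeds the trace bound; the two coincide when $K$ is the identity matrix.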

📝Problem 6: Bessel's Inequality

Problem: Prove Bessel's inequality for an orthonormal sequence $\{e_n\}$ in a Hilbert space $H$ : $\sum_{n=1}^{\infty} |\langle x, e_n \rangle|^2 \leq \|x\|^2$ .

Solution:

Step 1: Let $S_N = \sum_{n=1}^{N} \langle x, e_n \rangle e_n$ be the partial projection onto $\text{span}\{e_1, \ldots, e_N\}$ .

Step 2: Compute the norm of the residual $x - S_N$ :

\|x - S_N\|^2 = \|x\|^2 - 2\text{Re}\langle x, S_N \rangle + \|S_N\|^2

Step 3: Since $\{e_n\}$ is orthonormal:

\langle x, S_N \rangle = \sum_{n=1}^{N} \overline{\langle x, e_n \rangle} \langle x, e_n \rangle = \sum_{n=1}^{N} |\langle x, e_n \rangle|^2

\|S_N\|^2 = \sum_{n=1}^{N} |\langle x, e_n \rangle|^2

Step 4: Therefore:

\|x - S_N\|^2 = \|x\|^2 - \sum_{n=1}^{N} |\langle x, e_n \rangle|^2 \geq 0

Step 5: Since this holds for all $N$ :

\sum_{n=1}^{\infty} |\langle x, e_n \rangle|^2 \leq \|x\|^2

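Result: Bessel's inequality follows from the non-negativity of $\|x - S_N\|^2$. Equality holds for a given $x$ exactly when $x$ lies in the closed span of $\{e_n\}$, and it holds for every $x \in H$ iff $\{e_n\}$ is complete, i.e., an orthonormal basis (Parseval's identity).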
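Bessel's inequality can be watched numerically. As an illustrative assumption, take $x(t) = t$ in $L^2[-\pi, \pi]$ and the orthonormal sequence $e_n(t) = \sin(nt)/\sqrt{\pi}$, $n \geq 1$, for which $\|x\|^2 = 2\pi^3/3$ and $\langle x, e_n \rangle = 2\sqrt{\pi}(-1)^{n+1}/n$ in closed form. The partial sums increase toward $\|x\|^2$ without exceeding it. Here they converge to it, because this odd function lies in the closed span of the sines even though the sines alone are not a basis of $L^2[-\pi, \pi]$.

```python
import math

norm_sq = 2 * math.pi ** 3 / 3  # ||x||^2 = integral_{-pi}^{pi} t^2 dt

def coeff(n):
    """<x, e_n> for x(t) = t and e_n(t) = sin(n t) / sqrt(pi)."""
    return 2 * math.sqrt(math.pi) * (-1) ** (n + 1) / n

partial = 0.0
for N in range(1, 10001):
    partial += coeff(N) ** 2
    assert partial <= norm_sq  # Bessel: every partial sum stays below ||x||^2
    if N in (1, 10, 100, 10000):
        print(N, partial, norm_sq - partial)
```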

📝Problem 7: Riesz Representation in L²

Problem: Find the Riesz representative of the linear functional $f(x) = \int_0^1 x(t) \cdot t \, dt$ on $L^2[0,1]$ .

Solution:

Step 1: By the Riesz representation theorem, there exists $g \in L^2[0,1]$ such that $f(x) = \langle x, g \rangle = \int_0^1 x(t) g(t) \, dt$ for all $x \in L^2[0,1]$ .

Step 2: Comparing $f(x) = \int_0^1 x(t) \cdot t \, dt$ with $\langle x, g \rangle = \int_0^1 x(t) g(t) \, dt$ :

We identify $g(t) = t$ .

Step 3: Check that $g \in L^2[0,1]$: $\|g\|^2 = \int_0^1 t^2 \, dt = 1/3 < \infty$.

Step 4: The operator norm of $f$ is $\|f\| = \|g\| = \sqrt{1/3} = 1/\sqrt{3}$ .

Result: The Riesz representative is $g(t) = t$ . The functional $f$ measures the "first moment" of $x$ , and its Riesz representative is the function $g(t) = t$ against which $x$ is integrated.

📝Problem 8: Compactness of Integral Operators

Problem: Show that the integral operator $T: L^2[0,1] \to L^2[0,1]$ defined by $Tf(x) = \int_0^1 K(x,y)f(y)dy$ with continuous kernel $K$ is compact.

Solution:

Step 1: Let $\{f_n\}$ be a bounded sequence in $L^2[0,1]$ with $\|f_n\|_2 \leq M$ .

Step 2: The functions $g_n(x) = Tf_n(x) = \int_0^1 K(x,y)f_n(y)dy$ satisfy:

|g_n(x)| \leq \left(\int_0^1 |K(x,y)|^2 dy\right)^{1/2} \|f_n\|_2 \leq M \sup_{(s,t) \in [0,1]^2} |K(s,t)|

So $\{g_n\}$ is uniformly bounded; the supremum is finite because $K$ is continuous on the compact square $[0,1]^2$.

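Step 3: Since $K$ is continuous on the compact set $[0,1]^2$, it is uniformly continuous: given $\epsilon > 0$, there is $\delta > 0$ such that $|K(x_1,y) - K(x_2,y)| < \epsilon$ for all $y \in [0,1]$ whenever $|x_1 - x_2| < \delta$. For such $x_1, x_2$:

|g_n(x_1) - g_n(x_2)| \leq \int_0^1 |K(x_1,y) - K(x_2,y)| |f_n(y)| dy

\leq \left(\int_0^1 |K(x_1,y) - K(x_2,y)|^2 dy\right)^{1/2} M \leq \epsilon M

So $\{g_n\}$ is equicontinuous, since the bound $\epsilon M$ does not depend on $n$.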

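Step 4: By the Arzelà-Ascoli theorem, the uniformly bounded, equicontinuous sequence $\{g_n\}$ of continuous functions on $[0,1]$ has a uniformly convergent subsequence. Since $\|h\|_2 \leq \|h\|_\infty$ on $[0,1]$, that subsequence also converges in $L^2[0,1]$.

Result: $T$ maps bounded sets to precompact sets, so $T$ is compact. This is a fundamental result: integral operators with continuous kernels are compact. When $K$ is in addition symmetric and positive semidefinite, Mercer's theorem expands it as $K(x,y) = \sum_n \lambda_n e_n(x) e_n(y)$ in terms of the eigenpairs of $T$, with uniform convergence.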
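To see the spectral behaviour of a compact integral operator numerically, the sketch below takes the continuous kernel $K(x, y) = \min(x, y)$ on $[0,1]^2$ as an illustrative choice. Its Mercer eigenvalues are known in closed form, $\lambda_k = 1/((k - \tfrac{1}{2})^2\pi^2)$, so they decay to 0 as the theory predicts. Discretizing with the midpoint rule and running power iteration should approximately recover $\lambda_1 = 4/\pi^2$; the function name and grid size are assumptions.

```python
import math

def top_eigenvalue_min_kernel(n=200, iters=30):
    """Largest eigenvalue of (Tf)(x) = integral_0^1 min(x, y) f(y) dy, estimated
    by power iteration on the midpoint-rule matrix A[i][j] = h * min(x_i, x_j)."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    v = [1.0 / math.sqrt(n)] * n  # unit start vector with a component along the top eigenvector
    lam = 0.0
    for _ in range(iters):
        w = [h * sum(min(xi, xj) * vj for xj, vj in zip(pts, v)) for xi in pts]
        lam = math.sqrt(sum(c * c for c in w))  # ||A v|| with ||v|| = 1
        v = [c / lam for c in w]
    return lam

approx = top_eigenvalue_min_kernel()
exact = 4 / math.pi ** 2
print(approx, exact, 1 / (1.5 ** 2 * math.pi ** 2))  # estimate, lambda_1, lambda_2
```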

📝Problem 9: Hahn-Banach Extension

Problem: Extend the linear functional $f(x) = x_1 + x_2$ defined on the subspace $M = \{(x_1, x_2, 0) : x_1, x_2 \in \mathbb{R}\} \subset \mathbb{R}^3$ (with $\ell^2$ norm) to all of $\mathbb{R}^3$ with the same norm.

Solution:

Step 1: The norm of $f$ on $M$ : For $\|x\| = 1$ with $x \in M$ :

|f(x)| = |x_1 + x_2| \leq \sqrt{2}\sqrt{x_1^2 + x_2^2} = \sqrt{2}

Equality when $x_1 = x_2 = 1/\sqrt{2}$ , so $\|f\| = \sqrt{2}$ .

Step 2: By Hahn-Banach, extend $f$ to $\tilde{f}$ on $\mathbb{R}^3$ with $\|\tilde{f}\| = \sqrt{2}$ .

Step 3: One extension: $\tilde{f}(x) = x_1 + x_2 + 0 \cdot x_3 = x_1 + x_2$ (just ignore $x_3$ ).

Check norm: $|\tilde{f}(x)| = |x_1 + x_2| \leq \sqrt{2}\|x\|$ ✓

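Step 4: Every linear extension has the form $\tilde{f}(x) = x_1 + x_2 + \alpha x_3$ for some $\alpha \in \mathbb{R}$, and by Cauchy-Schwarz its norm is $\|\tilde{f}\| = \|(1, 1, \alpha)\| = \sqrt{2 + \alpha^2}$.

For $\alpha = 1$: $\tilde{f}(x) = x_1 + x_2 + x_3$. Norm: $\|\tilde{f}\| = \sqrt{3} > \sqrt{2}$ ✗

For $\alpha = 0$: $\|\tilde{f}\| = \sqrt{2}$ ✓, and $\alpha = 0$ is the only value with $\sqrt{2 + \alpha^2} = \sqrt{2}$.

Result: The extension $\tilde{f}(x) = x_1 + x_2$ (with coefficient 0 for $x_3$) is the unique norm-preserving extension; in a Hilbert space, norm-preserving Hahn-Banach extensions are always unique. In general normed spaces they need not be: if $\mathbb{R}^3$ carried the $\ell^1$ norm instead, then $\|f\| = 1$ and every $\alpha$ with $|\alpha| \leq 1$ would give a norm-preserving extension.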

📝Problem 10: Uniform Boundedness Principle Application

Problem: Let $T_n: \ell^2 \to \ell^2$ be defined by $T_n(x_1, x_2, \ldots) = (x_n, 0, 0, \ldots)$ . Is $\{T_n\}$ pointwise bounded? Is it uniformly bounded?

Solution:

Step 1: Compute $\|T_n x\| = |x_n|$ for $x = (x_1, x_2, \ldots) \in \ell^2$ .

Step 2: For fixed $x \in \ell^2$ , $\sum |x_n|^2 < \infty$ , so $|x_n| \to 0$ as $n \to \infty$ . Therefore $\sup_n \|T_n x\| = \sup_n |x_n| < \infty$ .

The family $\{T_n\}$ is pointwise bounded.

Step 3: By the Uniform Boundedness Principle (Banach-Steinhaus), since $\ell^2$ is a Banach space and $\{T_n\}$ is pointwise bounded, we have $\sup_n \|T_n\| < \infty$ .

Step 4: Compute $\|T_n\| = \sup_{\|x\|=1} \|T_n x\| = \sup_{\|x\|=1} |x_n| = 1$ (achieved by $x = e_n$ ).

So $\sup_n \|T_n\| = 1 < \infty$ ✓

Result: The family is both pointwise bounded and uniformly bounded, with $\|T_n\| = 1$ for all $n$ . The UBP guarantees this: pointwise boundedness on a Banach space implies uniform boundedness.

📝Problem 11: Dual Space of l¹

Problem: Show that the dual space of $\ell^1$ is isometrically isomorphic to $\ell^\infty$ .

Solution:

Step 1: Define the map $\Phi: \ell^\infty \to (\ell^1)^*$ by $\Phi(y)(x) = \sum_{n=1}^{\infty} x_n y_n$ for $x \in \ell^1$ , $y \in \ell^\infty$ .

Step 2: Check well-definedness: $|\sum x_n y_n| \leq \|y\|_\infty \sum |x_n| = \|y\|_\infty \|x\|_1 < \infty$ .

Step 3: $\Phi$ is linear and bounded: $\|\Phi(y)\| = \sup_{\|x\|_1=1} |\sum x_n y_n| \leq \|y\|_\infty$.

Step 4: For the reverse, given $f \in (\ell^1)^*$, define $y_n = f(e_n)$, where $e_n$ is the $n$-th standard basis vector. Then $|y_n| \leq \|f\|\|e_n\|_1 = \|f\|$, so $y = (y_n) \in \ell^\infty$ with $\|y\|_\infty \leq \|f\|$. Since $x = \sum_n x_n e_n$ with convergence in $\ell^1$ and $f$ is continuous, $f(x) = \sum x_n y_n = \Phi(y)(x)$, so $\Phi$ is surjective.

Step 5: Isometry: $\|\Phi(y)\| \geq |\Phi(y)(e_n)| = |y_n|$ for every $n$, so $\|\Phi(y)\| \geq \sup_n |y_n| = \|y\|_\infty$. Combined with Step 3, $\|\Phi(y)\| = \|y\|_\infty$. (The supremum need not be attained at any single $e_n$.)

Result: $(\ell^1)^* \cong \ell^\infty$ isometrically. This is a fundamental duality result. Note that the dual of $\ell^\infty$ is NOT $\ell^1$: it is the bidual $(\ell^1)^{**}$, which strictly contains $\ell^1$.

Common Mistakes

Mistake	Correct Approach
Assuming all normed spaces are complete	The polynomials on $[0,1]$ with the supremum norm form a normed space that is not complete; its completion is $C[0,1]$ (Weierstrass approximation theorem)
Confusing bounded and unbounded operators	Differentiation is unbounded; integration is bounded
Assuming the kernel trick requires explicit $\phi$	The kernel computes $\langle \phi(x), \phi(y) \rangle$ without ever computing $\phi$
Forgetting that RKHS elements are functions	An RKHS is a space of functions, not vectors in $\mathbb{R}^n$
Using the wrong inner product for $L^2$	$L^2$ uses $\langle f, g \rangle = \int f\bar{g} \, d\mu$ , not pointwise product
Confusing compact and bounded operators	All compact operators are bounded, but not conversely (e.g., identity on infinite-dim space is bounded but not compact)
Ignoring the regularization parameter $\lambda$	Without regularization ( $\lambda = 0$ ), the problem may be ill-posed if the kernel matrix is singular

Connections to Machine Learning

Functional analysis provides the theoretical backbone for several ML paradigms: (1) Kernel methods (SVMs, kernel PCA, Gaussian processes) operate in an RKHS, where the representer theorem guarantees that solutions are finite linear combinations of kernel evaluations. (2) Regularization theory interprets weight decay on a linear model in feature space as RKHS norm minimization, with the norm controlling function complexity. (3) Neural tangent kernels describe the training of infinite-width neural networks as kernel regression in a specific RKHS. (4) Functional PCA uses the spectral decomposition of covariance operators for dimensionality reduction in function spaces. (5) Optimal transport spaces (Wasserstein distances) are studied as metric spaces with functional-analytic structure. (6) In reinforcement learning, value functions live in Banach spaces such as the bounded functions with the supremum norm. The Bellman operator is a $\gamma$-contraction there, so value iteration converges by the Banach fixed-point theorem, and convergence analyses of temporal difference learning also rely on contraction arguments (for linear function approximation, a contraction in a weighted $L^2$ norm).
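The contraction argument behind point (6) can be illustrated with the Bellman optimality operator of a small Markov decision process, which is a $\gamma$-contraction in the supremum norm; by the Banach fixed-point theorem, value iteration then converges to its unique fixed point. The two-state MDP below, its rewards, and $\gamma = 0.9$ are hypothetical.

```python
import random

# Hypothetical MDP: P[s][a] = list of (probability, next_state), R[s][a] = reward.
P = {0: {0: [(1.0, 0)], 1: [(0.5, 0), (0.5, 1)]},
     1: {0: [(1.0, 1)], 1: [(0.8, 0), (0.2, 1)]}}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.5}}
gamma = 0.9

def bellman(v):
    """Bellman optimality operator (Tv)(s) = max_a [R(s,a) + gamma * E v(s')]."""
    return [max(R[s][a] + gamma * sum(p * v[t] for p, t in P[s][a]) for a in P[s])
            for s in sorted(P)]

def sup_dist(u, w):
    return max(abs(a - b) for a, b in zip(u, w))

rng = random.Random(0)
for _ in range(1000):  # contraction: ||Tu - Tw||_inf <= gamma * ||u - w||_inf
    u = [rng.uniform(-10, 10) for _ in range(2)]
    w = [rng.uniform(-10, 10) for _ in range(2)]
    assert sup_dist(bellman(u), bellman(w)) <= gamma * sup_dist(u, w) + 1e-12

v = [0.0, 0.0]  # value iteration: repeated application of T from any start
for _ in range(500):
    v = bellman(v)
print(v, sup_dist(bellman(v), v))
```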

Exam/Interview Questions

Q1: State and explain the Cauchy-Schwarz inequality. Why is it important?

Answer: $|\langle x, y \rangle| \leq \|x\| \cdot \|y\|$ , with equality iff $x$ and $y$ are linearly dependent. It provides an upper bound on the inner product in terms of norms, guarantees that the angle between vectors is well-defined ( $\cos\theta = \langle x,y\rangle/(\|x\|\|y\|)$ ), and is used to prove the triangle inequality for norms induced by inner products.

Q2: What is the difference between a Banach space and a Hilbert space?

Answer: A Banach space has only a norm (distance structure), while a Hilbert space has an inner product (angle structure). Every Hilbert space is a Banach space, but not conversely. The key difference: Hilbert spaces satisfy the parallelogram law $\|x+y\|^2 + \|x-y\|^2 = 2(\|x\|^2+\|y\|^2)$ , which fails for general Banach spaces (e.g., $\ell^1$ , $\ell^\infty$ ). The inner product enables orthogonal projections, which are fundamental to many ML algorithms.

Q3: Explain the representer theorem and its significance for kernel methods.

Answer: Any minimizer of $\sum L(y_i, f(x_i)) + \lambda\|f\|_{\mathcal{H}}^2$ in an RKHS has the form $f^* = \sum \alpha_i K(x_i, \cdot)$ . This reduces infinite-dimensional optimization to finding $n$ coefficients $\alpha_i$ , where $n$ is the number of training points. It explains why kernel methods work: despite operating in potentially infinite-dimensional spaces, the solution is always a finite combination of kernel evaluations at training points.

Q4: Give an example of an unbounded operator and explain why boundedness matters.

Answer: The differentiation operator $D: C^1[0,1] \to C[0,1]$ defined by $Df = f'$, with the supremum norm on both spaces, is unbounded: $\|D(\sin(2\pi nx))\|_\infty = 2\pi n$ while $\|\sin(2\pi nx)\|_\infty = 1$. Boundedness matters because a linear operator is bounded exactly when it is continuous (it preserves limits), and the most basic forms of the spectral theorem are stated for bounded self-adjoint operators. Unbounded operators, such as the momentum operator in quantum mechanics or differential operators in PDEs, are typically defined only on dense subspaces and require careful treatment of their domains.

Q5: Why is $L^2$ a Hilbert space but $L^1$ is not?

Answer: $L^2$ has an inner product $\langle f, g \rangle = \int f\bar{g}$ that induces the $L^2$ norm, and $L^2$ is complete. $L^1$ has a norm $\|f\|_1 = \int |f|$ but no inner product that induces this norm (it fails the parallelogram law). While $L^1$ is a Banach space (complete), it is not a Hilbert space. This is one reason Fourier series behave best in $L^2$: the trigonometric system is an orthonormal basis, partial sums are orthogonal projections, and Parseval's identity holds. In $L^1$ there are no orthogonal projections, and Fourier series of $L^1$ functions need not converge in norm.

Q6: How does the spectral theorem relate to PCA?

Answer: PCA finds the eigenvectors of the covariance matrix $\Sigma = \frac{1}{n}X^TX$ of centered data $X$. For random functions, the analogue is the covariance operator $T: L^2 \to L^2$, $(Tf)(s) = \int K(s,t)f(t)dt$, where $K(s,t) = \text{Cov}(Z(s), Z(t))$ is the covariance function of a random function $Z$; for continuous $K$ on a compact domain, $T$ is a compact self-adjoint (and positive) operator. The spectral theorem guarantees an orthonormal eigenbasis $\{e_n\}$ with eigenvalues $\lambda_n \to 0$. The top eigenfunctions capture the directions of maximum variance, generalizing PCA to function spaces (functional PCA).

Quick Reference

Concept	Definition/Formula	Key Property
Norm	$\|x\| \geq 0$, $\|\alpha x\| = |\alpha|\|x\|$, $\|x+y\| \leq \|x\|+\|y\|$	Distance from origin
Inner Product	$\langle x, y \rangle$ with conjugate symmetry, linearity	Enables geometry (angles)
Cauchy-Schwarz	$|\langle x, y \rangle| \leq \|x\|\|y\|$	Bound on inner product
Hilbert Space	Complete inner product space	Orthonormal bases exist
Banach Space	Complete normed space	Convergence guaranteed
Bounded Operator	$\|T\| = \sup_{\|x\|=1}\|Tx\| < \infty$	Continuous
Adjoint	$\langle Tx, y \rangle = \langle x, T^*y \rangle$	Generalizes transpose
RKHS	Hilbert space of functions with bounded point evaluation	$f(x) = \langle f, K(x,\cdot) \rangle$
Kernel Trick	$K(x,y) = \langle \phi(x), \phi(y) \rangle$	Implicit high-dim computation
Representer Theorem	$f^* = \sum \alpha_i K(x_i, \cdot)$	Infinite-dim → finite-dim
Compact Operator	Maps bounded sets to precompact sets	"Almost finite-rank"
Spectral Theorem	$T = \sum \lambda_n \langle \cdot, e_n \rangle e_n$	Diagonalizes compact self-adjoint operators

Cross-References

Linear Algebra - Linear Algebra Basics: Finite-dimensional vector spaces
Measure Theory - Advanced Measure Theory: Integration theory on function spaces
Differential Geometry - Advanced Differential Geometry: Manifolds and differential forms
Topology - Advanced Topological: Topological vector spaces
Tensor Calculus - Advanced Tensor Calculus: Tensor products of vector spaces