Learning Theory
learning theory, ERM, PAC learning, VC dimension, Rademacher complexity
1 Why This Module Matters
Learning theory is where machine learning stops being only a modeling toolkit and becomes a theorem-level subject.
It asks questions like:
- why should empirical success predict future success?
- how many samples are enough?
- what makes one hypothesis class more dangerous than another?
- why do stability, regularization, or margins change generalization behavior?
This module sits directly on top of the site’s existing backbone:
- probability provides concentration and randomness
- statistics provides risk, estimation, and validation language
- optimization explains empirical risk minimization and regularization
- real analysis sharpens convergence, limits, and theorem reading
Without learning theory, many ML papers still look like claims about error curves. With it, they become claims about function classes, probability, and guarantees.
2 First Pass Through This Module
The intended first-pass spine for this module is:
- ERM, Population Risk, and Hypothesis Classes
- PAC Learning, Sample Complexity, and the Learning Setup
- VC Dimension and Shattering
- Uniform Convergence and Generalization Bounds
- Rademacher Complexity and Data-Dependent Capacity
- Algorithmic Stability and Regularization
- Generalization in Modern Regimes
This seven-page first-pass spine is now complete, covering the path from the learning setup through classical guarantees, data-dependent capacity, stability, and finally the modern interpolation-era picture.
3 How To Use This Module
Read the module in spine order.
The default reading path is:
- start with ERM, Population Risk, and Hypothesis Classes
- continue to PAC Learning, Sample Complexity, and the Learning Setup
- continue to VC Dimension and Shattering
- continue to Uniform Convergence and Generalization Bounds
- continue to Rademacher Complexity and Data-Dependent Capacity
- continue to Algorithmic Stability and Regularization
- continue to Generalization in Modern Regimes
- use nearby live pages in probability, statistics, optimization, and the ML applications section when a guarantee refers back to risk, regularization, or validation
The module should stay short and proof-driven rather than turning into a survey of every ML theorem family.
4 Core Concepts
- ERM, Population Risk, and Hypothesis Classes: the foundational page that turns supervised learning into a theorem-ready problem statement (a toy numerical sketch follows this list).
- PAC Learning, Sample Complexity, and the Learning Setup: the page that formalizes what it means to learn reliably from finite data.
- VC Dimension and Shattering: the classic capacity language for binary classes and sample complexity.
- Uniform Convergence and Generalization Bounds: the first major bridge from concentration and capacity control to ERM generalization guarantees.
- Rademacher Complexity and Data-Dependent Capacity: the page where capacity becomes more adaptive and data-aware.
- Algorithmic Stability and Regularization: the alternative route to generalization through sensitivity of the learning procedure.
- Generalization in Modern Regimes: the bridge from classical theory to interpolation, implicit bias, and current research language.
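To make the first of these concrete, here is a minimal numerical sketch of ERM over a toy finite class, assuming a synthetic one-dimensional threshold setup. The distribution, noise level, and class are illustrative choices, not something defined on the module's pages:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data distribution: x ~ Uniform[0, 1], label = 1{x > 0.3} with 10% label noise.
TRUE_THRESHOLD, NOISE = 0.3, 0.1

def sample(n):
    x = rng.uniform(0, 1, n)
    y = (x > TRUE_THRESHOLD).astype(int)
    flip = rng.uniform(0, 1, n) < NOISE
    return x, np.where(flip, 1 - y, y)

# Finite hypothesis class: threshold classifiers h_t(x) = 1{x > t}.
thresholds = np.linspace(0, 1, 101)

def risk(t, x, y):
    """0-1 loss of h_t on the sample (x, y)."""
    return np.mean((x > t).astype(int) != y)

# ERM: minimize empirical risk over the class on a small training sample.
x_train, y_train = sample(50)
train_risks = np.array([risk(t, x_train, y_train) for t in thresholds])
t_hat = thresholds[np.argmin(train_risks)]

# Population risk of the ERM output, approximated on a large fresh sample.
x_test, y_test = sample(100_000)
print(f"ERM threshold  : {t_hat:.2f}")
print(f"empirical risk : {train_risks.min():.3f}")
print(f"population risk: {risk(t_hat, x_test, y_test):.3f}")
```

The gap between the last two numbers is exactly the quantity the rest of the module learns to bound.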
5 Proof Patterns In This Module
- Empirical-to-population comparison: control the gap between what was observed on the sample and what happens under the data distribution.
- Capacity controls uniformity: the richer the class, the harder it is to guarantee that one sample represents all hypotheses well.
- Stability as robustness: if small data changes do not move the output much, generalization can follow without the same capacity route.
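For a finite class and a loss bounded in [0, 1], the first two patterns combine into the standard Hoeffding-plus-union-bound statement, quoted here as background; the module's own pages may state it in a different form:

```latex
% Finite-class uniform convergence, i.i.d. sample of size n, loss in [0, 1]:
\Pr\!\left[\, \sup_{h \in \mathcal{H}} \bigl| \hat{R}_n(h) - R(h) \bigr| > \varepsilon \,\right]
\;\le\; 2\,\lvert \mathcal{H} \rvert\, e^{-2 n \varepsilon^{2}},
\qquad\text{so } n \;\ge\; \frac{\log\!\bigl( 2 \lvert \mathcal{H} \rvert / \delta \bigr)}{2 \varepsilon^{2}}
\text{ samples suffice for confidence } 1 - \delta .
```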
6 Applications
6.1 Why Models Generalize
This is the main public-facing question the module answers. It explains why training error alone is never the full story, and why guarantees usually mention class size, complexity, margins, stability, or data assumptions.
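A hedged illustration of that point, assuming nothing from the module's pages: two polynomial classes of different capacity fit to the same small noisy sample, with train and test error reported. The degrees, noise level, and sample sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples from a smooth target: y = sin(2*pi*x) + Gaussian noise.
def sample(n):
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

x_train, y_train = sample(15)
x_test, y_test = sample(10_000)

for degree in (3, 12):  # low-capacity vs high-capacity polynomial class
    coeffs = np.polyfit(x_train, y_train, degree)
    mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree:2d}: train MSE {mse(x_train, y_train):.3f}, "
          f"test MSE {mse(x_test, y_test):.3f}")
```

The higher-capacity fit wins on training error and loses on test error, which is precisely the pattern the guarantees in this module are built to explain.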
6.2 Modern ML Theory
The module is also the cleanest bridge into papers about implicit bias, overparameterization, margin-based analysis, data-dependent complexity, and interpolation-era generalization.
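Of the quantities named above, data-dependent complexity is the easiest to touch directly: below is a Monte Carlo estimate of the empirical Rademacher complexity of a finite threshold class on a fixed sample. The class, sample, and draw count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Fixed sample and a finite class of threshold functions h_t(x) = 1{x > t}.
x = rng.uniform(0, 1, 30)
thresholds = np.linspace(0, 1, 101)
H = (x[None, :] > thresholds[:, None]).astype(float)  # |H| x n matrix of outputs

def empirical_rademacher(H, n_draws=2000):
    """Monte Carlo estimate of E_sigma[ sup_h (1/n) sum_i sigma_i h(x_i) ]."""
    n = H.shape[1]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))  # Rademacher signs
    correlations = sigma @ H.T / n                      # n_draws x |H|
    return correlations.max(axis=1).mean()              # sup over class, mean over sigma

print(f"empirical Rademacher complexity ~ {empirical_rademacher(H):.3f}")
```

Smaller estimates correspond to tighter data-dependent bounds on this particular sample, which is what "data-aware capacity" means in practice.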
7 Go Deeper By Topic
The main starting path is:
- ERM, Population Risk, and Hypothesis Classes
- PAC Learning, Sample Complexity, and the Learning Setup
- VC Dimension and Shattering
- Uniform Convergence and Generalization Bounds
- Rademacher Complexity and Data-Dependent Capacity
- Algorithmic Stability and Regularization
- Generalization in Modern Regimes
The strongest adjacent live pages right now are the probability, statistics, optimization, and ML-applications pages that this module's guarantees refer back to for risk, regularization, and validation.
8 Optional Deeper Reading After First Pass
The strongest current references connected to this module are:
- Stanford STATS214 / CS229M: Machine Learning Theory - official current course page describing the modern learning-theory arc, including uniform convergence and deep-learning theory. Checked 2026-04-25.
- Stanford CS229T / STATS231: Statistical Learning Theory - official archived theory course page that still gives a clean classical spine. Checked 2026-04-25.
- Stanford CS229T Notes - detailed official notes with ERM, generalization, VC, and complexity tools. Checked 2026-04-25.
- Berkeley CS281B / STAT241B: Statistical Learning Theory - strong official second-pass course reference from Peter Bartlett. Checked 2026-04-25.
9 Study Order
For the current module state, read:
- ERM, Population Risk, and Hypothesis Classes
- PAC Learning, Sample Complexity, and the Learning Setup
- VC Dimension and Shattering
- Uniform Convergence and Generalization Bounds
- Rademacher Complexity and Data-Dependent Capacity
- Algorithmic Stability and Regularization
- Generalization in Modern Regimes
before trying to read capacity or generalization-bound papers cold.
You are ready to move deeper into the module when you can:
- distinguish empirical risk from population risk without hand-waving
- explain what a PAC statement is trying to guarantee
- explain what it means for a class to shatter a set (a brute-force check appears in the sketch after this list)
- explain why ERM needs simultaneous control over the whole class rather than one fixed hypothesis
- explain what a hypothesis class is and why its size or richness matters
- state why minimizing training error alone is not the same as learning
- describe the learning problem in terms of data distribution, loss, predictor class, and target risk
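The shattering item above can be checked by brute force for the one-dimensional threshold class, which has VC dimension 1: single points are shattered and pairs are not. The dense grid below stands in for the continuum of thresholds and is an illustrative shortcut:

```python
import numpy as np

def threshold_labels(points, t):
    """Labeling that h_t(x) = 1{x > t} induces on the given points."""
    return tuple(int(x > t) for x in points)

def shatters(points, thresholds):
    """True if the threshold class realizes all 2^k labelings of `points`."""
    realized = {threshold_labels(points, t) for t in thresholds}
    return len(realized) == 2 ** len(points)

grid = np.linspace(-1.0, 2.0, 2001)   # dense grid standing in for all t
print(shatters([0.5], grid))          # True: a single point is shattered
print(shatters([0.3, 0.7], grid))     # False: the labeling (1, 0) is unrealizable
```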
10 Sources and Further Reading
- Stanford STATS214 / CS229M: Machine Learning Theory - First pass - current official course page for a modern learning-theory path. Checked 2026-04-25.
- Stanford CS229T / STATS231: Statistical Learning Theory - First pass - official archived course page with a clean theoretical backbone. Checked 2026-04-25.
- Stanford CS229T Notes - First pass - detailed official notes that make the formal learning setup and generalization tools concrete. Checked 2026-04-25.
- Berkeley CS281B / STAT241B: Statistical Learning Theory - Second pass - strong official course page for a deeper statistical-learning-theory route. Checked 2026-04-25.