Learning Theory
learning theory, ERM, PAC learning, VC dimension, Rademacher complexity
1 Why This Module Matters
Learning theory is where machine learning stops being only a modeling toolkit and becomes a theorem-level subject.
It asks questions like:
- why should empirical success predict future success?
- how many samples are enough?
- what makes one hypothesis class more dangerous than another?
- why do stability, regularization, or margins change generalization behavior?
This module sits directly on top of the site’s existing backbone:
- probability provides concentration and randomness
- statistics provides risk, estimation, and validation language
- optimization explains empirical risk minimization and regularization
- real analysis sharpens convergence, limits, and theorem reading
Without learning theory, many ML papers still look like claims about error curves. With it, they become claims about function classes, probability, and guarantees.
2 First Pass Through This Module
The intended first-pass spine for this module is:
- ERM, Population Risk, and Hypothesis Classes
- PAC Learning, Sample Complexity, and the Learning Setup
- VC Dimension and Shattering
- Uniform Convergence and Generalization Bounds
- Rademacher Complexity and Data-Dependent Capacity
- Algorithmic Stability and Regularization
- Generalization in Modern Regimes
This seven-page first-pass spine is now complete, covering the path from the learning setup through classical guarantees, data-dependent capacity, stability, and finally the modern interpolation-era picture.
3 How To Use This Module
Read the module in spine order.
The default reading path is:
- start with ERM, Population Risk, and Hypothesis Classes
- continue to PAC Learning, Sample Complexity, and the Learning Setup
- continue to VC Dimension and Shattering
- continue to Uniform Convergence and Generalization Bounds
- continue to Rademacher Complexity and Data-Dependent Capacity
- continue to Algorithmic Stability and Regularization
- continue to Generalization in Modern Regimes
- use nearby live pages in probability, statistics, optimization, and the ML applications section when a guarantee refers back to risk, regularization, or validation
The module should stay short and proof-driven rather than turning into a survey of every ML theorem family.
4 Core Concepts
- ERM, Population Risk, and Hypothesis Classes: the foundational page that turns supervised learning into a theorem-ready problem statement (a toy numerical sketch follows this list).
- PAC Learning, Sample Complexity, and the Learning Setup: the page that formalizes what it means to learn reliably from finite data.
- VC Dimension and Shattering: the classic capacity language for binary classes and sample complexity.
- Uniform Convergence and Generalization Bounds: the first major bridge from concentration and capacity control to ERM generalization guarantees.
- Rademacher Complexity and Data-Dependent Capacity: the page where capacity becomes more adaptive and data-aware.
- Algorithmic Stability and Regularization: the alternative route to generalization through sensitivity of the learning procedure.
- Generalization in Modern Regimes: the bridge from classical theory to interpolation, implicit bias, and current research language.
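To make the first of these concrete, here is a minimal numerical sketch of ERM over a toy finite class, assuming a synthetic one-dimensional threshold setup. The distribution, noise level, and class are illustrative choices, not something defined on the module's pages:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data distribution: x ~ Uniform[0, 1], label = 1{x > 0.3} with 10% label noise.
TRUE_THRESHOLD, NOISE = 0.3, 0.1

def sample(n):
    x = rng.uniform(0, 1, n)
    y = (x > TRUE_THRESHOLD).astype(int)
    flip = rng.uniform(0, 1, n) < NOISE
    return x, np.where(flip, 1 - y, y)

# Finite hypothesis class: threshold classifiers h_t(x) = 1{x > t}.
thresholds = np.linspace(0, 1, 101)

def risk(t, x, y):
    """0-1 loss of h_t on the sample (x, y)."""
    return np.mean((x > t).astype(int) != y)

# ERM: minimize empirical risk over the class on a small training sample.
x_train, y_train = sample(50)
train_risks = np.array([risk(t, x_train, y_train) for t in thresholds])
t_hat = thresholds[np.argmin(train_risks)]

# Population risk of the ERM output, approximated on a large fresh sample.
x_test, y_test = sample(100_000)
print(f"ERM threshold  : {t_hat:.2f}")
print(f"empirical risk : {train_risks.min():.3f}")
print(f"population risk: {risk(t_hat, x_test, y_test):.3f}")
```

The gap between the last two numbers is exactly the quantity the rest of the module learns to bound.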
5 Proof Patterns In This Module
- Empirical-to-population comparison: control the gap between what was observed on the sample and what happens under the data distribution.
- Capacity controls uniformity: the richer the class, the harder it is to guarantee that one sample represents all hypotheses well.
- Stability as robustness: if small data changes do not move the output much, generalization can follow without the same capacity route.
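For a finite class and a loss bounded in [0, 1], the first two patterns combine into the standard Hoeffding-plus-union-bound statement, quoted here as background; the module's own pages may state it in a different form:

```latex
% Finite-class uniform convergence, i.i.d. sample of size n, loss in [0, 1]:
\Pr\!\left[\, \sup_{h \in \mathcal{H}} \bigl| \hat{R}_n(h) - R(h) \bigr| > \varepsilon \,\right]
\;\le\; 2\,\lvert \mathcal{H} \rvert\, e^{-2 n \varepsilon^{2}},
\qquad\text{so } n \;\ge\; \frac{\log\!\bigl( 2 \lvert \mathcal{H} \rvert / \delta \bigr)}{2 \varepsilon^{2}}
\text{ samples suffice for confidence } 1 - \delta .
```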
6 Applications
6.1 Why Models Generalize
This is the main public-facing question the module answers. It explains why training error alone is never the full story, and why guarantees usually mention class size, complexity, margins, stability, or data assumptions.
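A hedged illustration of that point, assuming nothing from the module's pages: two polynomial classes of different capacity fit to the same small noisy sample, with train and test error reported. The degrees, noise level, and sample sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples from a smooth target: y = sin(2*pi*x) + Gaussian noise.
def sample(n):
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

x_train, y_train = sample(15)
x_test, y_test = sample(10_000)

for degree in (3, 12):  # low-capacity vs high-capacity polynomial class
    coeffs = np.polyfit(x_train, y_train, degree)
    mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree:2d}: train MSE {mse(x_train, y_train):.3f}, "
          f"test MSE {mse(x_test, y_test):.3f}")
```

The higher-capacity fit wins on training error and loses on test error, which is precisely the pattern the guarantees in this module are built to explain.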
6.2 Modern ML Theory
The module is also the cleanest bridge into papers about implicit bias, overparameterization, margin-based analysis, data-dependent complexity, and interpolation-era generalization.
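Of the quantities named above, data-dependent complexity is the easiest to touch directly: below is a Monte Carlo estimate of the empirical Rademacher complexity of a finite threshold class on a fixed sample. The class, sample, and draw count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Fixed sample and a finite class of threshold functions h_t(x) = 1{x > t}.
x = rng.uniform(0, 1, 30)
thresholds = np.linspace(0, 1, 101)
H = (x[None, :] > thresholds[:, None]).astype(float)  # |H| x n matrix of outputs

def empirical_rademacher(H, n_draws=2000):
    """Monte Carlo estimate of E_sigma[ sup_h (1/n) sum_i sigma_i h(x_i) ]."""
    n = H.shape[1]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))  # Rademacher signs
    correlations = sigma @ H.T / n                      # n_draws x |H|
    return correlations.max(axis=1).mean()              # sup over class, mean over sigma

print(f"empirical Rademacher complexity ~ {empirical_rademacher(H):.3f}")
```

Smaller estimates correspond to tighter data-dependent bounds on this particular sample, which is what "data-aware capacity" means in practice.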
7 Go Deeper By Topic
The main starting path is:
- ERM, Population Risk, and Hypothesis Classes
- PAC Learning, Sample Complexity, and the Learning Setup
- VC Dimension and Shattering
- Uniform Convergence and Generalization Bounds
- Rademacher Complexity and Data-Dependent Capacity
- Algorithmic Stability and Regularization
- Generalization in Modern Regimes
The strongest adjacent live pages right now are the probability, statistics, optimization, and ML-applications pages that this module's guarantees refer back to for risk, regularization, and validation.
8 Optional Deeper Reading After First Pass
The strongest current references connected to this module are:
- Stanford STATS214 / CS229M: Machine Learning Theory - official current course page describing the modern learning-theory arc, including uniform convergence and deep-learning theory. Checked 2026-04-25.
- Stanford CS229T / STATS231: Statistical Learning Theory - official archived theory course page that still gives a clean classical spine. Checked 2026-04-25.
- Stanford CS229T Notes - detailed official notes with ERM, generalization, VC, and complexity tools. Checked 2026-04-25.
- Berkeley CS281B / STAT241B: Statistical Learning Theory - strong official second-pass course reference from Peter Bartlett. Checked 2026-04-25.
9 Study Order
For the current module state, read:
- ERM, Population Risk, and Hypothesis Classes
- PAC Learning, Sample Complexity, and the Learning Setup
- VC Dimension and Shattering
- Uniform Convergence and Generalization Bounds
- Rademacher Complexity and Data-Dependent Capacity
- Algorithmic Stability and Regularization
- Generalization in Modern Regimes
before trying to read capacity or generalization-bound papers cold.
You are ready to move deeper into the module when you can:
- distinguish empirical risk from population risk without hand-waving
- explain what a PAC statement is trying to guarantee
- explain what it means for a class to shatter a set (a brute-force check appears in the sketch after this list)
- explain why ERM needs simultaneous control over the whole class rather than one fixed hypothesis
- explain what a hypothesis class is and why its size or richness matters
- state why minimizing training error alone is not the same as learning
- describe the learning problem in terms of data distribution, loss, predictor class, and target risk
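The shattering item above can be checked by brute force for the one-dimensional threshold class, which has VC dimension 1: single points are shattered and pairs are not. The dense grid below stands in for the continuum of thresholds and is an illustrative shortcut:

```python
import numpy as np

def threshold_labels(points, t):
    """Labeling that h_t(x) = 1{x > t} induces on the given points."""
    return tuple(int(x > t) for x in points)

def shatters(points, thresholds):
    """True if the threshold class realizes all 2^k labelings of `points`."""
    realized = {threshold_labels(points, t) for t in thresholds}
    return len(realized) == 2 ** len(points)

grid = np.linspace(-1.0, 2.0, 2001)   # dense grid standing in for all t
print(shatters([0.5], grid))          # True: a single point is shattered
print(shatters([0.3, 0.7], grid))     # False: the labeling (1, 0) is unrealizable
```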
10 Sources and Further Reading
- Stanford STATS214 / CS229M: Machine Learning Theory - First pass - current official course page for a modern learning-theory path. Checked 2026-04-25.
- Stanford CS229T / STATS231: Statistical Learning Theory - First pass - official archived course page with a clean theoretical backbone. Checked 2026-04-25.
- Stanford CS229T Notes - First pass - detailed official notes that make the formal learning setup and generalization tools concrete. Checked 2026-04-25.
- Berkeley CS281B / STAT241B: Statistical Learning Theory - Second pass - strong official course page for a deeper statistical-learning-theory route. Checked 2026-04-25.