Learning Theory

How learning problems become theorem-level objects through risk, hypothesis classes, sample complexity, capacity, and stability.
Modified: April 26, 2026

Keywords

learning theory, ERM, PAC learning, VC dimension, Rademacher complexity

1 Why This Module Matters

Learning theory is where machine learning stops being only a modeling toolkit and becomes a theorem-level subject.

It asks questions like:

  • why should empirical success predict future success?
  • how many samples are enough?
  • what makes one hypothesis class more dangerous than another?
  • why do stability, regularization, or margins change generalization behavior?

This module sits directly on top of the site’s existing backbone:

  • probability provides concentration and randomness
  • statistics provides risk, estimation, and validation language
  • optimization explains empirical risk minimization and regularization
  • real analysis sharpens convergence, limits, and theorem reading

Without learning theory, many ML papers still look like claims about error curves. With it, they become claims about function classes, probability, and guarantees.

Prerequisites: Probability, Statistics, and Optimization should come first. Proofs and Logic matter because nearly every guarantee is quantifier-heavy. Real Analysis becomes more important once the module reaches stronger convergence and complexity arguments.

Unlocks: Generalization bounds, sample complexity, stability arguments, modern capacity language, theory-facing ML papers

Research Use: Reading papers that talk about hypothesis classes, complexity measures, excess risk, bounds, or modern interpolation-era generalization

2 First Pass Through This Module

The intended first-pass spine for this module is:

  1. ERM, Population Risk, and Hypothesis Classes
  2. PAC Learning, Sample Complexity, and the Learning Setup
  3. VC Dimension and Shattering
  4. Uniform Convergence and Generalization Bounds
  5. Rademacher Complexity and Data-Dependent Capacity
  6. Algorithmic Stability and Regularization
  7. Generalization in Modern Regimes

This seven-page first-pass spine is now complete, covering the path from the learning setup through classical guarantees, data-dependent capacity, stability, and finally the modern interpolation-era picture.

3 How To Use This Module

Read the module in spine order.

The default reading path is:

  1. start with ERM, Population Risk, and Hypothesis Classes
  2. continue to PAC Learning, Sample Complexity, and the Learning Setup
  3. continue to VC Dimension and Shattering
  4. continue to Uniform Convergence and Generalization Bounds
  5. continue to Rademacher Complexity and Data-Dependent Capacity
  6. continue to Algorithmic Stability and Regularization
  7. continue to Generalization in Modern Regimes
  8. use nearby live pages in probability, statistics, optimization, and the ML applications section when a guarantee refers back to risk, regularization, or validation

The module should stay short and proof-driven rather than turning into a survey of every ML theorem family.

4 Core Concepts

5 Proof Patterns In This Module

  • Empirical-to-population comparison: control the gap between what was observed on the sample and what happens under the data distribution (a compact finite-class version of this pattern is sketched after this list).
  • Capacity controls uniformity: the richer the class, the harder it is to guarantee that one sample represents all hypotheses well.
  • Stability as robustness: if small data changes do not move the output much, generalization can follow without the same capacity route.
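
The first two patterns have a standard finite-class prototype. The sketch below states it in the usual form for a loss bounded in [0, 1] and an i.i.d. sample of size n; it is the textbook Hoeffding-plus-union-bound argument, not the sharpest bound the later pages develop.

```latex
% Finite class H, loss in [0,1], i.i.d. sample of size n.
% Step 1 (one fixed hypothesis): Hoeffding's inequality controls the gap for a fixed h:
%   P( |R(h) - \hat{R}_n(h)| > eps ) <= 2 exp(-2 n eps^2).
% Step 2 (uniformity): a union bound over H pays a log|H| "capacity" price:
\[
  \Pr\Big( \sup_{h \in \mathcal{H}} \big| R(h) - \hat{R}_n(h) \big| > \epsilon \Big)
  \;\le\; 2\,\lvert \mathcal{H} \rvert\, e^{-2 n \epsilon^{2}},
\]
% so with probability at least 1 - delta, simultaneously for every h in H,
\[
  \big| R(h) - \hat{R}_n(h) \big|
  \;\le\; \sqrt{\frac{\log \lvert \mathcal{H} \rvert + \log(2/\delta)}{2n}} .
\]
```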

6 Applications

6.1 Why Models Generalize

This is the main public-facing question the module answers. It explains why training error alone is never the full story, and why guarantees usually mention class size, complexity, margins, stability, or data assumptions.
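
As a tiny illustration of why training error alone is never the full story, the sketch below (a hedged example assuming numpy and scikit-learn are available; the data and model choices are arbitrary, not anything the module prescribes) fits a 1-nearest-neighbor classifier to pure-noise labels: empirical risk is driven to zero while held-out error stays at chance.

```python
# Hedged illustration: a memorizing classifier on labels that are independent of the inputs.
# Empirical risk is 0, but the population risk stays near 1/2, so the training fit alone
# cannot certify generalization; capacity, stability, or margin considerations must enter.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_train, n_test, d = 200, 2000, 20

X_train = rng.normal(size=(n_train, d))
y_train = rng.integers(0, 2, size=n_train)   # labels carry no signal about X
X_test = rng.normal(size=(n_test, d))
y_test = rng.integers(0, 2, size=n_test)

model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

train_error = 1.0 - model.score(X_train, y_train)   # each point is its own neighbor: 0.000
test_error = 1.0 - model.score(X_test, y_test)      # roughly 0.5, i.e. chance level
print(f"training error {train_error:.3f}, held-out error {test_error:.3f}")
```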

6.2 Modern ML Theory

The module is also the cleanest bridge into papers about implicit bias, overparameterization, margin-based analysis, data-dependent complexity, and interpolation-era generalization.

7 Go Deeper By Topic

The main starting path is:

  1. ERM, Population Risk, and Hypothesis Classes
  2. PAC Learning, Sample Complexity, and the Learning Setup
  3. VC Dimension and Shattering
  4. Uniform Convergence and Generalization Bounds
  5. Rademacher Complexity and Data-Dependent Capacity
  6. Algorithmic Stability and Regularization
  7. Generalization in Modern Regimes

The strongest adjacent live pages right now are:

8 Optional Deeper Reading After First Pass

The strongest current references connected to this module are:

9 Study Order

For the current module state, read:

  1. ERM, Population Risk, and Hypothesis Classes
  2. PAC Learning, Sample Complexity, and the Learning Setup
  3. VC Dimension and Shattering
  4. Uniform Convergence and Generalization Bounds
  5. Rademacher Complexity and Data-Dependent Capacity
  6. Algorithmic Stability and Regularization
  7. Generalization in Modern Regimes

before trying to read capacity or generalization-bound papers cold.

You are ready to move deeper into the module when you can:

  • distinguish empirical risk from population risk without hand-waving (a compact formal version of this and the next item appears after this list)
  • explain what a PAC statement is trying to guarantee
  • explain what it means for a class to shatter a set
  • explain why ERM needs simultaneous control over the whole class rather than one fixed hypothesis
  • explain what a hypothesis class is and why its size or richness matters
  • state why minimizing training error alone is not the same as learning
  • describe the learning problem in terms of data distribution, loss, predictor class, and target risk
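
For the first two checklist items, a compact formal version in standard notation (the module's own pages may use slightly different symbols) is:

```latex
% Population risk vs. empirical risk on an i.i.d. sample S = ((x_1,y_1),...,(x_n,y_n)) from D:
\[
  R(h) \;=\; \mathbb{E}_{(x,y)\sim \mathcal{D}}\big[\ell(h(x), y)\big],
  \qquad
  \hat{R}_S(h) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell(h(x_i), y_i).
\]
% An (agnostic) PAC guarantee for a learner A over class H: for every
% epsilon, delta in (0,1) there is a sample size m(epsilon, delta) such that
% for every distribution D,
\[
  \Pr_{S \sim \mathcal{D}^{\,m(\epsilon,\delta)}}
  \Big( R\big(A(S)\big) \;\le\; \inf_{h \in \mathcal{H}} R(h) + \epsilon \Big)
  \;\ge\; 1 - \delta .
\]
```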

10 Sources and Further Reading
