Theorem Families

A research-facing map of the recurring theorem families that organize modern work in learning theory, optimization, probability, graphs, and generative modeling.
Modified: April 26, 2026

Keywords: theorem families, concentration, spectral methods, convexity, generalization, transport

1 Why This Page

Most research papers do not introduce completely new mathematics from scratch.

Their main results usually belong to a recurring theorem family:

  • concentration and tail bounds
  • spectral or operator methods
  • convexity and duality
  • stability and generalization
  • transport and denoising

Once you can recognize the family, papers become easier to read.

You no longer ask only “what does this theorem say?” You also ask:

  • what family is this theorem from?
  • what assumptions usually come with this family?
  • what proof patterns should I expect?
  • which site pages should I revisit before going deeper?

That is the job of this page.

2 Research Snapshot

  • Type: top-level research map
  • Setting: math-heavy papers in CS, AI, engineering, optimization, and statistics
  • Main claim: theorem families are better reading units than isolated theorems
  • Why it matters: families tell you what toolkit a paper is using and what nearby papers are likely to look like

3 How To Use This Page

Use it in one of two ways.

3.1 Route 1: From a paper backward

If a theorem in a paper feels dense, ask which family it belongs to. Then come here and use the matching section to find:

  • stable prerequisites
  • common assumptions
  • proof habits
  • internal reading trails

3.2 Route 2: From a topic forward

If you already know a topic page and want to see where it leads in research, use this page to jump from:

  • stable math
  • to theorem family
  • to application pages
  • to paper labs
  • to current directions

4 Family 1: Concentration And Tail Bounds

4.1 What this family does

This family turns random fluctuation into quantitative control.

It tells you when averages, empirical quantities, matrices, or stochastic processes stay near their expected behavior.
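A minimal numerical sketch, assuming Python with NumPy and an invented Uniform(0, 1) example: it compares the observed tail probability of an empirical mean with the Hoeffding bound 2·exp(−2nt²) for variables bounded in [0, 1].

    import numpy as np

    rng = np.random.default_rng(0)

    n = 500          # sample size per trial
    trials = 20_000  # number of repeated experiments
    t = 0.05         # deviation threshold

    # Bounded variables in [0, 1]; the true mean is 0.5 for Uniform(0, 1).
    samples = rng.uniform(0.0, 1.0, size=(trials, n))
    deviations = np.abs(samples.mean(axis=1) - 0.5)

    empirical_tail = np.mean(deviations >= t)
    hoeffding_bound = 2.0 * np.exp(-2.0 * n * t**2)

    print(f"empirical P(|mean - 0.5| >= {t}): {empirical_tail:.4f}")
    print(f"Hoeffding upper bound:            {hoeffding_bound:.4f}")
    # The empirical tail should sit below the (typically loose) Hoeffding bound.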

4.2 Core objects

  • random variables and sums
  • martingales and processes
  • norms of random vectors and matrices
  • empirical averages and sample-dependent quantities

4.3 Typical assumptions

  • independence or weak dependence
  • boundedness, sub-Gaussian, or sub-exponential tails
  • Lipschitz structure
  • sample size large enough for the target regime

4.4 Common proof patterns

  • moment generating function (mgf) or exponential-moment argument
  • symmetrization
  • union bound and covering argument
  • matrix lifting of scalar tail inequalities

4.5 Where it appears

  • generalization bounds
  • high-dimensional statistics
  • randomized numerical linear algebra
  • sketching and streaming
  • diffusion and sampling error control

4.6 Start here in this site

  1. Probability
  2. Concentration and Common Inequalities
  3. Paper Lab: Randomized Sketching for Least Squares

5 Family 2: Spectral And Operator Methods

5.1 What this family does

This family explains structure through eigenvalues, singular values, invariant subspaces, and repeated operator action.

It is one of the cleanest bridges from linear algebra to modern ML, graph learning, and scientific computing.
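A small sketch, assuming Python with NumPy and a made-up noisy low-rank matrix: it truncates the SVD at rank k and checks the Eckart–Young identities, namely that the spectral-norm error of the truncation equals σ_{k+1} and the Frobenius-norm error equals the energy of the discarded singular values.

    import numpy as np

    rng = np.random.default_rng(0)

    # Noisy approximately low-rank matrix: rank-3 signal plus small noise.
    m, n, k = 60, 40, 3
    A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
    A += 0.01 * rng.standard_normal((m, n))

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Rank-k truncation: keep the top-k singular triplets.
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

    # Eckart-Young: the truncation error equals the discarded spectrum.
    spectral_err = np.linalg.norm(A - A_k, 2)
    frobenius_err = np.linalg.norm(A - A_k, "fro")
    print(np.isclose(spectral_err, s[k]))                        # sigma_{k+1}
    print(np.isclose(frobenius_err, np.sqrt(np.sum(s[k:]**2))))  # tail energy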

5.2 Core objects

  • matrices and linear maps
  • eigenvalues and eigenvectors
  • singular values and low-rank approximations
  • graph Laplacians and propagation operators

5.3 Typical assumptions

  • symmetry, positive semidefiniteness, or near-normal behavior
  • low-rank or approximately low-rank structure
  • graph or operator structure that makes the spectrum informative
  • perturbations small enough for spectral interpretation to survive

5.4 Common proof patterns

  • orthogonal decomposition
  • variational characterizations
  • perturbation bounds
  • repeated-operator or power-method intuition

5.5 Where it appears

  • PCA and dimensionality reduction
  • spectral clustering
  • graph neural networks
  • attention and linear-operator viewpoints
  • operator learning and scientific ML

5.6 Start here in this site

  1. Linear Algebra
  2. Eigenvalues and Diagonalization
  3. SVD and Low-Rank Approximation
  4. Paper Lab: Spectral Clustering and Graph Modes

6 Family 3: Convexity, Duality, And Certificates

6.1 What this family does

This family gives structure to optimization problems and makes correctness or impossibility statements auditable.

It is the theorem family behind many “why does this optimizer / relaxation / certificate work?” questions.
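A hedged illustration, assuming Python with NumPy and a tiny linear program invented for the example: it checks weak duality, i.e. that any dual-feasible point gives a lower bound, and hence a certificate, for every primal-feasible value.

    import numpy as np

    # Primal:  minimize c^T x   subject to  A x >= b,  x >= 0
    # Dual:    maximize b^T y   subject to  A^T y <= c,  y >= 0
    c = np.array([2.0, 3.0])
    A = np.array([[1.0, 1.0],
                  [2.0, 1.0]])
    b = np.array([4.0, 5.0])

    def primal_feasible(x):
        return np.all(A @ x >= b - 1e-9) and np.all(x >= -1e-9)

    def dual_feasible(y):
        return np.all(A.T @ y <= c + 1e-9) and np.all(y >= -1e-9)

    x = np.array([4.0, 0.0])   # a primal-feasible point
    y = np.array([2.0, 0.0])   # a dual-feasible point, acting as a certificate

    assert primal_feasible(x) and dual_feasible(y)

    # Weak duality: b^T y <= c^T x for every feasible pair.
    print(f"dual lower bound  b^T y = {b @ y:.2f}")
    print(f"primal objective  c^T x = {c @ x:.2f}")
    # Here the two values coincide, so y certifies that x is optimal.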

6.2 Core objects

  • convex sets and convex functions
  • subgradients and optimality conditions
  • feasible regions and constraints
  • primal and dual problems
  • certificates of optimality or infeasibility

6.3 Typical assumptions

  • convexity
  • regularity or constraint qualification
  • smoothness or subgradient access
  • duality gap small or zero under the right regime

6.4 Common proof patterns

  • separating hyperplane argument
  • first-order optimality inequalities
  • Lagrangian construction
  • weak and strong duality comparisons

6.5 Where it appears

  • optimization theory
  • inverse problems and regularization
  • structured estimation
  • differentiable optimization layers
  • robust and constrained ML pipelines

6.6 Start here in this site

  1. Optimization
  2. Convex Functions and Subgradients
  3. Duality and Certificates
  4. Optimization for Machine Learning

7 Family 4: Stability, Capacity, And Generalization

7.1 What this family does

This family asks why performance on observed data says anything about new data.

It is the core theorem family behind learning theory, model complexity, validation, and many modern questions around overparameterized models.
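A hedged simulation sketch, assuming Python with NumPy and an invented finite class of threshold classifiers on Uniform(0, 1) data: it compares the worst-case gap between empirical and population risk over the class with the Hoeffding-plus-union-bound rate sqrt(log(2|H|/δ) / (2n)).

    import numpy as np

    rng = np.random.default_rng(0)

    n = 2000                                  # training sample size
    thresholds = np.linspace(0.0, 1.0, 101)   # finite hypothesis class, |H| = 101
    delta = 0.05

    # Data: x ~ Uniform(0, 1), noiseless labels y = 1{x > 0.5}.
    x = rng.uniform(0.0, 1.0, size=n)
    y = (x > 0.5).astype(float)

    # Empirical and population risks for each threshold classifier h_t(x) = 1{x > t}.
    preds = (x[None, :] > thresholds[:, None]).astype(float)
    empirical_risk = np.mean(preds != y[None, :], axis=1)
    population_risk = np.abs(thresholds - 0.5)   # exact risk under Uniform(0, 1)

    worst_gap = np.max(np.abs(empirical_risk - population_risk))

    # Hoeffding + union bound over the finite class, valid w.p. >= 1 - delta.
    uniform_bound = np.sqrt(np.log(2 * len(thresholds) / delta) / (2 * n))

    print(f"worst empirical-vs-population gap: {worst_gap:.4f}")
    print(f"uniform convergence bound:         {uniform_bound:.4f}")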

7.2 Core objects

  • population risk and empirical risk
  • hypothesis classes and complexity measures
  • algorithmic stability
  • training, validation, and test distributions

7.3 Typical assumptions

  • i.i.d. or controlled sampling regime
  • bounded or well-behaved loss
  • complexity control through class restriction or algorithmic bias
  • distributional assumptions explicit enough to interpret the bound

7.4 Common proof patterns

  • symmetrization and ghost-sample tricks
  • uniform convergence
  • stability argument
  • capacity or complexity comparison

7.5 Where it appears

  • learning theory
  • regularization and implicit bias
  • validation and model selection
  • calibration and uncertainty
  • distribution shift analysis

7.6 Start here in this site

  1. Statistics
  2. Generalization, Overfitting, and Validation
  3. Uncertainty Calibration and Predictive Confidence
  4. AI / ML Theory roadmap

8 Family 5: Transport, Score, And Denoising

8.1 What this family does

This family studies how probability distributions evolve, transform, or can be recovered through local score or velocity information.

It has become one of the clearest modern bridges from probability and calculus to generative modeling.
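A hedged one-dimensional sketch, assuming Python with NumPy and a Gaussian data model chosen so the true score is available in closed form: it runs denoising score matching with a linear score model and checks that the least-squares fit approximately recovers the score of the noised marginal.

    import numpy as np

    rng = np.random.default_rng(0)

    mu, s, sigma = 1.0, 2.0, 0.5     # data mean/std and noise level
    n = 200_000

    # Forward noising: x_t = x_0 + sigma * eps.
    x0 = mu + s * rng.standard_normal(n)
    eps = rng.standard_normal(n)
    xt = x0 + sigma * eps

    # Denoising score matching target: grad_{x_t} log q(x_t | x_0) = -eps / sigma.
    target = -eps / sigma

    # Linear score model s(x) = a * x + b, fitted by least squares.
    X = np.column_stack([xt, np.ones(n)])
    (a_hat, b_hat), *_ = np.linalg.lstsq(X, target, rcond=None)

    # True score of the noised marginal N(mu, s^2 + sigma^2).
    var_t = s**2 + sigma**2
    a_true, b_true = -1.0 / var_t, mu / var_t

    print(f"fitted slope/intercept: {a_hat:.4f}, {b_hat:.4f}")
    print(f"true   slope/intercept: {a_true:.4f}, {b_true:.4f}")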

8.2 Core objects

  • densities and score functions
  • stochastic differential equations
  • velocity fields and flows
  • probability paths and denoising objectives

8.3 Typical assumptions

  • regularity of score or vector fields
  • well-defined interpolation or noising process
  • numerical solvability of forward or reverse dynamics
  • approximation quality of learned score or flow model

8.4 Common proof patterns

  • continuity-equation viewpoint
  • reverse-time stochastic process reasoning
  • denoising identity or score matching argument
  • transport-based comparison of distributions

8.5 Where it appears

  • diffusion models
  • flow matching
  • score-based generation
  • uncertainty-aware sampling
  • scientific generative modeling

8.6 Start here in this site

  1. Diffusion Models and Denoising
  2. Score Matching and the SDE View of Diffusion
  3. Flow Matching and Transport Views of Generation

9 Family 6: Geometry Of Representations And Message Passing

9.1 What this family does

This family studies how structure is shaped by propagation, neighborhood geometry, invariances, and embedding spaces.

It is especially important in graph learning, representation learning, and transformer-era ML.
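A hedged sketch, assuming Python with NumPy and a toy six-node graph invented for the example: it applies a mean-aggregation propagation operator repeatedly and tracks how node features collapse toward a common vector, a simple numerical view of oversmoothing.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy undirected graph: two triangles joined by a single bridge edge.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    n = 6
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0

    # Mean-aggregation propagation over neighbors plus self: D^{-1} (A + I).
    A_hat = A + np.eye(n)
    P = A_hat / A_hat.sum(axis=1, keepdims=True)

    # Random node features; propagate repeatedly without weights or nonlinearities.
    H = rng.standard_normal((n, 4))
    for step in range(1, 21):
        H = P @ H
        spread = np.linalg.norm(H - H.mean(axis=0))
        if step in (1, 5, 10, 20):
            print(f"step {step:2d}: node-feature spread {spread:.4f}")
    # The spread decays with depth: pure propagation oversmooths node features
    # toward a common vector, one way long-range information gets lost.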

9.2 Core objects

  • embeddings and similarity geometry
  • graph neighborhoods and propagation operators
  • homophily and heterophily structure
  • bottlenecks, oversquashing, and long-range dependence

9.3 Typical assumptions

  • graph or representation geometry captures useful structure
  • propagation operator matches the task geometry
  • feature similarity is informative enough for the learning signal
  • depth or receptive-field growth does not destroy the information you need

9.4 Common proof patterns

  • operator and diffusion viewpoint
  • neighborhood expansion or bottleneck analysis
  • invariance or symmetry argument
  • geometry-of-embeddings interpretation

9.5 Where it appears

  • graph neural networks
  • message passing systems
  • representation diagnostics
  • embedding geometry
  • attention and mixture views

9.6 Start here in this site

  1. Attention, Softmax, and Weighted Mixtures
  2. Representation Learning and Geometry of Embeddings
  3. Long-Range Dependence and Oversquashing in Graphs
  4. Graph Rewiring, Homophily, and Heterophily

10 How To Pick The Right Family

Use these quick cues.

  • if the theorem says “with probability at least …” or controls deviations, start with concentration
  • if the theorem talks about eigenvalues, singular values, Laplacians, or repeated linear action, start with spectral
  • if the theorem uses argmin, certificates, KKT, or dual variables, start with convexity and duality
  • if the theorem compares empirical and population behavior, start with generalization
  • if the theorem talks about scores, flows, denoising, or reverse-time dynamics, start with transport and denoising
  • if the theorem is about propagation, neighborhoods, or embedding structure, start with representation geometry

11 What Has Changed Recently

The stable backbone of these families has not changed much, but how prominent each family is at the research frontier has shifted.

Right now, the most visibly active families around modern AI and theory-facing CS are:

  • stability, capacity, and generalization
  • transport, score, and denoising
  • geometry of representations and message passing
  • high-dimensional concentration and random-matrix methods

That is one reason this page distinguishes stable theorem families from the moving frontier built on top of them.

12 What To Learn Next

13 Sources And Further Reading

  • MIT Mathematics for Computer Science - First pass - durable source for proof structure and theorem language before specializing into research families. Checked 2026-04-25.
  • EE364a: Convex Optimization I - Second pass - clean official anchor for convexity, duality, and certificate-style theorem families. Checked 2026-04-25.
  • CS224W 2024 - Second pass - current official course hub showing how spectral and message-passing families organize graph learning. Checked 2026-04-25.
  • CS229T Course Description - Paper bridge - useful current signal for generalization, overparameterization, and modern ML-theory families. Checked 2026-04-25.
  • High-Dimensional Probability - Paper bridge - strong current bridge into concentration, random matrices, and modern data-science probability. Checked 2026-04-25.