Theorem Families

A research-facing map of the recurring theorem families that organize modern work in learning theory, optimization, probability, graphs, and generative modeling.
Modified: April 26, 2026

Keywords: theorem families, concentration, spectral methods, convexity, generalization, transport

1 Why This Page

Most research papers do not introduce completely new mathematics from scratch.

Their main results usually belong to a recurring theorem family:

  • concentration and tail bounds
  • spectral or operator methods
  • convexity and duality
  • stability and generalization
  • transport and denoising

Once you can recognize the family, papers become easier to read.

You no longer ask only “what does this theorem say?” You also ask:

  • what family is this theorem from?
  • what assumptions usually come with this family?
  • what proof patterns should I expect?
  • which site pages should I revisit before going deeper?

That is the job of this page.

2 Research Snapshot

  • Type: top-level research map
  • Setting: math-heavy papers in CS, AI, engineering, optimization, and statistics
  • Main claim: theorem families are better reading units than isolated theorems
  • Why it matters: families tell you what toolkit a paper is using and what nearby papers are likely to look like

3 How To Use This Page

Use it in one of two ways.

3.1 Route 1: From a paper backward

If a theorem in a paper feels dense, ask which family it belongs to. Then come here and use the matching section to find:

  • stable prerequisites
  • common assumptions
  • proof habits
  • internal reading trails

3.2 Route 2: From a topic forward

If you already know a topic page and want to see where it leads in research, use this page to jump from:

  • stable math
  • to theorem family
  • to application pages
  • to paper labs
  • to current directions

4 Family 1: Concentration And Tail Bounds

4.1 What this family does

This family turns random fluctuation into quantitative control.

It tells you when averages, empirical quantities, matrices, or stochastic processes stay near their expected behavior.
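A minimal numerical sketch, assuming Python with NumPy and an invented Uniform(0, 1) example: it compares the observed tail probability of an empirical mean with the Hoeffding bound 2·exp(−2nt²) for variables bounded in [0, 1].

    import numpy as np

    rng = np.random.default_rng(0)

    n = 500          # sample size per trial
    trials = 20_000  # number of repeated experiments
    t = 0.05         # deviation threshold

    # Bounded variables in [0, 1]; the true mean is 0.5 for Uniform(0, 1).
    samples = rng.uniform(0.0, 1.0, size=(trials, n))
    deviations = np.abs(samples.mean(axis=1) - 0.5)

    empirical_tail = np.mean(deviations >= t)
    hoeffding_bound = 2.0 * np.exp(-2.0 * n * t**2)

    print(f"empirical P(|mean - 0.5| >= {t}): {empirical_tail:.4f}")
    print(f"Hoeffding upper bound:            {hoeffding_bound:.4f}")
    # The empirical tail should sit below the (typically loose) Hoeffding bound.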

4.2 Core objects

  • random variables and sums
  • martingales and processes
  • norms of random vectors and matrices
  • empirical averages and sample-dependent quantities

4.3 Typical assumptions

  • independence or weak dependence
  • boundedness, sub-Gaussian, or sub-exponential tails
  • Lipschitz structure
  • sample size large enough for the target regime

4.4 Common proof patterns

  • moment generating function (mgf) or exponential-moment argument
  • symmetrization
  • union bound and covering argument
  • matrix lifting of scalar tail inequalities

4.5 Where it appears

  • generalization bounds
  • high-dimensional statistics
  • randomized numerical linear algebra
  • sketching and streaming
  • diffusion and sampling error control

4.6 Start here in this site

  1. Probability
  2. Concentration and Common Inequalities
  3. Paper Lab: Randomized Sketching for Least Squares

5 Family 2: Spectral And Operator Methods

5.1 What this family does

This family explains structure through eigenvalues, singular values, invariant subspaces, and repeated operator action.

It is one of the cleanest bridges from linear algebra to modern ML, graph learning, and scientific computing.
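A small sketch, assuming Python with NumPy and a made-up noisy low-rank matrix: it truncates the SVD at rank k and checks the Eckart–Young identities, namely that the spectral-norm error of the truncation equals σ_{k+1} and the Frobenius-norm error equals the energy of the discarded singular values.

    import numpy as np

    rng = np.random.default_rng(0)

    # Noisy approximately low-rank matrix: rank-3 signal plus small noise.
    m, n, k = 60, 40, 3
    A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
    A += 0.01 * rng.standard_normal((m, n))

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Rank-k truncation: keep the top-k singular triplets.
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

    # Eckart-Young: the truncation error equals the discarded spectrum.
    spectral_err = np.linalg.norm(A - A_k, 2)
    frobenius_err = np.linalg.norm(A - A_k, "fro")
    print(np.isclose(spectral_err, s[k]))                        # sigma_{k+1}
    print(np.isclose(frobenius_err, np.sqrt(np.sum(s[k:]**2))))  # tail energy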

5.2 Core objects

  • matrices and linear maps
  • eigenvalues and eigenvectors
  • singular values and low-rank approximations
  • graph Laplacians and propagation operators

5.3 Typical assumptions

  • symmetry, positive semidefiniteness, or near-normal behavior
  • low-rank or approximately low-rank structure
  • graph or operator structure that makes the spectrum informative
  • perturbations small enough for spectral interpretation to survive

5.4 Common proof patterns

  • orthogonal decomposition
  • variational characterizations
  • perturbation bounds
  • repeated-operator or power-method intuition

5.5 Where it appears

  • PCA and dimensionality reduction
  • spectral clustering
  • graph neural networks
  • attention and linear-operator viewpoints
  • operator learning and scientific ML

5.6 Start here in this site

  1. Linear Algebra
  2. Eigenvalues and Diagonalization
  3. SVD and Low-Rank Approximation
  4. Paper Lab: Spectral Clustering and Graph Modes

6 Family 3: Convexity, Duality, And Certificates

6.1 What this family does

This family gives structure to optimization problems and makes correctness or impossibility statements auditable.

It is the theorem family behind many “why does this optimizer / relaxation / certificate work?” questions.
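A hedged illustration, assuming Python with NumPy and a tiny linear program invented for the example: it checks weak duality, i.e. that any dual-feasible point gives a lower bound, and hence a certificate, for every primal-feasible value.

    import numpy as np

    # Primal:  minimize c^T x   subject to  A x >= b,  x >= 0
    # Dual:    maximize b^T y   subject to  A^T y <= c,  y >= 0
    c = np.array([2.0, 3.0])
    A = np.array([[1.0, 1.0],
                  [2.0, 1.0]])
    b = np.array([4.0, 5.0])

    def primal_feasible(x):
        return np.all(A @ x >= b - 1e-9) and np.all(x >= -1e-9)

    def dual_feasible(y):
        return np.all(A.T @ y <= c + 1e-9) and np.all(y >= -1e-9)

    x = np.array([4.0, 0.0])   # a primal-feasible point
    y = np.array([2.0, 0.0])   # a dual-feasible point, acting as a certificate

    assert primal_feasible(x) and dual_feasible(y)

    # Weak duality: b^T y <= c^T x for every feasible pair.
    print(f"dual lower bound  b^T y = {b @ y:.2f}")
    print(f"primal objective  c^T x = {c @ x:.2f}")
    # Here the two values coincide, so y certifies that x is optimal.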

6.2 Core objects

  • convex sets and convex functions
  • subgradients and optimality conditions
  • feasible regions and constraints
  • primal and dual problems
  • certificates of optimality or infeasibility

6.3 Typical assumptions

  • convexity
  • regularity or constraint qualification
  • smoothness or subgradient access
  • duality gap small or zero under the right regime

6.4 Common proof patterns

  • separating hyperplane argument
  • first-order optimality inequalities
  • Lagrangian construction
  • weak and strong duality comparisons

6.5 Where it appears

  • optimization theory
  • inverse problems and regularization
  • structured estimation
  • differentiable optimization layers
  • robust and constrained ML pipelines

6.6 Start here in this site

  1. Optimization
  2. Convex Functions and Subgradients
  3. Duality and Certificates
  4. Optimization for Machine Learning

7 Family 4: Stability, Capacity, And Generalization

7.1 What this family does

This family asks why performance on observed data says anything about new data.

It is the core theorem family behind learning theory, model complexity, validation, and many modern questions around overparameterized models.
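A hedged simulation sketch, assuming Python with NumPy and an invented finite class of threshold classifiers on Uniform(0, 1) data: it compares the worst-case gap between empirical and population risk over the class with the Hoeffding-plus-union-bound rate sqrt(log(2|H|/δ) / (2n)).

    import numpy as np

    rng = np.random.default_rng(0)

    n = 2000                                  # training sample size
    thresholds = np.linspace(0.0, 1.0, 101)   # finite hypothesis class, |H| = 101
    delta = 0.05

    # Data: x ~ Uniform(0, 1), noiseless labels y = 1{x > 0.5}.
    x = rng.uniform(0.0, 1.0, size=n)
    y = (x > 0.5).astype(float)

    # Empirical and population risks for each threshold classifier h_t(x) = 1{x > t}.
    preds = (x[None, :] > thresholds[:, None]).astype(float)
    empirical_risk = np.mean(preds != y[None, :], axis=1)
    population_risk = np.abs(thresholds - 0.5)   # exact risk under Uniform(0, 1)

    worst_gap = np.max(np.abs(empirical_risk - population_risk))

    # Hoeffding + union bound over the finite class, valid w.p. >= 1 - delta.
    uniform_bound = np.sqrt(np.log(2 * len(thresholds) / delta) / (2 * n))

    print(f"worst empirical-vs-population gap: {worst_gap:.4f}")
    print(f"uniform convergence bound:         {uniform_bound:.4f}")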

7.2 Core objects

  • population risk and empirical risk
  • hypothesis classes and complexity measures
  • algorithmic stability
  • training, validation, and test distributions

7.3 Typical assumptions

  • i.i.d. or controlled sampling regime
  • bounded or well-behaved loss
  • complexity control through class restriction or algorithmic bias
  • distributional assumptions explicit enough to interpret the bound

7.4 Common proof patterns

  • symmetrization and ghost-sample tricks
  • uniform convergence
  • stability argument
  • capacity or complexity comparison

7.5 Where it appears

  • learning theory
  • regularization and implicit bias
  • validation and model selection
  • calibration and uncertainty
  • distribution shift analysis

7.6 Start here in this site

  1. Statistics
  2. Generalization, Overfitting, and Validation
  3. Uncertainty Calibration and Predictive Confidence
  4. AI / ML Theory roadmap

8 Family 5: Transport, Score, And Denoising

8.1 What this family does

This family studies how probability distributions evolve, transform, or can be recovered through local score or velocity information.

It has become one of the clearest modern bridges from probability and calculus to generative modeling.
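A hedged one-dimensional sketch, assuming Python with NumPy and a Gaussian data model chosen so the true score is available in closed form: it runs denoising score matching with a linear score model and checks that the least-squares fit approximately recovers the score of the noised marginal.

    import numpy as np

    rng = np.random.default_rng(0)

    mu, s, sigma = 1.0, 2.0, 0.5     # data mean/std and noise level
    n = 200_000

    # Forward noising: x_t = x_0 + sigma * eps.
    x0 = mu + s * rng.standard_normal(n)
    eps = rng.standard_normal(n)
    xt = x0 + sigma * eps

    # Denoising score matching target: grad_{x_t} log q(x_t | x_0) = -eps / sigma.
    target = -eps / sigma

    # Linear score model s(x) = a * x + b, fitted by least squares.
    X = np.column_stack([xt, np.ones(n)])
    (a_hat, b_hat), *_ = np.linalg.lstsq(X, target, rcond=None)

    # True score of the noised marginal N(mu, s^2 + sigma^2).
    var_t = s**2 + sigma**2
    a_true, b_true = -1.0 / var_t, mu / var_t

    print(f"fitted slope/intercept: {a_hat:.4f}, {b_hat:.4f}")
    print(f"true   slope/intercept: {a_true:.4f}, {b_true:.4f}")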

8.2 Core objects

  • densities and score functions
  • stochastic differential equations
  • velocity fields and flows
  • probability paths and denoising objectives

8.3 Typical assumptions

  • regularity of score or vector fields
  • well-defined interpolation or noising process
  • numerical solvability of forward or reverse dynamics
  • approximation quality of learned score or flow model

8.4 Common proof patterns

  • continuity-equation viewpoint
  • reverse-time stochastic process reasoning
  • denoising identity or score matching argument
  • transport-based comparison of distributions

8.5 Where it appears

  • diffusion models
  • flow matching
  • score-based generation
  • uncertainty-aware sampling
  • scientific generative modeling

8.6 Start here in this site

  1. Diffusion Models and Denoising
  2. Score Matching and the SDE View of Diffusion
  3. Flow Matching and Transport Views of Generation

9 Family 6: Geometry Of Representations And Message Passing

9.1 What this family does

This family studies how structure is shaped by propagation, neighborhood geometry, invariances, and embedding spaces.

It is especially important in graph learning, representation learning, and transformer-era ML.
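A hedged sketch, assuming Python with NumPy and a toy six-node graph invented for the example: it applies a mean-aggregation propagation operator repeatedly and tracks how node features collapse toward a common vector, a simple numerical view of oversmoothing.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy undirected graph: two triangles joined by a single bridge edge.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    n = 6
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0

    # Mean-aggregation propagation over neighbors plus self: D^{-1} (A + I).
    A_hat = A + np.eye(n)
    P = A_hat / A_hat.sum(axis=1, keepdims=True)

    # Random node features; propagate repeatedly without weights or nonlinearities.
    H = rng.standard_normal((n, 4))
    for step in range(1, 21):
        H = P @ H
        spread = np.linalg.norm(H - H.mean(axis=0))
        if step in (1, 5, 10, 20):
            print(f"step {step:2d}: node-feature spread {spread:.4f}")
    # The spread decays with depth: pure propagation oversmooths node features
    # toward a common vector, one way long-range information gets lost.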

9.2 Core objects

  • embeddings and similarity geometry
  • graph neighborhoods and propagation operators
  • homophily and heterophily structure
  • bottlenecks, oversquashing, and long-range dependence

9.3 Typical assumptions

  • graph or representation geometry captures useful structure
  • propagation operator matches the task geometry
  • feature similarity is informative enough for the learning signal
  • depth or receptive-field growth does not destroy the information you need

9.4 Common proof patterns

  • operator and diffusion viewpoint
  • neighborhood expansion or bottleneck analysis
  • invariance or symmetry argument
  • geometry-of-embeddings interpretation

9.5 Where it appears

  • graph neural networks
  • message passing systems
  • representation diagnostics
  • embedding geometry
  • attention and mixture views

9.6 Start here in this site

  1. Attention, Softmax, and Weighted Mixtures
  2. Representation Learning and Geometry of Embeddings
  3. Long-Range Dependence and Oversquashing in Graphs
  4. Graph Rewiring, Homophily, and Heterophily

10 How To Pick The Right Family

Use these quick cues.

  • if the theorem says “with probability at least …” or controls deviations, start with concentration
  • if the theorem talks about eigenvalues, singular values, Laplacians, or repeated linear action, start with spectral
  • if the theorem uses argmin, certificates, KKT, or dual variables, start with convexity and duality
  • if the theorem compares empirical and population behavior, start with generalization
  • if the theorem talks about scores, flows, denoising, or reverse-time dynamics, start with transport and denoising
  • if the theorem is about propagation, neighborhoods, or embedding structure, start with representation geometry

11 What Has Changed Recently

The stable backbone of these families has not changed much, but how prominent each family is at the research frontier has shifted.

Right now, the most visibly active families around modern AI and theory-facing CS are:

  • stability, capacity, and generalization
  • transport, score, and denoising
  • geometry of representations and message passing
  • high-dimensional concentration and random-matrix methods

That is one reason this page distinguishes stable theorem families from the moving frontier built on top of them.

12 What To Learn Next

13 Sources And Further Reading

  • MIT Mathematics for Computer Science - First pass - durable source for proof structure and theorem language before specializing into research families. Checked 2026-04-25.
  • EE364a: Convex Optimization I - Second pass - clean official anchor for convexity, duality, and certificate-style theorem families. Checked 2026-04-25.
  • CS224W 2024 - Second pass - current official course hub showing how spectral and message-passing families organize graph learning. Checked 2026-04-25.
  • CS229T Course Description - Paper bridge - useful current signal for generalization, overparameterization, and modern ML-theory families. Checked 2026-04-25.
  • High-Dimensional Probability - Paper bridge - strong current bridge into concentration, random matrices, and modern data-science probability. Checked 2026-04-25.