Theorem Families
theorem families, concentration, spectral methods, convexity, generalization, transport
1 Why This Page
Most research papers do not introduce completely new mathematics from scratch.
They usually belong to a recurring theorem family:
- concentration and tail bounds
- spectral or operator methods
- convexity and duality
- stability and generalization
- transport and denoising
Once you can recognize the family, papers become easier to read.
You no longer ask only “what does this theorem say?” You also ask:
- what family is this theorem from?
- what assumptions usually come with this family?
- what proof patterns should I expect?
- which site pages should I revisit before going deeper?
That is the job of this page.
2 Research Snapshot
- Type: top-level research map
- Setting: math-heavy papers in CS, AI, engineering, optimization, and statistics
- Main claim: theorem families are better reading units than isolated theorems
- Why it matters: families tell you what toolkit a paper is using and what nearby papers are likely to look like
3 How To Use This Page
Use it in one of two ways.
3.1 Route 1: From a paper backward
If a theorem in a paper feels dense, ask which family it belongs to. Then come here and use the matching section to find:
- stable prerequisites
- common assumptions
- proof habits
- internal reading trails
3.2 Route 2: From a topic forward
If you already know a topic page and want to see where it leads in research, use this page to jump from:
- stable math
- to theorem family
- to application pages
- to paper labs
- to current directions
4 Family 1: Concentration And Tail Bounds
4.1 What this family does
This family turns random fluctuation into quantitative control.
It tells you when averages, empirical quantities, matrices, or stochastic processes stay near their expected behavior.
4.2 Core objects
- random variables and sums
- martingales and processes
- norms of random vectors and matrices
- empirical averages and sample-dependent quantities
4.3 Typical assumptions
- independence or weak dependence
- boundedness, sub-Gaussian, or sub-exponential tails
- Lipschitz structure
- sample size large enough for the target regime
4.4 Common proof patterns
- moment-generating-function (Chernoff-style) or exponential-moment argument
- symmetrization
- union bound and covering argument
- matrix lifting of scalar tail inequalities
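The exponential-moment pattern can be checked empirically in a few lines. This is a minimal sketch, with a fair coin and an arbitrary deviation level of 0.05 chosen purely for illustration: Hoeffding's inequality says P(|mean - p| >= t) <= 2 exp(-2nt^2) for [0,1]-valued i.i.d. draws, and simulation stays comfortably inside that bound.

```python
import math
import random

random.seed(0)

n, t, trials = 1000, 0.05, 2000   # sample size, deviation level, repetitions (illustrative)
p = 0.5                           # fair coin: mean of each Bernoulli draw

# Hoeffding bound for [0,1]-valued i.i.d. variables:
# P(|mean - p| >= t) <= 2 * exp(-2 * n * t^2)
bound = 2 * math.exp(-2 * n * t ** 2)

# Empirical frequency of a deviation of size >= t across many repetitions.
deviations = 0
for _ in range(trials):
    mean = sum(random.random() < p for _ in range(n)) / n
    if abs(mean - p) >= t:
        deviations += 1
empirical = deviations / trials

print(empirical, "<=", round(bound, 4))   # empirical tail frequency vs. Hoeffding bound
```

The bound is loose here (the true tail is roughly an order of magnitude smaller), which is typical: concentration inequalities trade tightness for generality.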
4.5 Where it appears
- generalization bounds
- high-dimensional statistics
- randomized numerical linear algebra
- sketching and streaming
- diffusion and sampling error control
4.6 Start here in this site
5 Family 2: Spectral And Operator Methods
5.1 What this family does
This family explains structure through eigenvalues, singular values, invariant subspaces, and repeated operator action.
It is one of the cleanest bridges from linear algebra to modern ML, graph learning, and scientific computing.
5.2 Core objects
- matrices and linear maps
- eigenvalues and eigenvectors
- singular values and low-rank approximations
- graph Laplacians and propagation operators
5.3 Typical assumptions
- symmetry, positive semidefiniteness, or near-normal behavior
- low-rank or approximately low-rank structure
- graph or operator structure that makes the spectrum informative
- perturbations small enough for spectral interpretation to survive
5.4 Common proof patterns
- orthogonal decomposition
- variational characterizations
- perturbation bounds
- repeated-operator or power-method intuition
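The repeated-operator pattern is easiest to see in code. Here is a minimal pure-Python power method on a made-up 2x2 symmetric matrix with known eigenvalues 3 and 1; repeated application of the operator amplifies the top eigendirection, and the Rayleigh quotient converges to the top eigenvalue.

```python
# Power method on the symmetric matrix A = [[2, 1], [1, 2]],
# whose eigenvalues are 3 (eigenvector (1,1)/sqrt(2)) and 1.
A = [[2.0, 1.0], [1.0, 2.0]]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm(v):
    return sum(x * x for x in v) ** 0.5

v = [1.0, 0.0]              # any start not orthogonal to the top eigenvector
for _ in range(50):         # repeated operator action amplifies the top direction
    w = matvec(A, v)
    v = [x / norm(w) for x in w]

# Rayleigh quotient v^T A v / v^T v estimates the top eigenvalue.
lam = sum(x * y for x, y in zip(v, matvec(A, v))) / sum(x * x for x in v)
print(lam)                  # converges to 3
```

The convergence rate is governed by the eigenvalue ratio (here 1/3 per step), which is exactly the kind of quantity spectral perturbation bounds track.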
5.5 Where it appears
- PCA and dimensionality reduction
- spectral clustering
- graph neural networks
- attention and linear-operator viewpoints
- operator learning and scientific ML
5.6 Start here in this site
6 Family 3: Convexity, Duality, And Certificates
6.1 What this family does
This family gives structure to optimization problems and makes correctness or impossibility statements auditable.
It is the theorem family behind many “why does this optimizer / relaxation / certificate work?” questions.
6.2 Core objects
- convex sets and convex functions
- subgradients and optimality conditions
- feasible regions and constraints
- primal and dual problems
- certificates of optimality or infeasibility
6.3 Typical assumptions
- convexity
- regularity or constraint qualification
- smoothness or subgradient access
- duality gap small or zero under the right regime
6.4 Common proof patterns
- separating hyperplane argument
- first-order optimality inequalities
- Lagrangian construction
- weak and strong duality comparisons
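A toy one-variable problem, invented here just for illustration, makes the Lagrangian and duality pattern concrete: minimize x^2 subject to x >= 1. The primal optimum is x* = 1 with value 1, and maximizing the dual function recovers the same value, so the duality gap is zero.

```python
# Primal: minimize x^2 subject to x >= 1.  Optimum: x* = 1, value 1.
# Lagrangian: L(x, lam) = x^2 - lam * (x - 1), with lam >= 0.
# The inner minimum over x is at x = lam / 2, giving the dual function
#   g(lam) = lam - lam^2 / 4,
# maximized at lam* = 2 with value 1: zero duality gap.

def dual(lam):
    x = lam / 2.0                        # unconstrained minimizer of L(., lam)
    return x ** 2 - lam * (x - 1.0)

primal_value = 1.0 ** 2                  # evaluate at the primal optimum x* = 1

# Maximize g over a grid of lam >= 0 (closed form says lam* = 2).
best_lam = max((i / 100 for i in range(501)), key=dual)
dual_value = dual(best_lam)

print(primal_value, dual_value, best_lam)   # 1.0 1.0 2.0
```

The optimal dual variable lam* = 2 is the certificate: any feasible x satisfies x^2 >= L(x, 2) >= g(2) = 1, which proves the primal value cannot be beaten.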
6.5 Where it appears
- optimization theory
- inverse problems and regularization
- structured estimation
- differentiable optimization layers
- robust and constrained ML pipelines
6.6 Start here in this site
7 Family 4: Stability, Capacity, And Generalization
7.1 What this family does
This family asks why performance on observed data says anything about new data.
It is the core theorem family behind learning theory, model complexity, validation, and many modern questions around overparameterized models.
7.2 Core objects
- population risk and empirical risk
- hypothesis classes and complexity measures
- algorithmic stability
- training, validation, and test distributions
7.3 Typical assumptions
- i.i.d. or controlled sampling regime
- bounded or well-behaved loss
- complexity control through class restriction or algorithmic bias
- distributional assumptions explicit enough to interpret the bound
7.4 Common proof patterns
- symmetrization and ghost-sample tricks
- uniform convergence
- stability argument
- capacity or complexity comparison
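For a finite hypothesis class, the uniform-convergence pattern reduces to Hoeffding plus a union bound, and that can be simulated directly. Everything in this sketch is invented for illustration (the uniform data distribution, the threshold class, and the constants): with probability at least 1 - delta, the worst empirical-vs-population gap over the class is at most sqrt(ln(2|H|/delta) / (2n)).

```python
import math
import random

random.seed(1)

n, delta = 2000, 0.05
thresholds = [i / 50 for i in range(1, 50)]   # finite class: h_t(x) = 1[x > t], |H| = 49

# Data: x ~ Uniform[0,1], label y = 1[x > 0.5].
xs = [random.random() for _ in range(n)]
ys = [x > 0.5 for x in xs]

def empirical_risk(t):
    return sum((x > t) != y for x, y in zip(xs, ys)) / n

def population_risk(t):
    return abs(t - 0.5)       # exactly the mass between t and 0.5 is misclassified

# Worst gap over the class vs. the Hoeffding + union-bound guarantee,
# which holds with probability at least 1 - delta over the sample.
worst_gap = max(abs(empirical_risk(t) - population_risk(t)) for t in thresholds)
bound = math.sqrt(math.log(2 * len(thresholds) / delta) / (2 * n))

print(worst_gap, "<=", round(bound, 4))
```

The log |H| dependence is the simplest complexity measure; Rademacher or VC arguments replace it when the class is infinite.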
7.5 Where it appears
- learning theory
- regularization and implicit bias
- validation and model selection
- calibration and uncertainty
- distribution shift analysis
7.6 Start here in this site
8 Family 5: Transport, Score, And Denoising
8.1 What this family does
This family studies how probability distributions evolve, transform, or can be recovered through local score or velocity information.
It has become one of the clearest modern bridges from probability and calculus to generative modeling.
8.2 Core objects
- densities and score functions
- stochastic differential equations
- velocity fields and flows
- probability paths and denoising objectives
8.3 Typical assumptions
- regularity of score or vector fields
- well-defined interpolation or noising process
- numerical solvability of forward or reverse dynamics
- approximation quality of learned score or flow model
8.4 Common proof patterns
- continuity-equation viewpoint
- reverse-time stochastic process reasoning
- denoising identity or score matching argument
- transport-based comparison of distributions
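A minimal sketch of the denoising-identity pattern, under toy assumptions (a standard Gaussian clean signal and Gaussian noise with sigma = 1): Tweedie's formula says E[x0 | x] = x + sigma^2 * grad log p(x), which for this Gaussian model works out to x / (1 + sigma^2) = x / 2. Fitting the best linear denoiser to samples recovers the same coefficient.

```python
import random

random.seed(2)

sigma = 1.0
n = 200_000

# Toy model: clean x0 ~ N(0, 1), noisy x = x0 + sigma * eps.
x0 = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [a + sigma * random.gauss(0.0, 1.0) for a in x0]

# Best linear denoiser x0_hat = c * x minimizes E[(x0 - c * x)^2],
# so c = Cov(x0, x) / Var(x).  Tweedie's formula predicts
# c = 1 / (1 + sigma^2) = 0.5 for this Gaussian model.
cov = sum(a * b for a, b in zip(x0, x)) / n
var = sum(b * b for b in x) / n
c = cov / var

print(c)   # close to 0.5
```

This is the simplest case of the general principle behind score-based models: a regression objective on noisy data implicitly estimates the score of the noised distribution.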
8.5 Where it appears
- diffusion models
- flow matching
- score-based generation
- uncertainty-aware sampling
- scientific generative modeling
8.6 Start here in this site
9 Family 6: Geometry Of Representations And Message Passing
9.1 What this family does
This family studies how structure is shaped by propagation, neighborhood geometry, invariances, and embedding spaces.
It is especially important in graph learning, representation learning, and transformer-era ML.
9.2 Core objects
- embeddings and similarity geometry
- graph neighborhoods and propagation operators
- homophily and heterophily structure
- bottlenecks, oversquashing, and long-range dependence
9.3 Typical assumptions
- graph or representation geometry captures useful structure
- propagation operator matches the task geometry
- feature similarity is informative enough for the learning signal
- depth or receptive-field growth does not destroy the information you need
9.4 Common proof patterns
- operator and diffusion viewpoint
- neighborhood expansion or bottleneck analysis
- invariance or symmetry argument
- geometry-of-embeddings interpretation
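The operator-and-diffusion viewpoint can be sketched in a few lines. On a small made-up path graph, repeatedly replacing each node feature with the mean over its closed neighborhood (one common normalization choice among several) shrinks the spread of features toward a constant, which is the mechanism behind oversmoothing in deep message-passing stacks.

```python
# Path graph on 4 nodes: 0 - 1 - 2 - 3, with self-loops (closed neighborhoods).
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}

def smooth(features):
    # One message-passing step: each node averages over its closed neighborhood.
    return [sum(features[j] for j in neighbors[i]) / len(neighbors[i])
            for i in neighbors]

def spread(features):
    mean = sum(features) / len(features)
    return max(abs(f - mean) for f in features)

features = [1.0, 0.0, 0.0, -1.0]   # initial node features
before = spread(features)
for _ in range(20):                # a deep stack of averaging layers
    features = smooth(features)
after = spread(features)

print(before, "->", after)         # spread shrinks toward 0 on a connected graph
```

The decay rate is set by the second eigenvalue of the averaging operator, which is how this family connects back to the spectral one.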
9.5 Where it appears
- graph neural networks
- message passing systems
- representation diagnostics
- embedding geometry
- attention and mixture views
9.6 Start here in this site
10 How To Pick The Right Family
Use these quick cues.
- if the theorem says "with probability at least" or controls deviations, start with concentration
- if the theorem talks about eigenvalues, singular values, Laplacians, or repeated linear action, start with spectral
- if the theorem uses argmin, certificates, KKT, or dual variables, start with convexity and duality
- if the theorem compares empirical and population behavior, start with generalization
- if the theorem talks about scores, flows, denoising, or reverse-time dynamics, start with transport and denoising
- if the theorem is about propagation, neighborhoods, or embedding structure, start with representation geometry
11 What Has Changed Recently
The stable backbone of these families has not changed much, but their frontier importance has shifted.
Right now, the most visibly active families around modern AI and theory-facing CS are:
- stability, capacity, and generalization
- transport, score, and denoising
- geometry of representations and message passing
- high-dimensional concentration and random-matrix methods
That is one reason this page distinguishes stable theorem families from the moving frontier built on top of them.
12 What To Learn Next
- Directions, if you want to move from recurring theorem patterns to active frontier maps
- Paper Lab, if you want theorem-reading workflow instead of topic-first study
- Applications > Machine Learning, if you want problem-first bridges
- AI / ML Theory roadmap, if your goal is a research-reading route rather than isolated topic lookup
13 Sources And Further Reading
- MIT Mathematics for Computer Science - First pass - durable source for proof structure and theorem language before specializing into research families. Checked 2026-04-25.
- EE364a: Convex Optimization I - Second pass - clean official anchor for convexity, duality, and certificate-style theorem families. Checked 2026-04-25.
- CS224W 2024 - Second pass - current official course hub showing how spectral and message-passing families organize graph learning. Checked 2026-04-25.
- CS229T Course Description - Paper bridge - useful current signal for generalization, overparameterization, and modern ML-theory families. Checked 2026-04-25.
- High-Dimensional Probability - Paper bridge - strong current bridge into concentration, random matrices, and modern data-science probability. Checked 2026-04-25.