AI / ML Theory Roadmap
roadmap, machine learning theory, learning theory, optimization, high-dimensional statistics
1 Purpose
This roadmap is for readers who do not just want to use ML tools, but want to read theory-facing papers, understand assumptions, and know why the main guarantees look the way they do.
It is not a universal ML curriculum. It is a dependency-aware path through the mathematics that most often supports ML theory.
2 Who This Is For
Use this roadmap if your goal is any of the following:
- read papers with proofs, bounds, asymptotics, or concentration arguments
- understand why optimization and generalization claims are stated the way they are
- move toward learning theory, high-dimensional statistics, or deep learning theory
- connect the current site’s math modules to ML research directions
If your goal is only to get a working model pipeline quickly, this is probably too theory-heavy as a first route.
3 Main Sequence
Use this as the default order.
- Proofs
- Logic
- Linear Algebra
- Probability
- Statistics
- Single-Variable Calculus
- Multivariable Calculus
- Optimization
- Real Analysis
- Learning Theory
- Matrix Analysis
- High-Dimensional Probability
- High-Dimensional Statistics
All thirteen stages are now live on the site, from Proofs through High-Dimensional Statistics, with Matrix Analysis and High-Dimensional Probability feeding directly into a complete first-pass High-Dimensional Statistics module. The site also has a full Numerical Methods module for the computation side of that stack, which is enough to begin reading the cleaner end of ML-facing theory pages.
4 Why This Order Works
4.1 Proofs and Logic First
These pages train the habits that later theory papers assume without apology:
- parse assumptions carefully
- expose hidden quantifiers
- translate between prose and symbolic structure
- negate a statement correctly before trying to prove or refute it (a worked example follows this list)
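As a worked example of the last two habits (standard material, not tied to any one page), here is pointwise convergence of a sequence \(f_n\) to \(f\) at a point \(x\), and the negation obtained by flipping each quantifier and negating the inner inequality:
\[
\text{convergence: } \forall \varepsilon > 0\ \exists N\ \forall n \ge N:\ |f_n(x) - f(x)| < \varepsilon
\]
\[
\text{negation: } \exists \varepsilon > 0\ \forall N\ \exists n \ge N:\ |f_n(x) - f(x)| \ge \varepsilon
\]
Nothing here is clever; the point is that the negation is computed, not guessed.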
4.2 Linear Algebra Before Most ML
ML is full of vectors, projections, low-rank structure, eigenmodes, and learned linear maps.
Without linear algebra, model descriptions become memorized recipes. With it, many architectures reduce to variations on a small number of reusable objects.
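As a minimal sketch of that claim, here is an illustrative NumPy fragment (not code from any site module) showing that projecting onto the top principal directions and forming the best rank-\(k\) approximation are the same object, one SVD away:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))   # data matrix: 100 samples, 20 features

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted descending.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Best rank-k approximation: keep the k largest singular values.
k = 5
A_k = U[:, :k] * s[:k] @ Vt[:k, :]

# Orthogonal projector onto the span of the top-k right singular vectors.
P = Vt[:k, :].T @ Vt[:k, :]

# Projecting every row of A gives exactly the rank-k approximation:
# low-rank approximation IS a projection.
assert np.allclose(A @ P, A_k)
```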
4.3 Probability Before Statistics
Theory-facing ML uses probability to talk about randomness, sampling, concentration, conditioning, and asymptotics.
Statistics then turns that language into estimators, validation, uncertainty, and generalization-facing ideas.
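One canonical piece of that shared language, quoted here as a fixed reference point, is Hoeffding's inequality for i.i.d. variables bounded in \([a, b]\):
\[
\Pr\left( \left| \frac{1}{n}\sum_{i=1}^{n} X_i - \mathbb{E}[X_1] \right| \ge t \right) \le 2\exp\left( -\frac{2nt^2}{(b-a)^2} \right) \quad \text{for every } t > 0.
\]
Most concentration arguments in theory-facing papers elaborate on this shape: an empirical average, a deviation level, and an exponentially small failure probability.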
5 Branch Points
After the shared core, most readers should branch instead of forcing one long linear path.
5.1 Optimization Branch
Use this branch if you care about:
- gradient descent and SGD (a minimal sketch follows this list)
- objective design
- regularization
- training dynamics
- convex and nonconvex viewpoints
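Here is a minimal sketch of the first item, assuming nothing beyond NumPy; the step sizes and iteration counts are illustrative choices, not tuned recommendations:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad(w, Xb, yb):
    # Gradient of the mean squared error 0.5 * mean((Xb @ w - yb)**2).
    return Xb.T @ (Xb @ w - yb) / len(yb)

# Full-batch gradient descent: one exact gradient per step.
w_gd = np.zeros(d)
for _ in range(500):
    w_gd -= 0.1 * grad(w_gd, X, y)

# SGD: one noisy gradient per step, from a single random example.
w_sgd = np.zeros(d)
for _ in range(5000):
    i = rng.integers(n)
    w_sgd -= 0.01 * grad(w_sgd, X[i:i + 1], y[i:i + 1])

print(np.linalg.norm(w_gd - w_true), np.linalg.norm(w_sgd - w_true))
```

The convex/nonconvex distinction in the last item is about when loops like these provably reach a global minimizer, as they do on this convex least-squares objective.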
5.2 Learning Theory Branch
Use this branch if you care about:
- generalization guarantees
- bias-variance structure
- stability and capacity ideas
- VC / Rademacher style reasoning (a representative bound follows this list)
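For orientation, here is the shape of a standard Rademacher-complexity bound for a loss class \(\mathcal{H}\) with values in \([0,1]\) and an i.i.d. sample of size \(n\); exact constants vary across textbooks:
\[
\text{with probability at least } 1-\delta:\qquad R(h) \;\le\; \widehat{R}_n(h) + 2\,\mathfrak{R}_n(\mathcal{H}) + \sqrt{\frac{\ln(1/\delta)}{2n}} \quad \text{for all } h \in \mathcal{H},
\]
where \(R\) is the population risk, \(\widehat{R}_n\) the empirical risk, and \(\mathfrak{R}_n(\mathcal{H})\) the Rademacher complexity of the class.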
5.3 High-Dimensional Statistics Branch
Use this branch if you care about:
- many-features regimes
- overparameterization
- sparsity and shrinkage
- inference and estimation when \(p\) is large relative to \(n\) (a small sketch follows this list)
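A tiny sketch of the last item, assuming only NumPy: when \(p > n\), the least-squares normal equations are rank-deficient, and ridge shrinkage is one standard way to restore a unique, stable estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                       # many more features than samples
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0                       # sparse ground truth
y = X @ beta + 0.1 * rng.normal(size=n)

# X.T @ X is p x p but has rank at most n < p, so ordinary least
# squares has infinitely many interpolating solutions.
print(np.linalg.matrix_rank(X.T @ X))        # 50, not 200

# Ridge: adding lam * I makes the system positive definite and
# shrinks coefficients toward zero.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.linalg.norm(beta_ridge - beta))     # estimation error
```

Replacing the squared penalty with an \(\ell_1\) penalty gives the lasso, which adds the sparsity behavior in the third item.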
5.4 Deep Learning Theory Branch
Use this branch if you care about:
- backpropagation and computation graphs (a hand-worked sketch follows this list)
- implicit bias of optimization
- representation learning
- scaling and overparameterized training
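A minimal hand-worked sketch of the first item, assuming only NumPy: a two-node computation graph (linear map, then squared loss) differentiated in reverse order of the forward pass, with a finite-difference check.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # input
W = rng.normal(size=(2, 3))   # parameters of a linear layer
y = rng.normal(size=2)        # target

# Forward pass: record one intermediate per graph node.
z = W @ x                            # node 1: linear map
loss = 0.5 * np.sum((z - y) ** 2)    # node 2: squared loss

# Backward pass: chain rule, visiting nodes in reverse.
dz = z - y            # d loss / d z
dW = np.outer(dz, x)  # d loss / d W, since z = W @ x
dx = W.T @ dz         # d loss / d x

# Finite-difference check on one parameter entry.
eps = 1e-6
W2 = W.copy()
W2[0, 0] += eps
loss2 = 0.5 * np.sum((W2 @ x - y) ** 2)
assert abs((loss2 - loss) / eps - dW[0, 0]) < 1e-4
```

Everything a framework's autodiff does is this pattern applied node by node over a larger graph.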
Useful bridge pages for this branch are:
- Regularization, Implicit Bias, and Model Complexity
- Attention, Softmax, and Weighted Mixtures
- Learned Linear Projections in Transformers
- Representation Learning and Geometry of Embeddings
- Linear Probes and Representation Diagnostics
5.5 Graph and Structured Learning Branch
Use this branch if you care about:
- graph diffusion and neighborhood averaging (a one-step sketch follows this list)
- message-passing neural networks
- molecular, relational, or recommendation graphs
- spectral versus spatial graph design choices
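A one-step sketch of the first item, assuming only NumPy: symmetric-normalized neighborhood averaging, the linear core of many message-passing layers.

```python
import numpy as np

# Small undirected path-like graph on 4 nodes, as an adjacency
# matrix with self-loops so each node also keeps its own features.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)

deg = A.sum(axis=1)
D_inv_sqrt = np.diag(deg ** -0.5)
A_hat = D_inv_sqrt @ A @ D_inv_sqrt   # symmetric normalization

X = np.eye(4)      # one indicator feature per node, for illustration
H = A_hat @ X      # one propagation step: each node mixes its neighborhood
print(H)           # stacking many such steps is what drives oversmoothing
```

The spectral-versus-spatial question in the last item is largely about whether you reason about \(\hat{A}\) through its eigenvalues or through this local averaging picture.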
Useful bridge pages for this branch are:
- Graph Diffusion and Message Passing
- Oversmoothing, Depth, and Graph Sampling
- Graph Rewiring, Homophily, and Heterophily
- Long-Range Dependence and Oversquashing in Graphs
5.6 Kernel and Bayesian Nonparametrics Branch
Use this branch if you care about:
- similarity-based nonlinear prediction
- kernel ridge regression (a minimal sketch follows this list)
- Gaussian-process regression
- uncertainty-aware function prediction
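A minimal sketch of kernel ridge regression, assuming only NumPy; the RBF bandwidth and regularizer here are illustrative, not recommended defaults.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=40))[:, None]   # 1-D training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

def rbf(A, B, length=0.5):
    # Gaussian (RBF) kernel matrix between the rows of A and of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * length ** 2))

lam = 0.1
K = rbf(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # dual coefficients

X_test = np.linspace(-3, 3, 5)[:, None]
y_pred = rbf(X_test, X) @ alpha   # predict by weighted similarity to training points
print(y_pred)
```

Read with \(\lambda\) as a noise variance, the same linear system gives the Gaussian-process posterior mean, which is why these two topics share a branch.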
Useful bridge pages for this branch are:
- Kernel Methods and Similarity Geometry
- Kernel Ridge and Gaussian-Process Intuition
- Bayesian Optimization and Surrogate Modeling
- Uncertainty Calibration and Predictive Confidence
5.7 Generative Modeling Branch
Use this branch if you care about:
- denoising as a learning objective
- iterative sample generation
- diffusion probabilistic models (the standard training objective is sketched after this list)
- score-based or stochastic-process views of generation
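For orientation, here is the standard DDPM-style pairing of a forward noising step with a denoising training objective, in the notation most papers share (individual pages may present it differently):
\[
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I),
\]
\[
\mathcal{L}(\theta) = \mathbb{E}_{t,\, x_0,\, \varepsilon}\left\| \varepsilon - \varepsilon_\theta(x_t, t) \right\|^2 .
\]
Training the denoiser \(\varepsilon_\theta\) this way is, up to weighting, learning the score of the noised data distribution, which is the bridge to the score-based and stochastic-process views in the last item.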
Useful bridge pages for this branch are:
- Diffusion Models and Denoising
- Score Matching and the SDE View of Diffusion
- Flow Matching and Transport Views of Generation
6 Pages On The Site That Already Support This Roadmap
6.1 Strongest Current Math Support
The full main sequence above is live, from Proofs through High-Dimensional Statistics, along with Numerical Methods and Control and Dynamics.
6.2 Strongest Current ML Bridge Pages
- Machine Learning applications hub
- Regularization, Implicit Bias, and Model Complexity
- Attention, Softmax, and Weighted Mixtures
- Kernel Methods and Similarity Geometry
- Kernel Ridge and Gaussian-Process Intuition
- Bayesian Optimization and Surrogate Modeling
- Uncertainty Calibration and Predictive Confidence
- Graph Diffusion and Message Passing
- Oversmoothing, Depth, and Graph Sampling
- Graph Rewiring, Homophily, and Heterophily
- Long-Range Dependence and Oversquashing in Graphs
- Representation Learning and Geometry of Embeddings
- Linear Probes and Representation Diagnostics
- Diffusion Models and Denoising
- Score Matching and the SDE View of Diffusion
- Flow Matching and Transport Views of Generation
- In-Context Learning and Linearization
- Linear Regression Through Projection
- PCA Through SVD
- Learned Linear Projections in Transformers
- Vector Mixtures in Embeddings and Attention
7 Paper Reading Overlay
Do not wait until the entire roadmap is complete before touching papers.
Run a light paper-reading overlay in parallel:
- How to Read a Paper
- one ML-facing application page
- one matching paper-lab page
A good current sequence is exactly that order: How to Read a Paper first, then one ML-facing application page, then its matching paper-lab page.
8 Common Next Theory Directions
With the current foundations, the calculus bridge, the optimization path, the learning-theory spine, Matrix Analysis, High-Dimensional Probability, High-Dimensional Statistics, Numerical Methods, and Control and Dynamics all in place, a strong adjacent theory module now live is:
9 Sources and Further Reading
- CS229: Machine Learning. First pass: official current ML course hub with a broad math-aware overview of the field. Checked 2026-04-24.
- CS 189 Syllabus. First pass: official Berkeley syllabus showing a modern intro-ML course with clear prerequisite expectations. Checked 2026-04-24.
- EE364a: Convex Optimization I. Second pass: official optimization course that naturally supports the next major branch in the roadmap. Checked 2026-04-24.
- Mathematics for Machine Learning. Second pass: strong math bridge for readers moving from foundations toward ML language. Checked 2026-04-24.
- CS229T / Statistical Learning Theory. Paper bridge: a compact entry into theory-heavy ML beyond introductory courses. Checked 2026-04-24.