Minimax and Lower Bounds
minimax risk, lower bounds, Fano, Le Cam, statistical limits
1 Role
This is the sixth page of the High-Dimensional Statistics module.
The previous pages built estimators and rates for sparse regression, covariance estimation, and PCA-like spectral estimation.
This page asks a more structural question:
are those rates actually good, or could some cleverer estimator do much better?
That is the role of minimax theory.
2 First-Pass Promise
Read this page after Covariance, PCA, and Spectral Estimation in High Dimension.
If you stop here, you should still understand:
- minimax risk as a worst-case benchmark over a model class
- why lower bounds are statements about what no estimator can beat
- why Le Cam and Fano methods compare many hard-to-distinguish data-generating models
- why matching upper and lower bounds mean a rate is basically the right one
3 Why It Matters
Once you see an estimator with a theorem, the next natural question is:
is this rate a feature of the estimator, or a feature of the problem itself?
Minimax theory answers that.
It separates:
- algorithmic success: this estimator works this well
- problem difficulty: no estimator can do substantially better over the whole class
That distinction matters a lot in high-dimensional statistics, where rates depend on:
- sparsity
- ambient dimension
- sample size
- noise level
- geometry of the model class
Without lower bounds, it is easy to confuse a merely decent estimator with a rate-optimal one.
4 Prerequisite Recall
- sparse regression introduced typical upper-rate shapes such as \( s \log p / n \)
- covariance and PCA introduced spectral estimation and operator-norm losses
- high-dimensional probability supplies the concentration tools for upper bounds
- learning theory already used the idea of comparing what an algorithm achieves to what the problem allows
5 Intuition
5.1 Minimax Is a Worst-Case Benchmark
If an estimator is denoted by \(\widehat \theta\) and the unknown parameter belongs to a class \(\Theta\), then each true parameter value gives a risk.
Minimax theory asks for the smallest possible worst-case risk over that whole class.
So minimax is not:
- average-case over easy instances
- best-case behavior
- performance on one favorite theorem example
It is the strongest reasonable benchmark for the whole model family.
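As a minimal numerical sketch of that worst-case viewpoint (the estimators, grid, noise level, and threshold below are illustrative choices, not part of any result on this page), compare the sample mean with a soft-thresholding estimator of a one-dimensional Gaussian mean and report each one's worst risk over a grid standing in for \(\Theta\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 50, 1.0                       # sample size and noise level (illustrative)
thetas = np.linspace(-2.0, 2.0, 41)      # grid of true means standing in for Theta
lam = sigma / np.sqrt(n)                 # illustrative threshold level

def risk(estimator, theta, reps=5000):
    """Monte Carlo estimate of E_theta[(estimator(X) - theta)^2]."""
    x = rng.normal(theta, sigma, size=(reps, n))
    return float(np.mean((estimator(x) - theta) ** 2))

def sample_mean(x):
    return x.mean(axis=1)

def soft_threshold(x):
    m = x.mean(axis=1)
    return np.sign(m) * np.maximum(np.abs(m) - lam, 0.0)

for name, est in [("sample mean", sample_mean), ("soft threshold", soft_threshold)]:
    worst = max(risk(est, t) for t in thetas)
    # The sup over the grid is the worst-case risk that minimax theory benchmarks.
    print(f"{name:15s} worst-case risk over the grid ~ {worst:.4f}")
```

The thresholding estimator beats the sample mean near zero but pays for it at moderate values of \(\theta\), so which estimator looks better depends entirely on whether you judge by a favorable instance or by the sup over the whole class.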
5.2 Lower Bounds Use Hard Families
To prove a lower bound, you build a collection of parameters that are:
- well separated in the target quantity you want to estimate
- statistically hard to tell apart from data
That is the key tension.
If two models are too easy to distinguish, the lower bound is weak.
If they are too close in the target quantity, the lower bound is also weak.
Good lower-bound arguments balance separation with indistinguishability.
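One common way to formalize that balance (constants differ across references) is a two-point bound of the shape
\[ \inf_{\widehat\theta}\,\max_{\theta \in \{\theta_0,\theta_1\}} \mathbb E_\theta[\ell(\widehat\theta,\theta)] \;\gtrsim\; d(\theta_0,\theta_1)\,\bigl(1 - \mathrm{TV}(P_{\theta_0}, P_{\theta_1})\bigr), \]
where \(d(\theta_0,\theta_1)\) measures separation in the target quantity and \(\mathrm{TV}\) is the total variation distance between the induced data distributions. Increasing the separation inflates the first factor but pushes the total variation toward one, which is exactly the tension described above.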
5.3 Matching Rates Are the Goal
An upper bound says:
here is what some estimator can achieve
A lower bound says:
no estimator can do better than this scale over the whole class
When those match up to constants or mild logarithmic factors, we usually say the rate is essentially optimal.
6 Formal Core
Definition 1 (Minimax Risk) For a parameter class \(\Theta\), loss \(\ell\), and estimators \(\widehat\theta\), the minimax risk is
\[ \inf_{\widehat\theta}\sup_{\theta \in \Theta}\mathbb E_\theta[\ell(\widehat\theta,\theta)]. \]
This is the smallest worst-case risk achievable over the class.
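As a classical benchmark instance (standard, not specific to this page): for \(X_1,\dots,X_n \sim \mathcal N(\theta, \sigma^2 I_d)\) with \(\Theta = \mathbb R^d\) and squared-error loss, the sample mean is minimax and
\[ \inf_{\widehat\theta}\sup_{\theta \in \mathbb R^d}\mathbb E_\theta\bigl[\lVert\widehat\theta - \theta\rVert_2^2\bigr] = \frac{\sigma^2 d}{n}. \]
Constrained classes \(\Theta\), such as sparse or low-rank sets, can only make this worst case smaller, which is where structural assumptions enter.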
Theorem 1 (Theorem Idea: Upper and Lower Bounds Play Different Roles) An upper bound exhibits one estimator whose risk is small.
A lower bound shows that every estimator must have risk at least a certain scale.
When the scales match, the problem is understood much more sharply.
Theorem 2 (Theorem Idea: Lower Bounds Come from Indistinguishable Alternatives) Two standard first-pass routes are:
- Le Cam style: compare a small number of carefully chosen alternatives
- Fano style: compare a larger packing of alternatives and use information bounds
Both methods exploit the same logic:
- parameters are separated in the quantity of interest
- induced data distributions are still hard to distinguish
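A minimal numeric sketch of the Le Cam route for \(n\) i.i.d. \(\mathcal N(\mu, \sigma^2)\) observations with squared-error loss (the sample size, noise level, and separation below are illustrative, and the constant in the final inequality varies by reference):

```python
import numpy as np

n, sigma = 100, 1.0                       # illustrative sample size and noise level
delta = sigma / (2 * np.sqrt(n))          # half-separation between the two means
mu0, mu1 = 0.0, 2 * delta                 # two alternatives with |mu1 - mu0| = 2*delta

# KL divergence between the two product measures (n i.i.d. Gaussians).
kl = n * (mu1 - mu0) ** 2 / (2 * sigma ** 2)

# Pinsker's inequality: TV <= sqrt(KL / 2).
tv_upper = np.sqrt(kl / 2)

# Le Cam-style two-point bound (one common form; constants vary by convention):
# inf over estimators, sup over {mu0, mu1} of E[(hat - mu)^2] >= (delta^2 / 2) * (1 - TV).
lower = (delta ** 2 / 2) * (1 - tv_upper)

print(f"KL = {kl:.3f}, TV bound = {tv_upper:.3f}")
print(f"two-point lower bound ~ {lower:.5f}  (compare with sigma^2/n = {sigma**2/n:.5f})")
```

The resulting bound has the same \(1/n\) scale as the sample mean's risk \(\sigma^2/n\), which is the typical outcome of a two-point argument: the right scale, with a loose constant.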
Theorem 3 (Theorem Idea: Sparse High-Dimensional Problems Carry Logarithmic Complexity Costs) For sparse classes, minimax lower bounds often involve the same structural terms that appear in upper bounds, such as
\[ \frac{s \log(p/s)}{n} \]
or related square-root forms, depending on the loss and model.
The exact statement depends on the problem class, but the first-pass message is that the log p price is often not an artifact of a particular proof technique.
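The usual source of that logarithmic term is the number of candidate sparse supports a Fano-type argument has to distinguish. A standard one-line count is
\[ \binom{p}{s} \ \ge\ \Bigl(\frac{p}{s}\Bigr)^{s} \quad\Longrightarrow\quad \log\binom{p}{s} \ \ge\ s \log\frac{p}{s}, \]
so any procedure that effectively has to identify one support among roughly \(\binom{p}{s}\) candidates pays an information cost on the order of \(s\log(p/s)\), which reappears divided by \(n\) in the rate.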
Theorem 4 (Theorem Idea: Lower Bounds Justify Structural Assumptions) When lower bounds show that estimation is impossible without structure, they explain why assumptions such as sparsity, low rank, or eigengaps are not decorative.
They are what make recovery statistically possible in the first place.
7 Worked Example
Suppose a sparse-regression theorem gives an upper bound with scale
\[ \frac{s \log p}{n}. \]
Now take
\( s = 10 \), \( p = 10{,}000 \), and \( n = 400 \).
Then
\[ \log(p/s) = \log(1000) \approx 6.9 \]
so a first-pass lower-bound scale of
\[ \frac{s \log(p/s)}{n} \]
looks like
\[ \frac{10 \cdot 6.9}{400} \approx 0.17. \]
If the upper bound from an explicit estimator is on the same order, then the conclusion is not:
this estimator is perfect
The conclusion is:
this rate is probably a property of the problem, not just of the estimator
That is the minimax point of view in one line.
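A quick check of the arithmetic in this example, as a minimal sketch (the numbers are the illustrative ones above, not from any theorem):

```python
import math

s, p, n = 10, 10_000, 400

upper_scale = s * math.log(p) / n        # the  s log p / n  shape from the upper bound
lower_scale = s * math.log(p / s) / n    # the  s log(p/s) / n  shape from the lower bound

print(f"log(p/s)    = {math.log(p / s):.2f}")   # ~ 6.91
print(f"upper scale = {upper_scale:.3f}")       # ~ 0.230
print(f"lower scale = {lower_scale:.3f}")       # ~ 0.173
```

The two scales differ only through the gap between \(\log p\) and \(\log(p/s)\), which is why "matching up to logarithmic factors" is the usual phrasing.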
8 Computation Lens
When you read a lower-bound theorem, ask:
- what is the parameter class?
- what loss is being lower bounded?
- what family of hard alternatives is being constructed?
- what quantity makes the models hard to distinguish: KL, TV, or another divergence?
Those four questions usually reveal the real architecture of the proof.
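For the last question, here is a minimal sketch of the textbook divergence computations for two equal-variance Gaussian alternatives (the specific means and variance are illustrative):

```python
import math

mu0, mu1, sigma = 0.0, 0.3, 1.0          # illustrative pair of alternatives

# KL divergence between N(mu0, sigma^2) and N(mu1, sigma^2).
kl = (mu1 - mu0) ** 2 / (2 * sigma ** 2)

# Exact total variation for equal variances: 2 * Phi(|mu1 - mu0| / (2 * sigma)) - 1.
std_normal_cdf = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
tv_exact = 2 * std_normal_cdf(abs(mu1 - mu0) / (2 * sigma)) - 1

# Pinsker's inequality, often all a lower-bound proof needs: TV <= sqrt(KL / 2).
tv_pinsker = math.sqrt(kl / 2)

print(f"KL = {kl:.4f}, exact TV = {tv_exact:.4f}, Pinsker bound = {tv_pinsker:.4f}")
```

Whether a proof tracks KL, total variation, or another divergence is mostly a matter of which inequality (Pinsker, Fano, Le Cam) converts it into a statement about estimation errors.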
9 Application Lens
9.1 Sparse Regression
Lower bounds explain why lasso-type rates are often close to optimal over sparse classes, and why the remaining work is about assumptions, constants, computation, or adaptivity rather than miracle rate improvements.
9.2 Covariance and PCA
In spectral estimation problems, lower bounds clarify when unstable eigenspaces or noisy covariance estimation are inevitable rather than just artifacts of a naive method.
9.3 ML Theory and Model Design
Lower bounds are also design tools. They tell you which targets are unrealistic and which assumptions are carrying the real statistical burden.
10 Stop Here For First Pass
If you can now explain:
- what minimax risk measures
- why lower bounds are statements about impossibility
- why Le Cam and Fano are really about hard-to-distinguish alternatives
- why matching upper and lower rates mean you understand the problem better
then this page has done its job.
11 Go Deeper
After this page, the next natural step is:
The strongest adjacent live pages right now are:
12 Optional Deeper Reading After First Pass
The strongest current references connected to this page are:
- Stanford Stats 311 lecture notes - official notes with a dedicated chapter on minimax lower bounds via Le Cam, Fano, and Assouad. Checked 2026-04-25.
- Stanford Stats 300A Session 6 - official notes connecting information-theoretic lower bounds to a sparse regression example. Checked 2026-04-25.
- Stanford EE378C - official course page for information-theoretic lower bounds in data science. Checked 2026-04-25.
- CMU 36-709 course page - official course page showing minimax lower bounds inside a broader advanced-theoretical-statistics track. Checked 2026-04-25.
13 Sources and Further Reading
- Stanford Stats 311 lecture notes - First pass - official notes for the minimax framework and the Le Cam, Fano, and Assouad methods. Checked 2026-04-25.
- Stanford Stats 300A Session 6 - First pass - official notes using lower-bound ideas on sparse regression and matching-rate intuition. Checked 2026-04-25.
- Stanford EE378C - Second pass - official course page for a wider lower-bounds toolkit across statistics, ML, and optimization. Checked 2026-04-25.
- CMU 36-709 course page - Second pass - official course page linking minimax lower bounds to covariance estimation, sparse estimation, and nonparametric theory. Checked 2026-04-25.