Minimax and Lower Bounds
minimax risk, lower bounds, Fano, Le Cam, statistical limits
1 Role
This is the sixth page of the High-Dimensional Statistics module.
The previous pages built estimators and rates for sparse regression, covariance estimation, and PCA-like spectral estimation.
This page asks a more structural question:
are those rates actually good, or could some cleverer estimator do much better?
That is the role of minimax theory.
2 First-Pass Promise
Read this page after Covariance, PCA, and Spectral Estimation in High Dimension.
If you stop here, you should still understand:
- minimax risk as a worst-case benchmark over a model class
- why lower bounds are statements about what no estimator can beat
- why Le Cam and Fano methods compare many hard-to-distinguish data-generating models
- why matching upper and lower bounds mean a rate is basically the right one
3 Why It Matters
Once you see an estimator with a theorem, the next natural question is:
is this rate a feature of the estimator, or a feature of the problem itself?
Minimax theory answers that.
It separates:
- algorithmic success: this estimator works this well
- problem difficulty: no estimator can do substantially better over the whole class
That distinction matters a lot in high-dimensional statistics, where rates depend on:
- sparsity
- ambient dimension
- sample size
- noise level
- geometry of the model class
Without lower bounds, it is easy to confuse a merely decent estimator with a rate-optimal one.
4 Prerequisite Recall
- sparse regression introduced typical upper-rate shapes such as \( s \log p / n \)
- covariance and PCA introduced spectral estimation and operator-norm losses
- high-dimensional probability supplies the concentration tools for upper bounds
- learning theory already used the idea of comparing what an algorithm achieves to what the problem allows
5 Intuition
5.1 Minimax Is a Worst-Case Benchmark
If an estimator is denoted by \(\widehat \theta\) and the unknown parameter belongs to a class \(\Theta\), then each true parameter value gives a risk.
Minimax theory asks for the smallest possible worst-case risk over that whole class.
So minimax is not:
- average-case over easy instances
- best-case behavior
- performance on one favorite theorem example
It is the strongest reasonable benchmark for the whole model family.
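As a minimal numerical sketch of that worst-case viewpoint (the estimators, grid, noise level, and threshold below are illustrative choices, not part of any result on this page), compare the sample mean with a soft-thresholding estimator of a one-dimensional Gaussian mean and report each one's worst risk over a grid standing in for \(\Theta\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 50, 1.0                       # sample size and noise level (illustrative)
thetas = np.linspace(-2.0, 2.0, 41)      # grid of true means standing in for Theta
lam = sigma / np.sqrt(n)                 # illustrative threshold level

def risk(estimator, theta, reps=5000):
    """Monte Carlo estimate of E_theta[(estimator(X) - theta)^2]."""
    x = rng.normal(theta, sigma, size=(reps, n))
    return float(np.mean((estimator(x) - theta) ** 2))

def sample_mean(x):
    return x.mean(axis=1)

def soft_threshold(x):
    m = x.mean(axis=1)
    return np.sign(m) * np.maximum(np.abs(m) - lam, 0.0)

for name, est in [("sample mean", sample_mean), ("soft threshold", soft_threshold)]:
    worst = max(risk(est, t) for t in thetas)
    # The sup over the grid is the worst-case risk that minimax theory benchmarks.
    print(f"{name:15s} worst-case risk over the grid ~ {worst:.4f}")
```

The thresholding estimator beats the sample mean near zero but pays for it at moderate values of \(\theta\), so which estimator looks better depends entirely on whether you judge by a favorable instance or by the sup over the whole class.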
5.2 Lower Bounds Use Hard Families
To prove a lower bound, you build a collection of parameters that are:
- well separated in the target quantity you want to estimate
- statistically hard to tell apart from data
That is the key tension.
If two models are too easy to distinguish, the lower bound is weak.
If they are too close in the target quantity, the lower bound is also weak.
Good lower-bound arguments balance separation with indistinguishability.
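One common way to formalize that balance (constants differ across references) is a two-point bound of the shape
\[ \inf_{\widehat\theta}\,\max_{\theta \in \{\theta_0,\theta_1\}} \mathbb E_\theta[\ell(\widehat\theta,\theta)] \;\gtrsim\; d(\theta_0,\theta_1)\,\bigl(1 - \mathrm{TV}(P_{\theta_0}, P_{\theta_1})\bigr), \]
where \(d(\theta_0,\theta_1)\) measures separation in the target quantity and \(\mathrm{TV}\) is the total variation distance between the induced data distributions. Increasing the separation inflates the first factor but pushes the total variation toward one, which is exactly the tension described above.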
5.3 Matching Rates Are the Goal
An upper bound says:
here is what some estimator can achieve
A lower bound says:
no estimator can do better than this scale over the whole class
When those match up to constants or mild logarithmic factors, we usually say the rate is essentially optimal.
6 Formal Core
Definition 1 (Minimax Risk) For a parameter class \(\Theta\), loss \(\ell\), and estimators \(\widehat\theta\), the minimax risk is
\[ \inf_{\widehat\theta}\sup_{\theta \in \Theta}\mathbb E_\theta[\ell(\widehat\theta,\theta)]. \]
This is the smallest worst-case risk achievable over the class.
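As a classical benchmark instance (standard, not specific to this page): for \(X_1,\dots,X_n \sim \mathcal N(\theta, \sigma^2 I_d)\) with \(\Theta = \mathbb R^d\) and squared-error loss, the sample mean is minimax and
\[ \inf_{\widehat\theta}\sup_{\theta \in \mathbb R^d}\mathbb E_\theta\bigl[\lVert\widehat\theta - \theta\rVert_2^2\bigr] = \frac{\sigma^2 d}{n}. \]
Constrained classes \(\Theta\), such as sparse or low-rank sets, can only make this worst case smaller, which is where structural assumptions enter.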
Theorem 1 (Theorem Idea: Upper and Lower Bounds Play Different Roles) An upper bound exhibits one estimator whose risk is small.
A lower bound shows that every estimator must have risk at least a certain scale.
When the scales match, the problem is understood much more sharply.
Theorem 2 (Theorem Idea: Lower Bounds Come from Indistinguishable Alternatives) Two standard first-pass routes are:
- Le Cam style: compare a small number of carefully chosen alternatives
- Fano style: compare a larger packing of alternatives and use information bounds
Both methods exploit the same logic:
- parameters are separated in the quantity of interest
- induced data distributions are still hard to distinguish
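A minimal numeric sketch of the Le Cam route for \(n\) i.i.d. \(\mathcal N(\mu, \sigma^2)\) observations with squared-error loss (the sample size, noise level, and separation below are illustrative, and the constant in the final inequality varies by reference):

```python
import numpy as np

n, sigma = 100, 1.0                       # illustrative sample size and noise level
delta = sigma / (2 * np.sqrt(n))          # half-separation between the two means
mu0, mu1 = 0.0, 2 * delta                 # two alternatives with |mu1 - mu0| = 2*delta

# KL divergence between the two product measures (n i.i.d. Gaussians).
kl = n * (mu1 - mu0) ** 2 / (2 * sigma ** 2)

# Pinsker's inequality: TV <= sqrt(KL / 2).
tv_upper = np.sqrt(kl / 2)

# Le Cam-style two-point bound (one common form; constants vary by convention):
# inf over estimators, sup over {mu0, mu1} of E[(hat - mu)^2] >= (delta^2 / 2) * (1 - TV).
lower = (delta ** 2 / 2) * (1 - tv_upper)

print(f"KL = {kl:.3f}, TV bound = {tv_upper:.3f}")
print(f"two-point lower bound ~ {lower:.5f}  (compare with sigma^2/n = {sigma**2/n:.5f})")
```

The resulting bound has the same \(1/n\) scale as the sample mean's risk \(\sigma^2/n\), which is the typical outcome of a two-point argument: the right scale, with a loose constant.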
Theorem 3 (Theorem Idea: Sparse High-Dimensional Problems Carry Logarithmic Complexity Costs) For sparse classes, minimax lower bounds often involve the same structural terms that appear in upper bounds, such as
\[ \frac{s \log(p/s)}{n} \]
or related square-root forms, depending on the loss and model.
The exact statement depends on the problem class, but the first-pass message is that the log p price is often not an artifact of a particular proof technique.
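The usual source of that logarithmic term is the number of candidate sparse supports a Fano-type argument has to distinguish. A standard one-line count is
\[ \binom{p}{s} \ \ge\ \Bigl(\frac{p}{s}\Bigr)^{s} \quad\Longrightarrow\quad \log\binom{p}{s} \ \ge\ s \log\frac{p}{s}, \]
so any procedure that effectively has to identify one support among roughly \(\binom{p}{s}\) candidates pays an information cost on the order of \(s\log(p/s)\), which reappears divided by \(n\) in the rate.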
Theorem 4 (Theorem Idea: Lower Bounds Justify Structural Assumptions) When lower bounds show that estimation is impossible without structure, they explain why assumptions such as sparsity, low rank, or eigengaps are not decorative.
They are what make recovery statistically possible in the first place.
7 Worked Example
Suppose a sparse-regression theorem gives an upper bound with scale
\[ \frac{s \log p}{n}. \]
Now take
\( s = 10 \), \( p = 10{,}000 \), and \( n = 400 \).
Then
\[ \log(p/s) = \log(1000) \approx 6.9 \]
so a first-pass lower-bound scale of
\[ \frac{s \log(p/s)}{n} \]
looks like
\[ \frac{10 \cdot 6.9}{400} \approx 0.17. \]
If the upper bound from an explicit estimator is on the same order, then the conclusion is not:
this estimator is perfect
The conclusion is:
this rate is probably a property of the problem, not just of the estimator
That is the minimax point of view in one line.
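A quick check of the arithmetic in this example, as a minimal sketch (the numbers are the illustrative ones above, not from any theorem):

```python
import math

s, p, n = 10, 10_000, 400

upper_scale = s * math.log(p) / n        # the  s log p / n  shape from the upper bound
lower_scale = s * math.log(p / s) / n    # the  s log(p/s) / n  shape from the lower bound

print(f"log(p/s)    = {math.log(p / s):.2f}")   # ~ 6.91
print(f"upper scale = {upper_scale:.3f}")       # ~ 0.230
print(f"lower scale = {lower_scale:.3f}")       # ~ 0.173
```

The two scales differ only through the gap between \(\log p\) and \(\log(p/s)\), which is why "matching up to logarithmic factors" is the usual phrasing.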
8 Computation Lens
When you read a lower-bound theorem, ask:
- what is the parameter class?
- what loss is being lower bounded?
- what family of hard alternatives is being constructed?
- what quantity makes the models hard to distinguish: KL, TV, or another divergence?
Those four questions usually reveal the real architecture of the proof.
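For the last question, here is a minimal sketch of the textbook divergence computations for two equal-variance Gaussian alternatives (the specific means and variance are illustrative):

```python
import math

mu0, mu1, sigma = 0.0, 0.3, 1.0          # illustrative pair of alternatives

# KL divergence between N(mu0, sigma^2) and N(mu1, sigma^2).
kl = (mu1 - mu0) ** 2 / (2 * sigma ** 2)

# Exact total variation for equal variances: 2 * Phi(|mu1 - mu0| / (2 * sigma)) - 1.
std_normal_cdf = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
tv_exact = 2 * std_normal_cdf(abs(mu1 - mu0) / (2 * sigma)) - 1

# Pinsker's inequality, often all a lower-bound proof needs: TV <= sqrt(KL / 2).
tv_pinsker = math.sqrt(kl / 2)

print(f"KL = {kl:.4f}, exact TV = {tv_exact:.4f}, Pinsker bound = {tv_pinsker:.4f}")
```

Whether a proof tracks KL, total variation, or another divergence is mostly a matter of which inequality (Pinsker, Fano, Le Cam) converts it into a statement about estimation errors.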
9 Application Lens
9.1 Sparse Regression
Lower bounds explain why lasso-type rates are often close to optimal over sparse classes, and why the remaining work is about assumptions, constants, computation, or adaptivity rather than miracle rate improvements.
9.2 Covariance and PCA
In spectral estimation problems, lower bounds clarify when unstable eigenspaces or noisy covariance estimation are inevitable rather than just artifacts of a naive method.
9.3 ML Theory and Model Design
Lower bounds are also design tools. They tell you which targets are unrealistic and which assumptions are carrying the real statistical burden.
10 Stop Here For First Pass
If you can now explain:
- what minimax risk measures
- why lower bounds are statements about impossibility
- why Le Cam and Fano are really about hard-to-distinguish alternatives
- why matching upper and lower rates mean you understand the problem better
then this page has done its job.
11 Go Deeper
After this page, the next natural step is:
The strongest adjacent live pages right now are:
12 Optional Deeper Reading After First Pass
The strongest current references connected to this page are:
- Stanford Stats 311 lecture notes - official notes with a dedicated chapter on minimax lower bounds via Le Cam, Fano, and Assouad. Checked 2026-04-25.
- Stanford Stats 300A Session 6 - official notes connecting information-theoretic lower bounds to a sparse regression example. Checked 2026-04-25.
- Stanford EE378C - official course page for information-theoretic lower bounds in data science. Checked 2026-04-25.
- CMU 36-709 course page - official course page showing minimax lower bounds inside a broader advanced-theoretical-statistics track. Checked 2026-04-25.
13 Sources and Further Reading
- Stanford Stats 311 lecture notes - First pass - official notes for the minimax framework and the Le Cam, Fano, and Assouad methods. Checked 2026-04-25.
- Stanford Stats 300A Session 6 - First pass - official notes using lower-bound ideas on sparse regression and matching-rate intuition. Checked 2026-04-25.
- Stanford EE378C - Second pass - official course page for a wider lower-bounds toolkit across statistics, ML, and optimization. Checked 2026-04-25.
- CMU 36-709 course page - Second pass - official course page linking minimax lower bounds to covariance estimation, sparse estimation, and nonparametric theory. Checked 2026-04-25.