High-Dimensional Regression

Prediction, estimation, and support recovery in sparse regression when p is large relative to n, and why random design geometry controls the quality of lasso-type guarantees.
Modified April 26, 2026

Keywords

high-dimensional regression, lasso rates, prediction error, support recovery, restricted eigenvalue

1 Role

This is the fourth page of the High-Dimensional Statistics module.

The previous page explained the design-geometry conditions that sparse regression theorems rely on.

This page asks the next natural question:

what can high-dimensional regression actually guarantee, and under what assumptions?

That is where the module moves from estimator definitions to rate statements.

2 First-Pass Promise

Read this page after Design Geometry: Restricted Eigenvalues, Coherence, and RIP.

If you stop here, you should still understand:

  • the difference between prediction error, coefficient estimation error, and support recovery
  • why sparse high-dimensional regression rates usually involve s log p / n
  • why design geometry matters as much as sparsity itself
  • why exact support recovery is a stronger and more fragile goal than good prediction

3 Why It Matters

After seeing lasso, it is tempting to ask only:

does it recover the true sparse vector?

But high-dimensional regression separates several goals:

  • predict well on new design points
  • estimate the coefficient vector well
  • identify the correct active variables
  • get the signs exactly right

These are not equivalent.

A method can predict well without recovering the exact support, and it can estimate coefficients reasonably well without perfect sign recovery.

That distinction is one of the core habits of high-dimensional statistics.
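To make the distinction concrete, here is a minimal simulation sketch. It assumes numpy and scikit-learn, and the instance sizes, signal strengths, and penalty level are illustrative choices rather than part of any theorem. In a typical run, the lasso drops the weak coefficients, so the support is not recovered exactly even though prediction and estimation error stay modest.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 1000, 10

# Random Gaussian design and an s-sparse target with a few weak coefficients.
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:s] = np.concatenate([np.full(7, 1.0), np.full(3, 0.05)])
y = X @ beta_star + 0.5 * rng.standard_normal(n)

# Penalty on the noise scale (an illustrative, untuned choice).
lam = 0.5 * np.sqrt(2 * np.log(p) / n)
beta_hat = Lasso(alpha=lam, max_iter=50_000).fit(X, y).coef_

pred_err = np.mean((X @ (beta_hat - beta_star)) ** 2)   # prediction error per sample
est_err = np.linalg.norm(beta_hat - beta_star)          # l2 coefficient error
exact_support = set(np.flatnonzero(beta_hat)) == set(np.flatnonzero(beta_star))

print(f"prediction error per sample: {pred_err:.3f}")
print(f"l2 estimation error:         {est_err:.3f}")
print(f"support recovered exactly:   {exact_support}")
```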

4 Prerequisite Recall

  • lasso turns sparsity into a convex estimator
  • compressed sensing emphasizes sparse recovery from few measurements
  • high-dimensional probability supplies the concentration tools behind random-design bounds
  • matrix analysis supplies the operator and geometric language used in design conditions

5 Intuition

5.1 One Estimator, Several Error Notions

In low-dimensional regression, people often slide between good fit, good coefficients, and good interpretation.

In high dimensions, those come apart much more sharply.

So every theorem should be read by first asking:

what loss is being controlled?

5.2 Why s log p Appears

If the true signal is s-sparse, then the effective complexity depends much more on s than on ambient dimension p.

But the estimator still has to search through many possible coordinates, so a logarithmic price in p often remains.

That is why first-pass rates often look like:

\[ \frac{s \log p}{n} \]

or its square-root version.

5.3 Why Geometry Matters

Even with sparsity, the design matrix can be badly aligned.

Highly collinear predictors can make some sparse directions hard to distinguish from others.

So useful theorems need both:

  • a sparse target
  • a design matrix that behaves well on sparse vectors
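Here is a minimal numerical sketch of the second requirement. It uses only numpy, and it checks a simplified proxy: the smallest eigenvalue of the Gram matrix on one fixed block of s coordinates, whereas real restricted-eigenvalue conditions control a whole cone of approximately sparse directions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, s = 200, 500, 10

def min_eig_on_block(X, coords):
    """Smallest eigenvalue of the block of X^T X / n on a fixed coordinate set."""
    G = X[:, coords].T @ X[:, coords] / X.shape[0]
    return np.linalg.eigvalsh(G)[0]

coords = np.arange(s)

# Independent Gaussian design: sparse directions are well spread out.
X_good = rng.standard_normal((n, p))

# Highly collinear design: every column is mostly a shared common factor.
common = rng.standard_normal((n, 1))
X_bad = 0.95 * common + 0.05 * rng.standard_normal((n, p))

print(f"well-spread design:  {min_eig_on_block(X_good, coords):.3f}")
print(f"collinear design:    {min_eig_on_block(X_bad, coords):.4f}")
```

On the collinear design the block eigenvalue sits close to zero, which is exactly the kind of degeneracy the geometric conditions in the next section rule out.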

6 Formal Core

Definition 1 (Definition: High-Dimensional Sparse Regression Model) At first pass, the model is

\[ y = X\beta^\star + \varepsilon \]

with

  • \(X \in \mathbb R^{n \times p}\)
  • \(p\) possibly much larger than \(n\)
  • \(\beta^\star\) sparse or approximately sparse

Definition 2 (Definition: Three Main Error Goals) In high-dimensional regression, common goals are:

  • prediction error: how well \(X\widehat\beta\) approximates \(X\beta^\star\)
  • estimation error: how well \(\widehat\beta\) approximates \(\beta^\star\)
  • support recovery: whether the nonzero coordinates are identified correctly

These should not be conflated.
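As a concrete reference, here is a minimal sketch of the three quantities for a given design X, estimate \(\widehat\beta\), and target \(\beta^\star\); the function names and the exact normalizations are illustrative choices, not fixed conventions.

```python
import numpy as np

def prediction_error(X, beta_hat, beta_star):
    """Average squared prediction error ||X(beta_hat - beta_star)||^2 / n."""
    return np.mean((X @ (beta_hat - beta_star)) ** 2)

def estimation_error(beta_hat, beta_star):
    """Euclidean (l2) distance between estimated and true coefficients."""
    return np.linalg.norm(beta_hat - beta_star)

def support_recovered(beta_hat, beta_star, tol=1e-8):
    """Whether the estimated nonzeros match the true support exactly."""
    return set(np.flatnonzero(np.abs(beta_hat) > tol)) == set(np.flatnonzero(beta_star))
```

A single \(\widehat\beta\) can score well on the first two and still fail the third, which is why the goals are listed separately.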

Theorem 1 (Theorem Idea: Sparse High-Dimensional Regression Rates Depend On s, log p, and n) Under sparsity, regularization, and suitable design conditions, lasso-type estimators often satisfy first-pass bounds with scale

\[ \frac{s \log p}{n} \]

for prediction-type error and

\[ \sqrt{\frac{s \log p}{n}} \]

for coefficient-error quantities.

The exact norm and constants depend on the theorem, but this is the main scaling picture to remember.

Theorem 2 (Theorem Idea: Good Rates Need Good Design Geometry) Sparse regression guarantees require the design to behave reasonably on sparse directions.

Typical later conditions include:

  • restricted eigenvalue conditions
  • compatibility conditions
  • coherence or RIP-type conditions

Without some such condition, sparse estimation can become unstable or non-identifiable.

Theorem 3 (Theorem Idea: Support Recovery Is Stronger Than Prediction) Exact support or sign recovery typically needs stronger assumptions than prediction or coefficient error control.

This is why the strongest recovery theorems are also the most fragile.

7 Worked Example

Suppose:

  • the true coefficient vector is s-sparse with s = 10
  • the ambient dimension is p = 10,000
  • the sample size is n = 400

Then

\[ \log p \approx \log(10^4) \approx 9.2. \]

So the first-pass prediction scale

\[ \frac{s \log p}{n} \]

looks like

\[ \frac{10 \cdot 9.2}{400} \approx 0.23. \]

And the corresponding coefficient-scale quantity

\[ \sqrt{\frac{s \log p}{n}} \]

is roughly

\[ \sqrt{0.23} \approx 0.48. \]

The numbers themselves are not the theorem.

The point is the structure of the rate:

  • sparsity helps through s
  • high dimension still costs log p
  • more samples help through n

This is the basic arithmetic behind many first-pass high-dimensional regression bounds.
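The same arithmetic as a short check, using only the page's illustrative numbers:

```python
import numpy as np

s, p, n = 10, 10_000, 400
rate = s * np.log(p) / n       # prediction-scale quantity, about 0.23
print(rate, np.sqrt(rate))     # the square root is about 0.48
```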

8 Computation Lens

When you read a regression guarantee, ask:

  1. what error notion is being bounded?
  2. what sparsity or approximate sparsity assumption is being made?
  3. what design geometry condition is required?
  4. is the theorem for fixed design, random design, or both?

Those four questions usually reveal what the bound is really saying.

9 Application Lens

9.1 Genomics and Text Regression

Large-feature regimes are common when predictors represent genes, words, tokens, or interactions. Sparse regression is the first tool people reach for, but interpretation must stay tied to the theorem’s actual target.

9.2 Causal or Scientific Feature Screening

Support recovery questions are appealing in science, but they are also the most assumption-sensitive. High-dimensional regression helps explain when variable selection claims are credible and when they are not.

9.3 ML Theory Bridge

This page is also a bridge back to modern ML: many overparameterized or feature-rich models raise the same questions about shrinkage, stability, recoverability, and geometry.

10 Stop Here For First Pass

If you can now explain:

  • why prediction, estimation, and support recovery are different goals
  • why s log p / n is the first rate shape to remember
  • why design geometry matters for sparse regression
  • why exact variable recovery is harder than good prediction

then this page has done its job.
