Lasso and Compressed Sensing Basics

How l1 regularization turns sparsity into a convex estimator, why soft-thresholding is the first key picture, and how compressed sensing links sparse recovery to measurement geometry.
Modified: April 26, 2026

Keywords

lasso, compressed sensing, l1 regularization, soft thresholding, sparse recovery

1 Role

This is the second page of the High-Dimensional Statistics module.

The first page explained why high-dimensional estimation needs structure.

This page shows the first flagship estimator that turns that idea into a concrete method:

lasso uses an l1 penalty to prefer sparse solutions

It also introduces compressed sensing as the sparse-recovery version of the same story.

2 First-Pass Promise

Read this page after Sparsity and Regularization.

If you stop here, you should still understand:

  • what the lasso optimization problem is
  • why l1 regularization promotes sparsity
  • why compressed sensing is a sparse recovery problem from few measurements
  • why the geometry of the design or sensing matrix matters as much as the penalty

3 Why It Matters

Sparsity by itself is only an assumption.

Lasso is the first major estimator that makes that assumption operational in a convex optimization problem.

Compressed sensing pushes the same idea further:

  • take very few linear measurements
  • assume the signal is sparse
  • recover the signal anyway

These topics matter because they are the canonical examples of how high-dimensional statistics works:

  • underdetermined problem
  • structural assumption
  • convex regularization
  • geometry plus concentration
  • theorem that explains when recovery is possible

4 Prerequisite Recall

  • high-dimensional problems are ill-posed without extra structure
  • regularization encodes structure through penalties or constraints
  • operator and spectral language help describe the geometry of the design matrix
  • concentration controls how random designs behave

5 Intuition

5.1 Why l1 Instead Of l0

The most direct sparse objective would count nonzero coordinates, but minimizing that count is a combinatorial problem that is NP-hard in general.

The l1 norm is the convex relaxation that still pushes many coordinates toward zero.

That is why lasso is both statistically meaningful and computationally tractable.

5.2 Shrinkage And Sparsity

Lasso does two things at once:

  • shrinks coefficient magnitudes
  • sets some coordinates exactly to zero

This is why it differs from ridge regression, which shrinks every coefficient toward zero but almost never sets one exactly to zero.
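A quick way to see the difference is to fit both estimators on the same synthetic sparse problem. The sketch below uses numpy and scikit-learn (assumed available); the dimensions, noise scale, and penalty levels are illustrative choices, not recommendations.

```python
# Sketch: lasso produces exact zeros, ridge does not.
# Assumes numpy and scikit-learn; all sizes here are illustrative.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]        # only 3 of 20 coefficients are nonzero
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lasso_fit = Lasso(alpha=0.1).fit(X, y)
ridge_fit = Ridge(alpha=0.1).fit(X, y)

print("exact zeros, lasso:", int(np.sum(lasso_fit.coef_ == 0.0)))   # many
print("exact zeros, ridge:", int(np.sum(ridge_fit.coef_ == 0.0)))   # typically none
```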

5.3 Compressed Sensing View

Compressed sensing asks whether a sparse vector can be recovered from far fewer measurements than the ambient dimension would normally require.

The answer depends not only on sparsity, but also on whether the sensing matrix preserves enough information about sparse directions.

6 Formal Core

Definition 1 (Lasso) The lasso estimator solves

\[ \widehat{\beta} \in \arg\min_{\beta \in \mathbb R^p} \left\{ \frac{1}{2n}\|y-X\beta\|_2^2 + \lambda \|\beta\|_1 \right\}. \]

This is the standard first-pass sparse regularization problem.
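To make the objective concrete, here is a minimal proximal-gradient (ISTA) sketch that solves it in numpy. This is an illustration for small, well-scaled problems, not a production solver; library implementations such as scikit-learn's Lasso use coordinate descent instead.

```python
# Minimal ISTA sketch for the lasso objective above.
# Assumes numpy; intended for small, well-scaled problems.
import numpy as np

def soft_threshold(v, t):
    """Coordinatewise soft-thresholding: sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iters=1000):
    """Proximal gradient descent on (1/2n)||y - X b||_2^2 + lam * ||b||_1."""
    n, p = X.shape
    step = n / np.linalg.eigvalsh(X.T @ X).max()   # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iters):
        grad = -X.T @ (y - X @ beta) / n           # gradient of the smooth part
        beta = soft_threshold(beta - step * grad, lam * step)
    return beta
```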

Definition 2 (Basis Pursuit / Sparse Recovery Form) In noiseless compressed sensing, a basic sparse recovery problem is

\[ \min_{\beta \in \mathbb R^p}\|\beta\|_1 \quad\text{subject to}\quad X\beta = y. \]

This is the constraint version of the same l1 idea.
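The constrained form maps directly onto a generic convex solver. The sketch below uses cvxpy (assumed installed) with a random Gaussian sensing matrix; with these sizes exact recovery typically succeeds, though the outcome depends on the random draw.

```python
# Basis pursuit sketch: recover a 5-sparse vector in R^100
# from 40 noiseless linear measurements. Assumes numpy and cvxpy.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, p, k = 40, 100, 5
X = rng.standard_normal((n, p)) / np.sqrt(n)     # random Gaussian sensing matrix
beta_true = np.zeros(p)
beta_true[rng.choice(p, size=k, replace=False)] = rng.standard_normal(k)
y = X @ beta_true                                # noiseless measurements

beta = cp.Variable(p)
problem = cp.Problem(cp.Minimize(cp.norm1(beta)), [X @ beta == y])
problem.solve()
print("recovery error:", float(np.linalg.norm(beta.value - beta_true)))
```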

Theorem 1 (Idea: l1 Geometry Promotes Sparse Solutions) The l1 ball has corners and edges aligned with the coordinate axes, so the solution of an l1-penalized problem tends to land at points where several coordinates are exactly zero. This is why the penalty can produce exact zeros in the estimated vector.

This is the first geometric picture to remember.

Theorem 2 (Idea: Under Orthonormal Design, Lasso Is Soft-Thresholding) If the design is orthonormal in the sense that

\[ \frac{1}{n}X^\top X = I, \]

then the lasso solution acts coordinatewise by soft-thresholding the unregularized coefficient scores.

This is the cleanest first-pass explanation of shrinkage plus sparsity.
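Concretely, the lasso solution under this design is

\[ \widehat{\beta}_j = \operatorname{sign}(z_j)\big(|z_j|-\lambda\big)_+, \qquad z = \frac{1}{n}X^\top y, \]

which is exactly the soft-thresholding map applied to each coordinate score; the worked example below computes one instance.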

Theorem 3 (Idea: Sparse Recovery Needs Geometry, Not Just Sparsity) Lasso and compressed sensing succeed when the design or sensing matrix has enough geometric regularity on sparse vectors.

Typical later conditions include:

  • restricted eigenvalues
  • coherence bounds
  • RIP-type assumptions

This is why sparse recovery theorems are about both the signal and the matrix.
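Of these, mutual coherence is the easiest to compute: the largest absolute cosine between two distinct columns of the matrix. A minimal numpy sketch (the matrix sizes are illustrative):

```python
# Sketch: mutual coherence, the simplest matrix-geometry quantity
# behind sparse recovery conditions. Assumes numpy.
import numpy as np

def mutual_coherence(X):
    """Largest absolute cosine between distinct columns of X."""
    cols = X / np.linalg.norm(X, axis=0)    # normalize each column
    gram = np.abs(cols.T @ cols)            # absolute pairwise inner products
    np.fill_diagonal(gram, 0.0)             # ignore each column with itself
    return float(gram.max())

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 100))
print(mutual_coherence(X))                  # smaller is better for sparse recovery
```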

Theorem 4 (Idea: Prediction, Estimation, and Support Recovery Are Different Goals) A method can predict well without perfectly identifying the true support, and it can estimate coefficients reasonably well without exact sign recovery.

This distinction prevents many common overclaims.

7 Worked Example

Suppose the design is orthonormal so that lasso decouples coordinatewise.

Let the unregularized coefficient scores be

\[ z = (2.4,\; 0.8,\; -1.3) \]

and take penalty level

\[ \lambda = 1. \]

Then soft-thresholding gives

\[ \widehat{\beta}_j = \operatorname{sign}(z_j)\big(|z_j|-\lambda\big)_+. \]

So

\[ \widehat{\beta} = (1.4,\;0,\;-0.3). \]
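The same computation as a short numpy check (a minimal sketch, assuming numpy):

```python
# Verify the worked example: soft-threshold z at lambda = 1.
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

print(soft_threshold(np.array([2.4, 0.8, -1.3]), 1.0))   # [ 1.4  0.  -0.3]
```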

This one-step picture already shows the main lasso behavior:

  • large coordinates survive but are shrunk
  • small coordinates get set exactly to zero
  • the estimator prefers a sparse explanation over a purely data-fitting one

Compressed sensing keeps the same spirit, but now the linear system is underdetermined (fewer measurements than unknowns), and recovery depends on whether sparse vectors remain distinguishable after measurement.

8 Computation Lens

When you see a sparse estimator, ask:

  1. is the penalty or constraint actually sparsity-inducing?
  2. what matrix geometry is the theorem using?
  3. is the guarantee about prediction, coefficient error, or support recovery?
  4. is the estimator solving a noisy regression problem or a sparse linear inverse problem?

Those questions usually tell you whether you are reading a lasso theorem, a compressed sensing theorem, or a different regularized estimator entirely.

9 Application Lens

9.1 Sparse Regression

Lasso is the canonical convex estimator for large-feature regression problems where only a small subset of predictors is expected to matter.

9.2 Compressed Sensing

Sparse signals in imaging, sensing, and inverse problems can often be recovered from surprisingly few linear measurements when the sensing matrix has good geometry.

9.3 Modern ML

Even when modern models are not literally lasso, the central ideas remain: structural bias, convex or implicit regularization, and the distinction between fit and recoverable signal.

10 Stop Here For First Pass

If you can now explain:

  • what lasso optimizes
  • why l1 penalties can create exact zeros
  • why compressed sensing is not just “do regression with fewer samples”
  • why matrix geometry matters for recovery

then this page has done its job.
