Paper Lab: Randomized Sketching for Least Squares

A guided reading page for a modern least-squares paper that connects projection geometry to large-scale computation.
Modified: April 26, 2026

Keywords

paper reading, least squares, sketching, randomized numerical linear algebra

1 Why This Paper

Use this paper lab when you want your first serious example of a research paper that still keeps one foot in familiar linear algebra.

The anchor paper is:

It is a strong bridge paper because it asks a modern question without abandoning the core object:

What happens to least-squares geometry when we compress the problem before solving it?

2 Paper At A Glance

  • Type: theory-meets-algorithms paper with statistical framing
  • Setting: large-scale ordinary least squares with randomized sketching
  • Main claim: algorithmic guarantees and statistical guarantees for sketched least squares are not interchangeable
  • Why it matters: it turns a familiar objective into a modern question about scale, approximation, and what counts as success

3 Reading Plan

Use the three-pass pattern.

3.1 First pass

Read the paper as a story about two competing goals:

  1. make least squares faster by replacing \((X, Y)\) with \((SX, SY)\)
  2. keep the resulting estimator close enough to the original solution or to a statistically meaningful target

The paper’s main contribution is not only a set of bounds. It is a separation of viewpoints:

  • algorithmic / worst-case: how well does the sketch preserve the optimization problem?
  • statistical: how well does the sketched estimator predict under a linear model with noise?

That distinction is easy to miss if you read the paper only as “random projection makes regression faster.”
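
To make the first goal concrete, here is a minimal sketch of the compression step, assuming a dense Gaussian sketching matrix and synthetic data (the paper also considers structured and sampling-based sketches; the variable names are illustrative, not the paper's notation):

```python
# Minimal sketch: exact least squares on (X, Y) vs. sketched least squares on (SX, SY).
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 10_000, 20, 400                      # tall problem, sketch size r << n

X = rng.standard_normal((n, d))                # synthetic design matrix
beta_true = rng.standard_normal(d)
Y = X @ beta_true + rng.standard_normal(n)     # noisy responses

# Exact least squares on (X, Y).
beta_exact, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Sketched least squares on (SX, SY) with a dense Gaussian sketch.
S = rng.standard_normal((r, n)) / np.sqrt(r)
beta_sketch, *_ = np.linalg.lstsq(S @ X, S @ Y, rcond=None)

print("coefficient gap:", np.linalg.norm(beta_sketch - beta_exact))
```

Everything downstream of this page is about which senses of "close" that gap should be measured in.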

3.2 Second pass

The main mathematical objects to track are:

  • the design matrix \(X\)
  • the response vector \(Y\)
  • the sketching matrix \(S\)
  • the sketched problem \((SX, SY)\)
  • residual efficiency versus prediction efficiency

The page Orthogonality and Least Squares tells you what the exact least-squares solution is doing geometrically.

This paper asks how much of that geometry survives when \(X\) and \(Y\) are compressed first.
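
As a reminder of that geometry, the exact solution solves the normal equations, and the fitted values are the orthogonal projection of \(Y\) onto the column space of \(X\):

\[
\hat{\beta} = \arg\min_{\beta} \lVert Y - X\beta \rVert_2^2,
\qquad
X^\top X \hat{\beta} = X^\top Y,
\qquad
\hat{Y} = X\hat{\beta} = P_X Y,
\]

where \(P_X\) is the orthogonal projector onto \(\mathrm{col}(X)\), so the residual \(Y - \hat{Y}\) is orthogonal to every column of \(X\).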

Two reading habits help:

  • whenever you see a guarantee, ask whether it is about the objective value, the coefficient vector, or prediction
  • whenever you see a random sketch, ask which subspace or norm it is trying to preserve

3.3 Third pass

On a third pass, focus on proof architecture:

  • where the sketch needs to preserve subspace geometry
  • where the paper changes the success criterion from residual quality to prediction quality
  • where experiments support a conceptual separation rather than merely report speedups
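
One standard way to make "preserve subspace geometry" precise, common across the sketching literature (the paper's exact constants and sketch families may differ), is the subspace embedding condition: for an orthonormal basis \(U\) of \(\mathrm{col}(X)\) and a tolerance \(\varepsilon \in (0, 1)\),

\[
(1 - \varepsilon)\,\lVert U z \rVert_2^2 \;\le\; \lVert S U z \rVert_2^2 \;\le\; (1 + \varepsilon)\,\lVert U z \rVert_2^2 \quad \text{for all } z,
\]

so \(S\) approximately preserves the length of every vector in the column space of \(X\).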

4 Prerequisite Math Map

Read this paper after you are comfortable with:

  • orthogonality and projection in least squares
  • normal equations and why they characterize the optimum
  • basic random matrix or concentration language
  • the difference between algorithmic error and statistical prediction error

5 Notation Table

  • \(X\): design matrix
  • \(Y\): response vector
  • \(S\): sketching matrix
  • \((SX, SY)\): compressed least-squares instance
  • RE: residual efficiency
  • PE: prediction efficiency
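
One hedged way to read the last two entries, consistent with how they are used on this page (the paper's own definitions may normalize slightly differently): writing \(\hat{\beta}\) for the exact estimator, \(\tilde{\beta}\) for the sketched estimator, and \(\beta\) for the true coefficients in the noisy linear model,

\[
\mathrm{RE} = \frac{\lVert Y - X\tilde{\beta} \rVert_2^2}{\lVert Y - X\hat{\beta} \rVert_2^2},
\qquad
\mathrm{PE} = \frac{\lVert X(\tilde{\beta} - \beta) \rVert_2^2}{\lVert X(\hat{\beta} - \beta) \rVert_2^2},
\]

so RE asks how much the achieved residual inflates, while PE asks how much in-sample prediction error inflates.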

6 Assumption Audit

The paper materially depends on these assumptions:

  • the regression problem is linear least squares
  • the sketch is chosen from a family with geometric preservation properties
  • statistical guarantees are evaluated under an explicit noisy linear model
  • the comparison criterion must be fixed before theorems are interpreted

If you blur the last point, the paper becomes easy to misread.

7 Main Results Unpacked

The main conceptual result is not “sketching works” in a generic sense.

It is:

different notions of success lead to different sample-size requirements and different interpretations of the same sketch.

Unpack the paper this way:

  1. Objective-level view. Theorems here ask whether the sketched problem preserves the least-squares objective or residual quality well enough.

  2. Prediction-level view. Theorems here ask whether the sketched estimator predicts as well under a statistical model with noise.

  3. Key takeaway. Residual preservation can look strong before prediction preservation does. That gap is the heart of the paper.
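
A hedged way to write the two viewpoints side by side (the symbols \(\varepsilon\) and \(\sigma^2\) here are illustrative, not necessarily the paper's notation): an objective-level guarantee bounds the achieved residual,

\[
\lVert Y - X\tilde{\beta} \rVert_2^2 \;\le\; (1 + \varepsilon)\,\lVert Y - X\hat{\beta} \rVert_2^2,
\]

while a prediction-level guarantee controls \(\lVert X(\tilde{\beta} - \beta) \rVert_2^2\) under a model \(Y = X\beta + \text{noise}\). For the exact estimator with full-rank \(X\) and i.i.d. noise of variance \(\sigma^2\), the prediction yardstick is only \(\mathbb{E}\lVert X(\hat{\beta} - \beta) \rVert_2^2 = \sigma^2 d\), typically far smaller than the full residual, so a small multiplicative loss in the first display can still be a large multiplicative loss in prediction.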

8 Evidence Audit

The paper’s evidence comes in two forms.

First, theorem-level comparisons explain when sketched least squares preserves residual behavior well and when it does not preserve prediction quality as strongly.

Second, experiments compare sketching schemes and sample sizes. Those experiments matter because they make the split between optimization approximation and statistical prediction visible.

9 What To Reproduce

A good reproduction target is:

  1. generate a synthetic Gaussian design matrix \(X\)
  2. generate responses from a noisy linear model
  3. compare exact least squares with sketched least squares for increasing sketch size
  4. plot both residual error and prediction error

If your plots show the two error criteria behaving differently, then you are reproducing the paper’s conceptual heart rather than just one figure.
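
A minimal reproduction sketch along those lines, assuming a dense Gaussian sketch and using numpy and matplotlib (the error ratios below are illustrative implementations of "residual error" and "prediction error", not the paper's exact figures):

```python
# Reproduction sketch: exact vs. sketched least squares across sketch sizes,
# tracking residual error and prediction error separately.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n, d, sigma = 10_000, 50, 1.0

X = rng.standard_normal((n, d))                   # synthetic Gaussian design
beta = rng.standard_normal(d)                     # true coefficients
Y = X @ beta + sigma * rng.standard_normal(n)     # noisy linear model

# Exact least squares and its two baseline errors.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid_exact = np.linalg.norm(Y - X @ beta_hat) ** 2
pred_exact = np.linalg.norm(X @ (beta_hat - beta)) ** 2

sketch_sizes = [100, 200, 400, 800, 1600]
re_vals, pe_vals = [], []
for r in sketch_sizes:
    S = rng.standard_normal((r, n)) / np.sqrt(r)  # dense Gaussian sketch
    beta_tilde, *_ = np.linalg.lstsq(S @ X, S @ Y, rcond=None)
    re_vals.append(np.linalg.norm(Y - X @ beta_tilde) ** 2 / resid_exact)
    pe_vals.append(np.linalg.norm(X @ (beta_tilde - beta)) ** 2 / pred_exact)

plt.semilogy(sketch_sizes, re_vals, marker="o", label="residual error ratio (RE)")
plt.semilogy(sketch_sizes, pe_vals, marker="s", label="prediction error ratio (PE)")
plt.xlabel("sketch size r")
plt.ylabel("error relative to exact least squares")
plt.legend()
plt.show()
```

If the residual curve flattens toward 1 well before the prediction curve does, you are seeing the separation the paper is about.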

10 Context And Lineage

This paper sits between two traditions:

  • classical numerical linear algebra, which cares about approximating the exact least-squares solve accurately and at scale
  • modern statistical learning theory, which cares about prediction and generalization under noise

That is why it is such a useful bridge paper for this site.

11 Transfer And Limits

What transfers:

  • the projection view of least squares
  • the importance of preserving subspace geometry
  • the need to declare the right success metric before reading a theorem

What does not transfer automatically:

  • guarantees for constrained least squares
  • guarantees for overparameterized interpolation
  • conclusions about robustness, privacy, or nonlinear regression

12 What Has Changed Since Publication

Direct sketching follow-ups include:

Neighboring least-squares directions include:

So the paper still works as a bridge, but it should be read as a sketching paper first and a gateway to adjacent least-squares theory second.

13 Resource Kit
