Paper Lab: Randomized Sketching for Least Squares
paper reading, least squares, sketching, randomized numerical linear algebra
1 Why This Paper
Use this paper lab when you want your first serious example of a research paper that still keeps one foot in familiar linear algebra.
The anchor paper is A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares (see the Resource Kit below).
It is a strong bridge paper because it asks a modern question without abandoning the core object:
What happens to least-squares geometry when we compress the problem before solving it?
2 Paper At A Glance
- Type: theory-meets-algorithms paper with statistical framing
- Setting: large-scale ordinary least squares with randomized sketching
- Main claim: algorithmic guarantees and statistical guarantees for sketched least squares are not interchangeable
- Why it matters: it turns a familiar objective into a modern question about scale, approximation, and what counts as success
3 Reading Plan
Use the three-pass pattern.
3.1 First pass
Read the paper as a story about two competing goals:
- make least squares faster by replacing \((X, Y)\) with \((SX, SY)\)
- keep the resulting estimator close enough to the original or to a statistically meaningful target
The paper’s main contribution is not only a set of bounds. It is a separation of viewpoints:
- algorithmic / worst-case: how well does the sketch preserve the optimization problem?
- statistical: how well does the sketched estimator predict under a linear model with noise?
That distinction is easy to miss if you read the paper only as “random projection makes regression faster.”
3.2 Second pass
The main mathematical objects to track are:
- the design matrix \(X\)
- the response vector \(Y\)
- the sketching matrix \(S\)
- the sketched problem \((SX, SY)\)
- residual efficiency versus prediction efficiency
The page Orthogonality and Least Squares tells you what the exact least-squares solution is doing geometrically.
This paper asks how much of that geometry survives when \(X\) and \(Y\) are compressed first.
Two reading habits help:
- whenever you see a guarantee, ask whether it is about the objective value, the coefficient vector, or prediction
- whenever you see a random sketch, ask which subspace or norm it is trying to preserve
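To make the compression step concrete, here is a minimal numpy sketch of the \((X, Y) \to (SX, SY)\) replacement. It assumes a Gaussian sketching matrix for simplicity (one member of the sketch family; the dimensions and noise level are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 2000, 10, 200          # rows, columns, sketch size (m << n)

X = rng.standard_normal((n, d))  # design matrix
beta = rng.standard_normal(d)
Y = X @ beta + 0.5 * rng.standard_normal(n)

# Gaussian sketch: i.i.d. N(0, 1/m) entries, so E[S.T @ S] = I
S = rng.standard_normal((m, n)) / np.sqrt(m)

# exact least squares versus the compressed instance (SX, SY)
beta_exact, *_ = np.linalg.lstsq(X, Y, rcond=None)
beta_sketch, *_ = np.linalg.lstsq(S @ X, S @ Y, rcond=None)

print(np.linalg.norm(beta_sketch - beta_exact))
```

The point of the exercise is the reading habit above: the printed gap between the two coefficient vectors is only one of several success criteria, and the paper's theorems are about which criterion you fix.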
3.3 Third pass
On a third pass, focus on proof architecture:
- where the sketch needs to preserve subspace geometry
- where the paper changes the success criterion from residual quality to prediction quality
- where experiments are supporting a conceptual separation rather than merely reporting speedups
4 Prerequisite Math Map
Read this paper after you are comfortable with:
- orthogonality and projection in least squares
- normal equations and why they characterize the optimum
- basic random matrix or concentration language
- the difference between algorithmic error and statistical prediction error
5 Notation Table
- \(X\): design matrix
- \(Y\): response vector
- \(S\): sketching matrix
- \((SX, SY)\): compressed least-squares instance
- RE: residual efficiency
- PE: prediction efficiency
6 Assumption Audit
The paper materially depends on these assumptions:
- the regression problem is linear least squares
- the sketch is chosen from a family with geometric preservation properties
- statistical guarantees are evaluated under an explicit noisy linear model
- the comparison criterion must be fixed before theorems are interpreted
If you blur the last point, the paper becomes easy to misread.
7 Main Results Unpacked
The main conceptual result is not “sketching works” in a generic sense.
It is:
different notions of success lead to different sample-size requirements and different interpretations of the same sketch.
Unpack the paper this way:
- Objective-level view: theorems here ask whether the sketched problem preserves the least-squares objective or residual quality well enough.
- Prediction-level view: theorems here ask whether the sketched estimator predicts as well under a statistical model with noise.
- Key takeaway: residual preservation can look strong before prediction preservation does. That gap is the heart of the paper.
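The two views can be measured on the same synthetic instance. The snippet below assumes the natural ratio definitions suggested by the notation table: residual efficiency as the achieved residual relative to exact OLS, and prediction efficiency as the excess prediction error relative to exact OLS (the paper's exact normalizations may differ; this is a hedged illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 4000, 20, 400
X = rng.standard_normal((n, d))
beta_star = rng.standard_normal(d)          # ground-truth coefficients
Y = X @ beta_star + rng.standard_normal(n)  # noisy linear model

beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

S = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian sketch
beta_s, *_ = np.linalg.lstsq(S @ X, S @ Y, rcond=None)

# residual efficiency: how much the achieved residual grows over OLS
RE = np.linalg.norm(Y - X @ beta_s) / np.linalg.norm(Y - X @ beta_ols)
# prediction efficiency: prediction error against the truth, relative to OLS
PE = np.linalg.norm(X @ (beta_s - beta_star)) / np.linalg.norm(X @ (beta_ols - beta_star))

print(f"RE = {RE:.3f}, PE = {PE:.3f}")
```

On typical runs RE sits close to 1 while PE is noticeably larger, which is exactly the separation the section describes: the same sketch looks good under one criterion and weaker under the other.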
8 Evidence Audit
The paper’s evidence comes in two forms.
First, theorem-level comparisons explain when sketched least squares preserves residual behavior well and when it does not preserve prediction quality as strongly.
Second, experiments compare sketching schemes and sample sizes. Those experiments matter because they make the split between optimization approximation and statistical prediction visible.
9 What To Reproduce
A good reproduction target is:
- generate a synthetic Gaussian design matrix \(X\)
- generate responses from a noisy linear model
- compare exact least squares with sketched least squares for increasing sketch size
- plot both residual error and prediction error
If your plots show the two error criteria behaving differently, then you are reproducing the paper’s conceptual heart rather than just one figure.
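The four reproduction steps can be sketched as a single script. Everything here is illustrative (dimension sizes, noise scale, Gaussian sketch, and the ratio-style error metrics are my assumptions, not settings from the paper); swap the final `print` loop for a plot to get the figure-style comparison:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5000, 15
X = rng.standard_normal((n, d))             # synthetic Gaussian design
beta_star = rng.standard_normal(d)
Y = X @ beta_star + rng.standard_normal(n)  # noisy linear model

beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

res, pred = [], []
for m in (50, 100, 200, 400, 800):          # increasing sketch size
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    beta_s, *_ = np.linalg.lstsq(S @ X, S @ Y, rcond=None)
    re = np.linalg.norm(Y - X @ beta_s) / np.linalg.norm(Y - X @ beta_ols)
    pe = np.linalg.norm(X @ (beta_s - beta_star)) / np.linalg.norm(X @ (beta_ols - beta_star))
    res.append(re)
    pred.append(pe)
    print(f"m={m:4d}  residual ratio={re:.3f}  prediction ratio={pe:.3f}")
```

If the residual ratio flattens near 1 well before the prediction ratio does, the two error criteria are behaving differently, which is the conceptual heart the section asks you to reproduce.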
10 Context And Lineage
This paper sits between two traditions:
- classical numerical linear algebra, which cares about preserving the least-squares solve efficiently
- modern statistical learning theory, which cares about prediction and generalization under noise
That is why it is such a useful bridge paper for this site.
11 Transfer And Limits
What transfers:
- the projection view of least squares
- the importance of preserving subspace geometry
- the need to declare the right success metric before reading a theorem
What does not transfer automatically:
- guarantees for constrained least squares
- guarantees for overparameterized interpolation
- conclusions about robustness, privacy, or nonlinear regression
12 What Has Changed Since Publication
Direct sketching follow-ups include:
- constrained or iterative sketching, such as Iterative Hessian Sketch
- newer distributed and memory-constrained sketching regimes, such as Distributed Least Squares in Small Space via Sketching and Bias Reduction
Neighboring least-squares directions include:
- overparameterized regression and minimum-norm interpolation, such as The Implicit Bias of Benign Overfitting
So the paper still works as a bridge, but it should be read as a sketching paper first and a gateway to adjacent least-squares theory second.
13 Resource Kit
- A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares - paper bridge; the anchor reading for this lab. Checked 2026-04-24.
- Iterative Hessian Sketch: Fast and Accurate Solution Approximation for Constrained Least-Squares - paper bridge; good follow-up if you want stronger optimization-oriented guarantees. Checked 2026-04-24.
- Distributed Least Squares in Small Space via Sketching and Bias Reduction - paper bridge; useful current example of the same idea under modern distributed-memory constraints. Checked 2026-04-24.