Eckart-Young and Truncated SVD
proof, svd, low-rank approximation, eckart-young
1 Claim
We want the theorem behind the slogan: keep the top singular directions and you get the best low-rank approximation.
Theorem 1 (Eckart-Young) Let
\[ A = \sum_{i=1}^r \sigma_i u_i v_i^\top \]
be the thin SVD of \(A \in \mathbb{R}^{m \times n}\), with \(\sigma_1 \ge \cdots \ge \sigma_r > 0\).
For \(1 \le k < r\), define the truncated SVD
\[ A_k = \sum_{i=1}^k \sigma_i u_i v_i^\top. \]
Then for every matrix \(B\) with \(\operatorname{rank}(B) \le k\),
\[ \|A-A_k\|_F \le \|A-B\|_F. \]
Moreover,
\[ \|A-A_k\|_F^2 = \sum_{i=k+1}^r \sigma_i^2. \]
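As a sanity check, here is a minimal NumPy sketch (not from the sources; the test matrix and constants are ours) that builds \(A_k\) from the thin SVD, confirms the tail-energy error formula, and spot-checks optimality against random rank-\(k\) candidates:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 8, 6, 2
A = rng.standard_normal((m, n))  # generic test matrix, full rank almost surely

# Thin SVD: singular values come back sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncated SVD A_k: keep only the top-k singular triples.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Error formula: ||A - A_k||_F^2 equals the tail energy sum_{i>k} sigma_i^2.
assert np.isclose(np.linalg.norm(A - A_k, "fro") ** 2, np.sum(s[k:] ** 2))

# Optimality spot-check: random rank-k candidates never beat A_k.
for _ in range(1000):
    B = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank <= k
    assert np.linalg.norm(A - B, "fro") >= np.linalg.norm(A - A_k, "fro")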
2 Why This Proof Matters
This proof is worth learning because it keeps reappearing in disguised form:
- PCA is the same theorem applied to a centered data matrix (see the sketch after this list)
- randomized low-rank algorithms are measured against the truncated SVD benchmark
- matrix denoising and compression arguments often reduce to the same rank-\(k\) comparison
- later spectral proofs reuse the same pattern of rotating into the singular-vector basis
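To make the PCA connection concrete, here is a small NumPy sketch, assuming a data matrix \(X\) whose rows are samples (the synthetic data and variable names are ours): the right singular vectors of the centered matrix agree, up to sign, with the eigenvectors of the sample covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))  # rows are samples, columns are features

# Center the columns: PCA is Eckart-Young applied to this centered matrix.
Xc = X - X.mean(axis=0)

# Right singular vectors of the centered matrix ...
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# ... agree (up to sign) with the eigenvectors of the sample covariance.
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
top = eigvecs[:, ::-1]                   # reorder to descending

for j in range(2):
    # Principal axes are defined only up to a sign flip.
    assert np.isclose(abs(top[:, j] @ Vt[j]), 1.0)
```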
3 Dependencies
- orthogonal matrices preserve Frobenius norm
- singular vectors give orthonormal bases
- orthogonal projection onto a \(k\)-dimensional subspace maximizes captured energy when the subspace aligns with dominant singular directions
- for a matrix \(M\), \(\|M\|_F^2\) is the sum of squares of its entries
4 Strategy Before Details
The proof has two steps.
- Show that \(A_k\) leaves behind exactly the tail singular values, so its error is easy to compute.
- Show that every other rank-\(k\) matrix must miss at least that much energy.
The key move is to compare any candidate matrix \(B\) through the projection onto its column space.
5 Full Proof
Proof. First compute the error of the truncated SVD itself.
Because the outer products \(u_i v_i^\top\) are orthonormal under the Frobenius inner product,
\[ A - A_k = \sum_{i=k+1}^r \sigma_i u_i v_i^\top \]
and therefore
\[ \|A-A_k\|_F^2 = \sum_{i=k+1}^r \sigma_i^2. \]
Now let \(B\) be any matrix with \(\operatorname{rank}(B) \le k\). Let \(P\) be the orthogonal projection onto \(\operatorname{col}(B)\). Since every column of \(B\) lies in \(\operatorname{col}(B)\), we have \(PB = B\).
Write
\[ A-B = (I-P)A + (PA-B). \]
The first term lies in \(\operatorname{col}(B)^\perp\), while the second lies in \(\operatorname{col}(B)\). So they are orthogonal, and the Pythagorean theorem gives
\[ \|A-B\|_F^2 = \|(I-P)A\|_F^2 + \|PA-B\|_F^2 \ge \|(I-P)A\|_F^2. \]
Thus it is enough to lower-bound \(\|(I-P)A\|_F^2\).
Since \(P\) is an orthogonal projection,
\[ \|(I-P)A\|_F^2 = \|A\|_F^2 - \|PA\|_F^2. \]
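Both projection facts used so far, \(PB = B\) and the two Pythagorean identities, are easy to check numerically. In this sketch (our own construction, not part of the proof) the orthonormal basis \(Q\) for \(\operatorname{col}(B)\) comes from a QR factorization:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 8, 6, 2
A = rng.standard_normal((m, n))

# A generic rank-k candidate B = F G; col(B) = col(F) almost surely.
F = rng.standard_normal((m, k))
B = F @ rng.standard_normal((k, n))

Q, _ = np.linalg.qr(F)  # orthonormal basis for col(B)
P = Q @ Q.T             # orthogonal projection onto col(B)

# PB = B: every column of B already lies in col(B).
assert np.allclose(P @ B, B)

# Pythagoras: (I-P)A and PA - B live in orthogonal subspaces.
lhs = np.linalg.norm(A - B, "fro") ** 2
rhs = np.linalg.norm(A - P @ A, "fro") ** 2 + np.linalg.norm(P @ A - B, "fro") ** 2
assert np.isclose(lhs, rhs)

# And ||(I-P)A||_F^2 = ||A||_F^2 - ||PA||_F^2 for an orthogonal projection.
assert np.isclose(
    np.linalg.norm(A - P @ A, "fro") ** 2,
    np.linalg.norm(A, "fro") ** 2 - np.linalg.norm(P @ A, "fro") ** 2,
)
```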
Let \(s = \operatorname{rank}(B)\), so \(s \le k\), and let \(q_1,\dots,q_s\) be an orthonormal basis for \(\operatorname{col}(B)\). Then
\[ \|PA\|_F^2 = \sum_{j=1}^s \|q_j^\top A\|_2^2 = \sum_{j=1}^s q_j^\top A A^\top q_j. \]
Using \(AA^\top = U_r \Sigma_r^2 U_r^\top\), where \(U_r = [u_1 \ \cdots \ u_r]\) and \(\Sigma_r = \operatorname{diag}(\sigma_1, \dots, \sigma_r)\), write \(q_j = U_r z_j + w_j\) with \(w_j\) orthogonal to \(\operatorname{col}(U_r)\). Then
\[ q_j^\top A A^\top q_j = z_j^\top \Sigma_r^2 z_j. \]
So
\[ \|PA\|_F^2 = \sum_{j=1}^s z_j^\top \Sigma_r^2 z_j = \sum_{i=1}^r \sigma_i^2 \alpha_i, \]
where
\[ \alpha_i = \sum_{j=1}^s z_{j,i}^2. \]
Each \(\alpha_i\) lies in \([0,1]\): since \(z_{j,i} = u_i^\top q_j\), we have \(\alpha_i = \sum_j (u_i^\top q_j)^2 = \|P u_i\|_2^2 \le \|u_i\|_2^2 = 1\). And because each \(\|z_j\|_2 \le \|q_j\|_2 = 1\),
\[ \sum_{i=1}^r \alpha_i = \sum_{j=1}^s \|z_j\|_2^2 \le s \le k. \]
Therefore \(\sum_{i=1}^r \sigma_i^2 \alpha_i\) is at most \(\sum_{i=1}^k \sigma_i^2\), because the total mass is at most \(k\), each coefficient is at most \(1\), and the singular values are ordered from largest to smallest. Hence
\[ \|PA\|_F^2 \le \sum_{i=1}^k \sigma_i^2. \]
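This energy-capture bound is easy to test empirically. The sketch below (our own check, not part of the proof) draws random \(k\)-dimensional subspaces, verifies that none captures more than \(\sum_{i \le k} \sigma_i^2\), and confirms equality for the span of the top-\(k\) left singular vectors:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 8, 6, 2
A = rng.standard_normal((m, n))
U, s, _ = np.linalg.svd(A, full_matrices=False)
top_energy = np.sum(s[:k] ** 2)

# No random k-dimensional subspace captures more than the top-k energy.
for _ in range(1000):
    Q, _ = np.linalg.qr(rng.standard_normal((m, k)))  # random orthonormal basis
    captured = np.linalg.norm(Q.T @ A, "fro") ** 2    # equals ||PA||_F^2
    assert captured <= top_energy + 1e-10

# Equality for the span of the top-k left singular vectors.
assert np.isclose(np.linalg.norm(U[:, :k].T @ A, "fro") ** 2, top_energy)
```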
Substituting into the previous bound,
\[ \|A-B\|_F^2 \ge \|A\|_F^2 - \sum_{i=1}^k \sigma_i^2 = \sum_{i=1}^r \sigma_i^2 - \sum_{i=1}^k \sigma_i^2 = \sum_{i=k+1}^r \sigma_i^2 = \|A-A_k\|_F^2. \]
So \(A_k\) is a best rank-\(k\) approximation in Frobenius norm.
6 Step Annotations
- The tail formula for \(A-A_k\) works because the rank-\(1\) SVD pieces are orthogonal in Frobenius inner product.
- Replacing \(B\) by projection onto its column space is the key simplification: the best rank-\(k\) matrix must live inside some \(k\)-dimensional output subspace.
- The proof becomes an energy-capture question: how much of \(A\) can any \(k\)-dimensional subspace retain?
- The ordering of the singular values forces the best subspace to align with the first \(k\) left singular vectors.
7 Why The Assumptions Matter
- We used Frobenius norm, which behaves well with orthogonality and Pythagorean decompositions.
- The ordered singular values matter because “top \(k\)” only makes sense after sorting.
- The same conclusion is true in spectral norm, but its proof needs a different argument; the sketch below checks the spectral-norm error value numerically.
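We do not prove the spectral-norm statement here, but the error value it predicts, \(\|A - A_k\|_2 = \sigma_{k+1}\), is a standard fact and easy to confirm numerically (sketch ours):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# ord=2 is the spectral norm: the truncation error is the first dropped
# singular value, matching the spectral-norm version of the theorem.
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
```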
8 Common Failure Modes
- writing \(A_k\) correctly but forgetting to justify why every other rank-\(k\) matrix can be compared through its column space
- using the projection matrix \(P\) without noting that \(PB = B\)
- assuming the best subspace result without explaining why the dominant singular directions maximize captured energy
- claiming the spectral-norm statement was proved here when this page only proved the Frobenius-norm version
9 Reusable Pattern
This proof teaches a pattern that returns later:
- express the target object in an orthogonal basis
- reduce the approximation problem to a projection question
- bound how much energy any low-dimensional subspace can capture
- identify the dominant spectral directions as the optimizer
10 Where This Shows Up Again
- PCA: applying the theorem to a centered data matrix shows that the top-\(k\) principal components give the best rank-\(k\) summary of the data.
- Randomized numerical linear algebra: sketching algorithms are judged by how close they come to the truncated-SVD error \(\sum_{i>k} \sigma_i^2\).
- Compression and denoising: both reduce to choosing \(k\) and keeping \(A_k\).
11 Exercises
- Show directly that \(\|A-A_k\|_F^2 = \sum_{i=k+1}^r \sigma_i^2\) using the outer-product expansion.
- Identify exactly where the proof uses the ordering \(\sigma_1 \ge \cdots \ge \sigma_r\).
- Explain why the proof naturally suggests PCA when \(A\) is a centered data matrix.
12 Sources and Further Reading
- MIT 18.06 Lecture 29: Singular value decomposition. First pass: strong official anchor for the theorem and its geometry. Checked 2026-04-24.
- Hefferon, Linear Algebra. Second pass: useful for proof structure and independent exercises. Checked 2026-04-24.
- Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares. Second pass: a good source for the applied interpretation around rank reduction and approximation. Checked 2026-04-24.
- Randomized Numerical Linear Algebra: Foundations & Algorithms. Paper bridge: modern survey showing how the truncated SVD theorem becomes the benchmark for approximate algorithms. Checked 2026-04-24.