SVD and Low-Rank Approximation
singular value decomposition, low-rank approximation, truncated svd, eckart-young, spectral methods
1 Role
This page is the bridge from basic matrix algebra to spectral thinking.
The singular value decomposition tells you how a matrix rotates coordinates, stretches along orthogonal directions, and suppresses weaker directions. Low-rank approximation then turns that structure into one of the main tools for compression, dimensionality reduction, denoising, and large-scale learning.
2 First-Pass Promise
Read this page after Eigenvalues and Diagonalization.
If you stop here, you should still understand:
- what the SVD factorization is saying geometrically
- what truncated SVD keeps and discards
- why Eckart-Young makes low-rank approximation precise
- how the topic connects to PCA and stable computation
3 Why It Matters
This topic matters because one matrix factorization supports several different jobs at once:
- a geometric statement: every matrix has orthogonal input and output directions
- an approximation principle: the strongest \(k\) singular directions give the best rank-\(k\) approximation
- a computational viewpoint: SVD is a stable way to analyze ill-conditioned or rank-deficient problems
- a research bridge: randomized low-rank methods, streaming approximation, and model adaptation all reuse this language
4 Prerequisite Recall
- an orthogonal matrix preserves dot products and Euclidean norms
- the matrix \(A^\top A\) is symmetric positive semidefinite
- rank counts the number of independent directions carried by a matrix
- orthogonal projection explains best approximation inside a subspace
- Orthogonality and Least Squares already showed one best-approximation problem inside linear algebra
5 Intuition
SVD says a matrix can be understood as three simpler moves:
- a rotation or change of coordinates
- a stretching along orthogonal directions
- another rotation
That turns an opaque matrix into geometry you can reason about.
There is a second intuition that matters just as much: the outer-product expansion
\[ A = \sum_{i=1}^r \sigma_i u_i v_i^\top \]
writes the matrix as a sum of rank-\(1\) ingredients, ordered from most important to least important by the singular values \(\sigma_i\).
Low-rank approximation says:
keep the strongest ingredients and drop the weaker ones.
When the singular values decay quickly, that simple rule preserves most of the useful structure while reducing dimension, storage, or noise.
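The following minimal sketch illustrates that rule numerically (it assumes NumPy is available; the matrix and its size are arbitrary choices for illustration): rebuild a matrix one rank-\(1\) ingredient at a time and watch the residual shrink.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# thin SVD: singular values come back sorted from strongest to weakest
U, s, Vt = np.linalg.svd(A, full_matrices=False)

approx = np.zeros_like(A)
for i, sigma in enumerate(s):
    approx += sigma * np.outer(U[:, i], Vt[i, :])   # add the i-th rank-1 ingredient
    print(f"rank {i + 1}: residual = {np.linalg.norm(A - approx):.4f}")
```

The residual drops fastest for the first few terms, which is exactly the behavior low-rank approximation exploits.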
6 Formal Core
Definition 1 (Thin SVD) Let \(A \in \mathbb{R}^{m \times n}\) have rank \(r\).
A thin singular value decomposition of \(A\) is
\[ A = U_r \Sigma_r V_r^\top = \sum_{i=1}^r \sigma_i u_i v_i^\top, \]
where:
- \(U_r \in \mathbb{R}^{m \times r}\) has orthonormal columns
- \(V_r \in \mathbb{R}^{n \times r}\) has orthonormal columns
- \(\Sigma_r = \operatorname{diag}(\sigma_1,\dots,\sigma_r)\) with \(\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0\)
The vectors \(u_i\) are the left singular vectors, the vectors \(v_i\) are the right singular vectors, and the \(\sigma_i\) are the singular values.
The singular values are tightly linked to symmetric matrices:
\[ A^\top A = V_r \Sigma_r^2 V_r^\top, \qquad AA^\top = U_r \Sigma_r^2 U_r^\top. \]
So the right singular vectors are eigenvectors of \(A^\top A\), the left singular vectors are eigenvectors of \(AA^\top\), and the singular values are the square roots of the nonzero eigenvalues.
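A short numerical check of this link (a sketch, assuming NumPy; the random matrix is only for illustration): the squared singular values match the eigenvalues of \(A^\top A\), and the columns of \(V_r\) are its eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
eigvals = np.linalg.eigvalsh(A.T @ A)                 # ascending eigenvalues of A^T A

# singular values are the square roots of those eigenvalues
print(np.allclose(np.sort(s ** 2), eigvals))          # True

# columns of V_r are eigenvectors of A^T A with eigenvalue sigma_i^2
print(np.allclose(A.T @ A @ Vt.T, Vt.T * s ** 2))     # True
```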
Definition 2 (Truncated SVD) For \(1 \le k < r\), the rank-\(k\) truncated SVD is
\[ A_k = \sum_{i=1}^k \sigma_i u_i v_i^\top. \]
This keeps the top \(k\) singular directions and discards the rest.
Theorem 1 (Eckart-Young Theorem) For every matrix \(B\) with \(\operatorname{rank}(B) \le k\),
\[ \|A - A_k\|_F \le \|A - B\|_F \qquad\text{and}\qquad \|A - A_k\|_2 \le \|A - B\|_2. \]
Moreover,
\[ \|A - A_k\|_F^2 = \sum_{i=k+1}^r \sigma_i^2, \qquad \|A - A_k\|_2 = \sigma_{k+1}. \]
So the truncated SVD is the best rank-\(k\) approximation in both Frobenius and spectral norm.
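A hedged numerical check of the theorem (NumPy assumed; the matrix, the choice of \(k\), and the competitor \(B\) are arbitrary): the two error formulas hold to floating-point accuracy, and a random rank-\(k\) competitor does no better in either norm.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]            # rank-k truncated SVD

# the two error formulas from the theorem
print(np.isclose(np.linalg.norm(A - A_k, 'fro') ** 2, np.sum(s[k:] ** 2)))   # True
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))                          # True

# an arbitrary rank-k competitor does at least as badly in both norms
B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
print(np.linalg.norm(A - B, 'fro') >= np.linalg.norm(A - A_k, 'fro'))        # True
print(np.linalg.norm(A - B, 2) >= np.linalg.norm(A - A_k, 2))                # True
```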
7 Worked Example
Take the diagonal matrix
\[ A = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]
Its singular values are already visible:
\[ \sigma_1 = 4,\qquad \sigma_2 = 2,\qquad \sigma_3 = 1. \]
The singular vectors are the standard basis vectors, so the SVD is simply
\[ A = I \operatorname{diag}(4,2,1) I^\top. \]
The best rank-\(1\) approximation keeps only the top singular direction:
\[ A_1 = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \]
The best rank-\(2\) approximation keeps the top two:
\[ A_2 = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \]
The approximation errors are:
\[ \|A-A_1\|_F = \sqrt{2^2 + 1^2} = \sqrt{5}, \qquad \|A-A_1\|_2 = 2, \]
and
\[ \|A-A_2\|_F = 1, \qquad \|A-A_2\|_2 = 1. \]
This example is diagonal, so it looks unusually clean. But that is exactly the point of SVD: every matrix can be rotated into this ordered diagonal picture.
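If you want to confirm the arithmetic, a few lines of NumPy (an illustrative sketch, not part of the worked example itself) reproduce the same approximations and errors.

```python
import numpy as np

A = np.diag([4.0, 2.0, 1.0])
U, s, Vt = np.linalg.svd(A)

A1 = s[0] * np.outer(U[:, 0], Vt[0, :])                 # best rank-1 approximation
A2 = A1 + s[1] * np.outer(U[:, 1], Vt[1, :])            # best rank-2 approximation

print(np.linalg.norm(A - A1, 'fro'), np.linalg.norm(A - A1, 2))   # sqrt(5) ≈ 2.236, 2.0
print(np.linalg.norm(A - A2, 'fro'), np.linalg.norm(A - A2, 2))   # 1.0, 1.0
```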
8 Computation Lens
SVD is not only a theorem. It is also a computational tool.
Three computational ideas matter immediately:
- thin or economy SVD: keep only the nonzero singular directions instead of a full square decomposition
- truncated SVD: use only the top \(k\) components when storage or approximation quality matters more than exactness
- numerical rank: in floating-point settings, very small singular values often behave like noise or near-dependence
This viewpoint also explains why the SVD is safer than the normal equations in hard least-squares problems: it exposes tiny singular values directly instead of hiding them inside \(A^\top A\).
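A small sketch of that idea (NumPy assumed; the tolerance and the nearly dependent column are illustrative choices): solve a nearly rank-deficient least-squares problem through the SVD, truncating tiny singular values rather than squaring them inside \(A^\top A\).

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 3))
A[:, 2] = A[:, 0] + 1e-10 * rng.standard_normal(20)    # make one column nearly dependent
b = rng.standard_normal(20)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print("singular values:", s)                            # the last one is tiny

tol = 1e-8 * s[0]
s_inv = np.where(s > tol, 1.0 / s, 0.0)                 # drop directions below the tolerance
x = Vt.T @ (s_inv * (U.T @ b))                          # pseudoinverse-style solution
print("solution norm:", np.linalg.norm(x))
```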
On large matrices, exact SVD can be too expensive. That is where randomized low-rank approximation, sketching, and streaming methods enter.
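The sketch below shows the basic pattern of a randomized range finder in the spirit of the Finding Structure with Randomness paper cited later on this page (the function name, oversampling amount, and test matrix are illustrative assumptions, not a tuned implementation): sketch the range of \(A\) with a random matrix, then take a small exact SVD.

```python
import numpy as np

def randomized_svd(A, k, oversample=5, seed=0):
    """Approximate top-k SVD via a basic randomized range finder (no power iterations)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    Omega = rng.standard_normal((n, k + oversample))     # random Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)                       # orthonormal basis for the sketched range
    B = Q.T @ A                                          # project A into that basis
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k, :]

# test matrix with quickly decaying singular values
rng = np.random.default_rng(4)
A = (rng.standard_normal((500, 40)) * 0.5 ** np.arange(40)) @ rng.standard_normal((40, 200))

U, s, Vt = randomized_svd(A, k=10)
err = np.linalg.norm(A - U @ np.diag(s) @ Vt, 2)
print("spectral error:", err, "vs sigma_11:", np.linalg.svd(A, compute_uv=False)[10])
```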
9 Application Lens
This topic powers several concrete workflows:
- PCA: principal directions are the right singular vectors of the centered data matrix
- compression: keep only the large singular values and store a smaller factorization
- denoising: discard weak directions that mostly carry noise
- least squares: use singular values to see rank deficiency and compute pseudoinverse-based solutions
- model adaptation: low-rank updates can capture useful changes with far fewer parameters
For a concrete data-analysis example, see PCA Through SVD. For an interactive picture of rank-\(1\) reconstruction, see Computation Lab: Rank-1 Approximation and PCA Geometry.
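Before opening that page, a compact sketch gives a preview (NumPy assumed; the synthetic data and the choice of two components are illustrative): center the data, take the SVD, and read the principal directions off the right singular vectors.

```python
import numpy as np

rng = np.random.default_rng(5)
# synthetic data: 100 samples, 4 features with very different scales of variation
X = rng.standard_normal((100, 4)) @ np.diag([3.0, 1.0, 0.3, 0.1])

Xc = X - X.mean(axis=0)                  # PCA works on the centered data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt[:2]                      # top-2 principal directions = right singular vectors
scores = Xc @ components.T               # coordinates of each sample in those directions
explained = s ** 2 / np.sum(s ** 2)      # share of variance carried by each direction
print("explained variance ratio:", np.round(explained, 3))
```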
10 Common Mistakes
- confusing eigenvectors of \(A\) with singular vectors of \(A\)
- thinking low rank means “small entries” instead of “few dominant directions”
- forgetting that PCA uses the SVD of a centered data matrix, not raw data by default
- assuming a tiny singular value is automatically zero rather than a sign of numerical or modeling near-dependence
- treating truncated SVD as a universal best approximation without naming the norm being optimized
11 Exercises
- Write the thin SVD and best rank-\(1\) approximation of \(\operatorname{diag}(5,2)\).
- Show that if \(A\) has orthonormal columns, then all of its singular values are \(1\).
- Explain in one paragraph why SVD is the right tool to understand both PCA and ill-conditioned least squares.
12 Stop Here For First Pass
If you can now explain:
- how SVD decomposes a matrix into orthogonal directions and scales
- why truncated SVD is the best rank-\(k\) approximation
- how the worked example connects to PCA and numerics
then this page has done its main job.
13 Go Deeper
Open one of these only if you want a different learning mode:
- Eckart-Young and Truncated SVD for the proof
- PCA Through SVD for the cleanest application framing
- Computation Lab: Rank-1 Approximation and PCA Geometry for visual and numerical intuition
14 Optional Paper Bridge
- Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions - Paper bridge - shows how randomized algorithms recover near-optimal low-rank structure without computing a full exact SVD.
- Randomized Numerical Linear Algebra: Foundations & Algorithms - Second pass - broad modern survey of low-rank approximation, regression, streaming, and approximation guarantees.
- LoRA: Low-Rank Adaptation of Large Language Models - Paper bridge - modern example where low-rank structure becomes a practical modeling tool rather than only a matrix-compression tool.
15 Optional After First Pass
These are optional after the first pass:
16 Sources and Further Reading
- MIT 18.06 Lecture 29: Singular value decomposition - First pass - strong official first exposure to SVD and its geometric meaning. Checked 2026-04-24.
- Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares - First pass - official applied linear algebra book with a clear computational lens. Checked 2026-04-24.
- Hefferon, Linear Algebra - Second pass - helpful for motivated proofs, exercises, and independent study. Checked 2026-04-24.
- CS168 Lecture 9: The Singular Value Decomposition and Low-Rank Matrix Approximations - Second pass - unusually good for applications like denoising, matrix completion, and model updates. Checked 2026-04-24.
- Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions - Paper bridge - classic randomized low-rank approximation bridge paper. Checked 2026-04-24.