PCA Through SVD
application, pca, svd, dimensionality reduction
1 Application Snapshot
Principal component analysis is the cleanest applied face of low-rank approximation.
On a centered data matrix, PCA asks for the best low-dimensional linear representation of the data. SVD answers that question directly: the top right singular vectors are the principal directions, and the truncated SVD gives the best rank-\(k\) reconstruction.
2 Problem Setting
Suppose a centered data matrix
\[ X \in \mathbb{R}^{m \times d} \]
has \(m\) samples and \(d\) features.
We want a lower-dimensional representation that keeps as much variation as possible while replacing \(X\) by a rank-\(k\) matrix.
The SVD of \(X\) is
\[ X = U \Sigma V^\top. \]
Then:
- the columns of \(V\) are the principal directions
- the matrix \(U\Sigma\) contains the principal component scores
- the truncated matrix \(X_k = U_k \Sigma_k V_k^\top\) is the best rank-\(k\) approximation to \(X\)
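A minimal NumPy sketch of this correspondence follows; the synthetic data, the random seed, and the choice \(k = 2\) are illustrative assumptions, not part of the setup above.

```python
import numpy as np

# Illustrative synthetic data: 100 samples, 5 features (an assumption).
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 5))

# Center: PCA assumes the column means have been subtracted.
X = X_raw - X_raw.mean(axis=0)

# Economy-size SVD: U is m x r, s holds the singular values, Vt is r x d.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2                          # illustrative choice of rank
directions = Vt[:k].T          # columns of V: principal directions
scores = U[:, :k] * s[:k]      # U_k Sigma_k: principal component scores
X_k = scores @ Vt[:k]          # best rank-k approximation U_k Sigma_k V_k^T
```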
3 Why This Math Appears
There are two equivalent ways to say what PCA is doing.
One is variance language:
find directions that capture as much variation as possible.
The other is approximation language:
find the rank-\(k\) matrix closest to the data matrix.
SVD makes these two statements the same theorem.
Because
\[ X^\top X = V \Sigma^2 V^\top, \]
the principal directions are exactly the eigenvectors of the covariance-type matrix \(X^\top X\), with eigenvalues \(\sigma_i^2\).
Because of the Eckart-Young theorem, the same singular vectors also solve the best rank-\(k\) reconstruction problem.
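Here is a small numerical check of that equivalence, a sketch assuming NumPy and an arbitrary centered matrix:

```python
import numpy as np

# Illustrative centered data (an assumption for the check).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X -= X.mean(axis=0)

# Variance view: eigendecomposition of X^T X.
# np.linalg.eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
v_top_eig = eigvecs[:, -1]

# Approximation view: right singular vectors of X.
_, s, Vt = np.linalg.svd(X, full_matrices=False)
v_top_svd = Vt[0]

print(np.isclose(eigvals[-1], s[0] ** 2))                 # lambda_1 = sigma_1^2
print(np.allclose(np.abs(v_top_eig), np.abs(v_top_svd)))  # same line, up to sign
```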
4 Math Objects In Use
- centered data matrix \(X\)
- right singular vectors \(v_i\)
- singular values \(\sigma_i\)
- rank-\(k\) reconstruction \(X_k\)
- explained-variance ratios
- projection of data rows onto the top singular directions
5 Worked Walkthrough
Take the centered matrix
\[ X = \begin{bmatrix} -2 & -1.1 \\ -1 & -0.4 \\ 1 & 0.6 \\ 2 & 0.9 \end{bmatrix}. \]
Numerically, its singular values are approximately
\[ \sigma_1 \approx 3.5367, \qquad \sigma_2 \approx 0.1788. \]
The top right singular vector is approximately
\[ v_1 \approx \begin{bmatrix} 0.8939 \\ 0.4484 \end{bmatrix}. \]
That vector is the first principal direction.
The large gap between \(\sigma_1\) and \(\sigma_2\) tells you the data are close to one-dimensional.
Keeping only the first singular direction gives the rank-\(1\) approximation
\[ X_1 = U_1 \Sigma_1 V_1^\top \approx \begin{bmatrix} -2.0388 & -1.0227 \\ -0.9593 & -0.4812 \\ 1.0394 & 0.5214 \\ 1.9586 & 0.9825 \end{bmatrix}. \]
So each data row is replaced by its projection onto the principal line.
Because the discarded singular value is small, the reconstruction is already close to the original data.
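These numbers can be reproduced in a few lines; the sketch below assumes NumPy, and singular-vector signs may differ by implementation.

```python
import numpy as np

X = np.array([[-2.0, -1.1],
              [-1.0, -0.4],
              [ 1.0,  0.6],
              [ 2.0,  0.9]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(s)       # approx [3.5367, 0.1788]
print(Vt[0])   # approx [0.8939, 0.4484], possibly with both signs flipped

# Rank-1 reconstruction: each row projected onto the principal line.
X1 = s[0] * np.outer(U[:, 0], Vt[0])
print(X1)

# Fraction of total variance captured by the first direction.
print(s[0] ** 2 / np.sum(s ** 2))   # approx 0.997
```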
6 Implementation or Computation Note
Three practical points matter a lot:
- center first: without centering, PCA can mostly track the mean direction rather than variation
- scale thoughtfully: when features use very different units, standardization can matter as much as centering
- compute stably: modern software often uses SVD directly instead of forming the covariance matrix explicitly, especially in high dimensions
This is a good example of theory helping implementation. The math says covariance eigenvectors and SVD singular vectors agree, but the numerical route you choose still matters.
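A sketch of that pipeline, assuming NumPy and synthetic data with deliberately mismatched units; the helper name pca_spectrum is hypothetical.

```python
import numpy as np

def pca_spectrum(X_raw, standardize=False):
    """Center (and optionally standardize), then take the SVD directly."""
    X = X_raw - X_raw.mean(axis=0)
    if standardize:
        X = X / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return s, Vt

# Illustrative data with one feature in much larger units (an assumption).
rng = np.random.default_rng(2)
X_raw = rng.normal(size=(200, 3)) * np.array([1.0, 1.0, 100.0])

s_plain, _ = pca_spectrum(X_raw)
s_std, _ = pca_spectrum(X_raw, standardize=True)
print(s_plain)  # the large-unit feature dominates the spectrum
print(s_std)    # singular values become comparable after standardization
```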
7 Failure Modes
- uncentered data: the principal directions can become misleading (a small demonstration follows this list)
- unscaled features: a large-unit feature can dominate the singular spectrum
- slow spectral decay: if many singular values are similar, low-rank compression loses much more information
- semantic over-interpretation: a principal component is a dominant variance direction, not automatically a causal or meaningful factor
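The demonstration promised in the first item, as a sketch with assumed synthetic data:

```python
import numpy as np

# Illustrative correlated 2-D data (an assumption for the demonstration).
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2)) @ np.array([[1.0, 0.5],
                                          [0.0, 0.1]])
X_shifted = X + np.array([50.0, 50.0])   # a large, un-removed mean

_, _, Vt_centered = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
_, _, Vt_raw = np.linalg.svd(X_shifted, full_matrices=False)

print(Vt_centered[0])  # the genuine dominant variance direction
print(Vt_raw[0])       # near [0.707, 0.707]: tracks the mean, not the variation
```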
8 Paper Bridge
- CS168 Lecture 9: The Singular Value Decomposition and Low-Rank Matrix Approximations - Second pass: a short continuation from PCA into denoising, compression, and missing-data intuition.
- Low-rank approximations - Paper bridge: shows the same truncated-SVD idea in latent semantic indexing and text retrieval.
- Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions - Paper bridge: a natural next step when exact PCA-style low-rank approximation becomes too expensive.
9 Try It
- Replace the second feature by 10 times its current value and recompute the top principal direction. How much of the change is scaling rather than structure? (A setup sketch follows this list.)
- Compute both \(X^\top X\) and the SVD of \(X\) in software. Verify that the leading eigenvector and leading right singular vector agree up to sign.
- Compare \(X_1\) with the original \(X\) row by row and interpret the reconstruction error geometrically.
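One possible setup for the first and third experiments, assuming NumPy:

```python
import numpy as np

X = np.array([[-2.0, -1.1],
              [-1.0, -0.4],
              [ 1.0,  0.6],
              [ 2.0,  0.9]])

# Experiment 1: scale the second feature by 10 and compare top directions.
X_scaled = X * np.array([1.0, 10.0])
v1 = np.linalg.svd(X, full_matrices=False)[2][0]
v1_scaled = np.linalg.svd(X_scaled, full_matrices=False)[2][0]
print(v1, v1_scaled)   # the scaled direction tilts heavily toward feature 2

# Experiment 3: row-by-row reconstruction error of the rank-1 approximation.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X1 = s[0] * np.outer(U[:, 0], Vt[0])
print(np.linalg.norm(X - X1, axis=1))  # distance of each row from the principal line
```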
10 Sources and Further Reading
- Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares - First pass: official online book with a strong PCA-and-computation perspective. Checked 2026-04-24.
- CS168 Lecture 9: The Singular Value Decomposition and Low-Rank Matrix Approximations - Second pass: clear application framing for compression and missing-data intuition. Checked 2026-04-24.
- MIT 18.06 Lecture 29: Singular value decomposition - Second pass: good official reinforcement on the linear algebra side. Checked 2026-04-24.
- Low-rank approximations - Paper bridge: concrete bridge from PCA-style SVD to text and latent-semantic applications. Checked 2026-04-24.
- Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions - Paper bridge: bridge from exact PCA/SVD to scalable approximate low-rank methods. Checked 2026-04-24.