PCA Through SVD

A concrete application page showing how the singular value decomposition becomes principal component analysis.
Modified

April 26, 2026

Keywords

application, pca, svd, dimensionality reduction

1 Application Snapshot

Principal component analysis is the cleanest applied face of low-rank approximation.

On a centered data matrix, PCA asks for the best low-dimensional linear representation of the data. SVD answers that question directly: the top right singular vectors are the principal directions, and the truncated SVD gives the best rank-\(k\) reconstruction.

2 Problem Setting

Suppose a centered data matrix

\[ X \in \mathbb{R}^{m \times d} \]

has \(m\) samples and \(d\) features.

We want a lower-dimensional representation that keeps as much of the variation in \(X\) as possible, obtained by replacing \(X\) with a rank-\(k\) matrix.

The SVD of \(X\) is

\[ X = U \Sigma V^\top. \]

Then:

  • the columns of \(V\) are the principal directions

  • the matrix \(U\Sigma\) contains the principal component scores

  • the truncated matrix

    \[ X_k = U_k \Sigma_k V_k^\top \]

    is the best rank-\(k\) approximation to \(X\)
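In code, this mapping is only a few lines. Here is a minimal NumPy sketch, assuming \(X\) is already centered; the function name and variable names are illustrative, not from this page.

```python
import numpy as np

def pca_via_svd(X, k):
    """Rank-k PCA of a centered data matrix X (m samples by d features)."""
    # Thin SVD: U is m x r, s holds the singular values in decreasing order, Vt is r x d.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    directions = Vt[:k]            # rows are the top-k principal directions
    scores = U[:, :k] * s[:k]      # principal component scores, equal to X @ directions.T
    X_k = scores @ directions      # best rank-k approximation U_k Sigma_k V_k^T
    return directions, scores, X_k
```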

3 Why This Math Appears

There are two equivalent ways to say what PCA is doing.

One is variance language:

find directions that capture as much variation as possible.

The other is approximation language:

find the rank-\(k\) matrix closest to the data matrix.

SVD makes these two statements the same theorem.

Because

\[ X^\top X = V \Sigma^2 V^\top, \]

the principal directions are exactly the eigenvectors of \(X^\top X\), which is the sample covariance matrix up to the factor \(1/(m-1)\) (a rescaling that does not change the eigenvectors).
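To see why the identity above holds, substitute the SVD and use \(U^\top U = I\); with the thin SVD, \(\Sigma\) is square, so \(\Sigma^\top \Sigma = \Sigma^2\):

\[ X^\top X = (U \Sigma V^\top)^\top (U \Sigma V^\top) = V \Sigma^\top U^\top U \Sigma V^\top = V \Sigma^2 V^\top. \]

So each column \(v_i\) of \(V\) is an eigenvector of \(X^\top X\) with eigenvalue \(\sigma_i^2\).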

Because of the Eckart–Young theorem, the same singular vectors also solve the best rank-\(k\) reconstruction problem.
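For reference, the form of the Eckart–Young theorem used here says that truncating the SVD minimizes the reconstruction error in the Frobenius norm, with the error determined by the discarded singular values:

\[ \min_{\operatorname{rank}(B) \le k} \| X - B \|_F = \| X - X_k \|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}, \]

where \(r\) is the rank of \(X\).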

4 Math Objects In Use

  • centered data matrix \(X\)
  • right singular vectors \(v_i\)
  • singular values \(\sigma_i\)
  • rank-\(k\) reconstruction \(X_k\)
  • explained-variance ratios (defined just below)
  • projection of data rows onto the top singular directions
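The explained-variance ratio is not spelled out above; the standard definition, built from the singular values of the centered matrix, assigns component \(i\) the ratio

\[ \frac{\sigma_i^2}{\sigma_1^2 + \cdots + \sigma_r^2}, \]

so summing the ratios of the first \(k\) components measures how much of the total variance the rank-\(k\) representation keeps.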

5 Worked Walkthrough

Take the centered matrix

\[ X = \begin{bmatrix} -2 & -1.1 \\ -1 & -0.4 \\ 1 & 0.6 \\ 2 & 0.9 \end{bmatrix}. \]

Numerically, its singular values are approximately

\[ \sigma_1 \approx 3.5367, \qquad \sigma_2 \approx 0.1788. \]

The top right singular vector is approximately

\[ v_1 \approx \begin{bmatrix} 0.8939 \\ 0.4484 \end{bmatrix}. \]

That vector is the first principal direction.

The large gap between \(\sigma_1\) and \(\sigma_2\) tells you the data are close to one-dimensional.

Keeping only the first singular direction gives the rank-\(1\) approximation

\[ X_1 = U_1 \Sigma_1 V_1^\top \approx \begin{bmatrix} -2.0388 & -1.0227 \\ -0.9593 & -0.4812 \\ 1.0394 & 0.5214 \\ 1.9586 & 0.9825 \end{bmatrix}. \]

So each data row is replaced by its projection onto the principal line.

Because the discarded singular value is small, the reconstruction is already close to the original data.
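These numbers are easy to check in NumPy. A short sketch follows; the SVD sign convention may flip \(v_1\) and the first column of \(U\) together, which leaves \(X_1\) unchanged.

```python
import numpy as np

X = np.array([[-2.0, -1.1],
              [-1.0, -0.4],
              [ 1.0,  0.6],
              [ 2.0,  0.9]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(s)       # approximately [3.5367, 0.1788]
print(Vt[0])   # approximately [0.8939, 0.4484], up to an overall sign

# Rank-1 reconstruction from the leading singular triplet.
X1 = s[0] * np.outer(U[:, 0], Vt[0])
print(X1)      # approximately the matrix X_1 shown above
```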

6 Implementation or Computation Note

Three practical points matter a lot:

  • center first: without centering, PCA can mostly track the mean direction rather than variation
  • scale thoughtfully: when features use very different units, standardization can matter as much as centering
  • compute stably: modern software often uses SVD directly instead of forming the covariance matrix explicitly, especially in high dimensions

This is a good example of theory helping implementation. The math says covariance eigenvectors and SVD singular vectors agree, but the numerical route still matters: forming \(X^\top X\) explicitly squares the condition number, so taking the SVD of \(X\) directly is the more stable route in practice.
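A small sketch of the two routes on the matrix from the walkthrough (variable names are illustrative); both recover the same leading direction up to sign, but only the first route forms \(X^\top X\) explicitly.

```python
import numpy as np

X = np.array([[-2.0, -1.1],
              [-1.0, -0.4],
              [ 1.0,  0.6],
              [ 2.0,  0.9]])

# Route 1: eigendecomposition of the covariance-type matrix X^T X.
evals, evecs = np.linalg.eigh(X.T @ X)   # eigh returns eigenvalues in ascending order
v_cov = evecs[:, -1]                      # eigenvector of the largest eigenvalue

# Route 2: SVD of X, never forming X^T X.
_, s, Vt = np.linalg.svd(X, full_matrices=False)
v_svd = Vt[0]

print(np.allclose(v_cov, v_svd) or np.allclose(v_cov, -v_svd))  # True: same direction up to sign
print(np.sqrt(evals[-1]), s[0])   # both approximately sigma_1 = 3.5367
```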

7 Failure Modes

  • uncentered data: the principal directions can become misleading (see the sketch after this list)
  • unscaled features: a large-unit feature can dominate the singular spectrum
  • slow spectral decay: if many singular values are of comparable size, truncating to rank \(k\) discards a large share of the variance
  • semantic over-interpretation: a principal component is a dominant variance direction, not automatically a causal or meaningful factor
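A quick synthetic illustration of the first failure mode (the data here are made up for the demonstration, not from this page): with a large mean offset and no centering, the leading singular direction tracks the mean rather than the variation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: variation mostly along the first feature, large mean offset in the second.
X = rng.normal(size=(200, 2)) * np.array([3.0, 0.3]) + np.array([0.0, 50.0])

_, _, Vt_raw = np.linalg.svd(X, full_matrices=False)                    # no centering
_, _, Vt_ctr = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)   # centered

print(Vt_raw[0])   # roughly [0, 1] up to sign: dominated by the mean offset
print(Vt_ctr[0])   # roughly [1, 0] up to sign: the actual direction of variation
```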

8 Paper Bridge

9 Try It

  1. Replace the second feature by 10 times its current value and recompute the top principal direction. How much of the change is scaling rather than structure?
  2. Compute both \(X^\top X\) and the SVD of \(X\) in software. Verify that the leading eigenvector and leading right singular vector agree up to sign.
  3. Compare \(X_1\) with the original \(X\) row by row and interpret the reconstruction error geometrically.

10 Sources and Further Reading
