Computation Lab: Rank-1 Approximation and PCA Geometry

An interactive lab for seeing how singular values, principal directions, and rank-1 reconstruction fit together.
Modified: April 26, 2026

Keywords

computation, simulation, visualization, svd, pca

1 Lab Goal

This lab helps you see one specific fact:

in a two-feature centered data matrix, the rank-\(1\) truncated SVD is just projection onto the top principal direction.

2 Math Question

How do the noise scale and the retained rank affect the following?

  • the singular values
  • the top principal direction
  • the rank-\(1\) reconstruction
  • the Frobenius reconstruction error

3 Model or Setup

We start from a deterministic point cloud that is almost one-dimensional.

The points are centered feature vectors in \(\mathbb{R}^2\). When the noise scale is small, the data matrix is close to rank \(1\); when the noise scale grows, the second singular value grows too.
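The lab builds this cloud in Observable JS from fixed arrays. As a rough illustration (the slope, the parameter grid, and the cosine perturbation below are invented for this sketch, not the lab's actual arrays), a comparable centered, near-rank-\(1\) cloud in NumPy:

```python
import numpy as np

# Hypothetical stand-in for the lab's fixed arrays: a line of slope 0.5
# plus a deterministic, seed-free "noise" pattern scaled by noise_scale.
noise_scale = 0.60                      # lab default
t = np.linspace(-1.0, 1.0, 9)           # parameter along the line
X = np.column_stack([t, 0.5 * t + noise_scale * np.cos(7 * t)])
X -= X.mean(axis=0)                     # center each feature

s = np.linalg.svd(X, compute_uv=False)
print(s)  # sigma_1 dominates; sigma_2 shrinks with noise_scale
```

With `noise_scale = 0` the second column is exactly `0.5 * t`, the matrix is exactly rank \(1\), and \(\sigma_2 = 0\).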

4 Parameters and Controls

  • Noise scale: controls how far the cloud deviates from an exact one-dimensional pattern
  • Retained rank: choose between a rank-\(1\) reconstruction and the full rank-\(2\) reconstruction

Default values are 0.60 for noise scale and 1 for retained rank.

5 Code and Simulation
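The lab itself runs interactively in Observable JS, and its source is not reproduced here. As a stand-in, the following NumPy sketch performs the same computation the text describes: build a centered two-feature cloud, truncate the SVD at a chosen rank, and report the singular values and the Frobenius reconstruction error. The point construction (grid, slope, cosine perturbation) is invented for this sketch.

```python
import numpy as np

def rank1_lab(noise_scale: float, rank: int):
    """Build a centered 2-feature cloud, reconstruct it at the given
    rank, and return (singular values, reconstruction, Frobenius error)."""
    t = np.linspace(-1.0, 1.0, 9)
    X = np.column_stack([t, 0.5 * t + noise_scale * np.cos(7 * t)])
    X -= X.mean(axis=0)                       # center each feature

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Truncated SVD: keep only the top `rank` singular triplets.
    X_r = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    err = np.linalg.norm(X - X_r, ord="fro")
    return s, X_r, err

s, X1, err = rank1_lab(noise_scale=0.60, rank=1)
print(s, err)  # for rank 1, err matches the second singular value
```

Varying `noise_scale` and `rank` here mirrors moving the two sliders in the lab.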

6 What To Observe

  • With Noise scale = 0, the point cloud is exactly rank \(1\), so the second singular value and the rank-\(1\) reconstruction error both collapse to zero up to floating-point roundoff.
  • When Retained rank = 1, the orange reconstruction points lie on the principal line.
  • As the noise scale increases, the segments joining each point to its rank-\(1\) reconstruction lengthen, and the second singular value grows with them.
  • When Retained rank = 2, the reconstruction becomes exact and the Frobenius error drops to zero.

7 Interpretation

This lab is the two-feature version of truncated SVD.

The centered data matrix \(X\) is approximated by \(X_1\), the best rank-\(1\) matrix in the Frobenius norm. Geometrically, this means every data row is projected onto the top principal direction.

In this setup, the Frobenius reconstruction error for the rank-\(1\) approximation matches the discarded singular value because there are only two singular values total:

\[ \|X-X_1\|_F = \sigma_2. \]

To keep the displayed value numerically stable at the exact rank-\(1\) endpoint, the lab reports the second singular value through this reconstruction error identity rather than from a cancellation-prone closed-form subtraction.
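The identity is easy to check numerically on any centered two-column matrix; the matrix below is made up for this check:

```python
import numpy as np

X = np.array([[1.0, 0.4], [-0.5, 0.3], [0.2, -0.9], [-0.7, 0.2]])
X -= X.mean(axis=0)                          # center each feature

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X1 = s[0] * np.outer(U[:, 0], Vt[0])         # best rank-1 approximation

# With only two singular values, the Frobenius error is exactly sigma_2.
assert np.isclose(np.linalg.norm(X - X1, ord="fro"), s[1])
```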

So the picture, the table, and the displayed error are all different views of the same theorem.

8 Failure Modes and Numerical Cautions

  • This is a tiny synthetic example, so it hides scaling and conditioning issues that appear in larger data sets.
  • The principal direction is only defined up to sign, so software may return \(v\) or \(-v\).
  • The lab is about best linear reconstruction, not causal interpretation.
  • Centering is built in here; on raw data, leaving out centering can seriously change the result.
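The centering caveat is easy to demonstrate. In this hypothetical example, an off-origin cloud along slope \(0.5\) is shifted far from the origin; without centering, the top singular direction points at the offset rather than along the cloud:

```python
import numpy as np

t = np.linspace(-1.0, 1.0, 21)               # deterministic parameter
cloud = np.column_stack([t, 0.5 * t]) + np.array([5.0, 0.0])  # big offset

def top_direction(M):
    # Right singular vector for the largest singular value
    # (defined only up to sign, as noted above).
    return np.linalg.svd(M, full_matrices=False)[2][0]

v_raw = top_direction(cloud)                  # dominated by the offset
v_centered = top_direction(cloud - cloud.mean(axis=0))

# Slope of each direction: near 0 for raw, 0.5 for centered.
print(v_raw[1] / v_raw[0], v_centered[1] / v_centered[0])
```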

9 Reproducibility Notes

  • execution engine: Observable JS
  • no randomness and no seed required
  • deterministic point construction from fixed arrays
  • static-site friendly: no server runtime or notebook kernel required after render

10 Extensions

  • add a third feature and compare rank-\(1\) versus rank-\(2\) reconstructions
  • replace the synthetic point cloud with a small image-like matrix and study compression directly
  • connect the same geometry to PCA Through SVD and then to randomized low-rank approximation

11 Sources and Further Reading
