Computation Lab: Rank-1 Approximation and PCA Geometry
An interactive lab for seeing how singular values, principal directions, and rank-1 reconstruction fit together.
Keywords
computation, simulation, visualization, svd, pca
1 Lab Goal
This lab helps you see one specific fact:
in a two-feature centered data matrix, the rank-\(1\) truncated SVD is just projection onto the top principal direction.
2 Math Question
How do the noise scale and the retained rank affect:
- the singular values
- the top principal direction
- the rank-\(1\) reconstruction
- the Frobenius reconstruction error
3 Model or Setup
We start from a deterministic point cloud that is almost one-dimensional.
The points are centered feature vectors in \(\mathbb{R}^2\). When the noise scale is small, the data matrix is close to rank \(1\); when the noise scale grows, the second singular value grows too.
4 Parameters and Controls
- Noise scale: controls how far the cloud deviates from an exact one-dimensional pattern
- Retained rank: choose between a rank-\(1\) reconstruction and the full rank-\(2\) reconstruction
Default values are 0.60 for noise scale and 1 for retained rank.
5 Code and Simulation
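The interactive Observable JS cells are not reproduced here. As an offline stand-in, the following pure-Python sketch (the point arrays, the noise pattern, and the helper names are illustrative assumptions, not the lab's actual data) builds a centered two-feature cloud, computes both singular values and the top principal direction from the \(2\times 2\) Gram matrix \(X^\top X\), and forms the rank-\(1\) reconstruction by projecting each row onto that direction:

```python
import math

def make_cloud(noise_scale):
    """Deterministic, centered two-feature cloud that is nearly rank 1."""
    ts      = [-2.0, -1.0, 0.0, 1.0, 2.0]    # positions along the line
    offsets = [ 0.5, -1.0, 0.8, -0.6, 0.3]   # fixed off-line pattern (illustrative)
    pts = [(t - 0.5 * noise_scale * n, 0.5 * t + noise_scale * n)
           for t, n in zip(ts, offsets)]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    return [(x - mx, y - my) for x, y in pts]

def svd_2col(X):
    """Singular values and top right-singular vector of an n-by-2 matrix,
    via the eigendecomposition of the 2x2 Gram matrix X^T X."""
    a = sum(x * x for x, _ in X)
    b = sum(x * y for x, y in X)
    c = sum(y * y for _, y in X)
    half = 0.5 * math.hypot(a - c, 2.0 * b)
    lam1 = 0.5 * (a + c) + half
    lam2 = max(0.5 * (a + c) - half, 0.0)    # clamp tiny negative roundoff
    if abs(b) > 1e-12:
        v = (b, lam1 - a)                    # eigenvector of X^T X for lam1
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)
    n = math.hypot(*v)
    return math.sqrt(lam1), math.sqrt(lam2), (v[0] / n, v[1] / n)

def rank1(X, v):
    """Project every centered row onto the top principal direction v."""
    return [((x * v[0] + y * v[1]) * v[0], (x * v[0] + y * v[1]) * v[1])
            for x, y in X]

X = make_cloud(0.60)                         # the lab's default noise scale
s1, s2, v = svd_2col(X)
X1 = rank1(X, v)
frob_err = math.sqrt(sum((x - px) ** 2 + (y - py) ** 2
                         for (x, y), (px, py) in zip(X, X1)))
```

Setting `noise_scale = 0` makes the cloud exactly rank \(1\), so both `s2` and `frob_err` collapse to roundoff, matching the observations below.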
6 What To Observe
- With Noise scale = 0, the point cloud is exactly rank \(1\), so the second singular value and the rank-\(1\) reconstruction error both collapse to zero up to floating-point roundoff.
- With Retained rank = 1, the orange reconstruction points lie on the principal line.
- As the noise scale increases, the reconstruction segments grow and the second singular value grows with them.
- With Retained rank = 2, the reconstruction becomes exact and the Frobenius error drops to zero.
7 Interpretation
This lab is the two-feature version of truncated SVD.
The centered data matrix \(X\) is being approximated by \(X_1\), the best rank-\(1\) matrix. In geometry, that means every data row is projected onto the top principal direction.
In this setup, the Frobenius reconstruction error for the rank-\(1\) approximation matches the discarded singular value because there are only two singular values total:
\[ \|X-X_1\|_F = \sigma_2. \]
To keep the displayed value numerically stable at the exact rank-\(1\) endpoint, the lab reports the second singular value through this reconstruction error identity rather than from a cancellation-prone closed-form subtraction.
So the picture, the table, and the displayed error are all different views of the same theorem.
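The numerical point about the rank-\(1\) endpoint can be made concrete with a short pure-Python sketch (the arrays and variable names are illustrative, not the lab's actual cells): on exactly rank-\(1\) data, the closed-form eigenvalue expression for \(\sigma_2^2\) subtracts two nearly equal numbers, while the reconstruction-error route is nonnegative by construction.

```python
import math

# Exactly rank-1 centered data: rows are t * (0.6, 0.8).
X = [(0.6 * t, 0.8 * t) for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]

# Gram-matrix entries of X^T X.
a = sum(x * x for x, _ in X)
b = sum(x * y for x, y in X)
c = sum(y * y for _, y in X)

# Cancellation-prone closed form for sigma_2^2: two nearly equal terms subtracted.
lam2_closed = 0.5 * (a + c) - 0.5 * math.hypot(a - c, 2.0 * b)
# lam2_closed can land on a tiny negative number, breaking a naive sqrt.

# Stable route: build the rank-1 reconstruction and measure the Frobenius error.
lam1 = 0.5 * (a + c) + 0.5 * math.hypot(a - c, 2.0 * b)
v = (b, lam1 - a)                       # top right-singular direction (unnormalized)
n = math.hypot(*v)
v = (v[0] / n, v[1] / n)
err2 = sum((x - (x * v[0] + y * v[1]) * v[0]) ** 2 +
           (y - (x * v[0] + y * v[1]) * v[1]) ** 2
           for x, y in X)
sigma2 = math.sqrt(err2)                # nonnegative by construction
```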
8 Failure Modes and Numerical Cautions
- This is a tiny synthetic example, so it hides scaling and conditioning issues that appear in larger data sets.
- The principal direction is only defined up to sign, so software may return \(v\) or \(-v\).
- The lab is about best linear reconstruction, not causal interpretation.
- Centering is built in here; on raw data, leaving out centering can seriously change the result.
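The centering caution is easy to demonstrate. In this hedged pure-Python sketch (the shifted line and helper are illustrative assumptions), the raw data's top singular direction points toward the cloud's mean, while the centered direction captures the actual spread; here the two come out orthogonal:

```python
import math

def top_direction(pts):
    """Top right-singular direction of an n-by-2 matrix via its Gram matrix."""
    a = sum(x * x for x, _ in pts)
    b = sum(x * y for x, y in pts)
    c = sum(y * y for _, y in pts)
    lam1 = 0.5 * (a + c) + 0.5 * math.hypot(a - c, 2.0 * b)
    v = (b, lam1 - a) if abs(b) > 1e-12 else \
        ((1.0, 0.0) if a >= c else (0.0, 1.0))
    n = math.hypot(*v)
    return (v[0] / n, v[1] / n)

# Points along y = -x, shifted far from the origin.
raw = [(t + 5.0, -t + 5.0) for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
mx = sum(x for x, _ in raw) / len(raw)
my = sum(y for _, y in raw) / len(raw)
centered = [(x - mx, y - my) for x, y in raw]

v_raw = top_direction(raw)        # dominated by the mean offset: ~ (1, 1)/sqrt(2)
v_cen = top_direction(centered)   # the actual spread direction: ~ (1, -1)/sqrt(2)
```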
9 Reproducibility Notes
- execution engine: Observable JS
- no randomness and no seed required
- deterministic point construction from fixed arrays
- static-site friendly: no server runtime or notebook kernel required after render
10 Extensions
- add a third feature and compare rank-\(1\) versus rank-\(2\) reconstructions
- replace the synthetic point cloud with a small image-like matrix and study compression directly
- connect the same geometry to PCA Through SVD and then to randomized low-rank approximation
11 Sources and Further Reading
- Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares - First pass: good anchor for PCA geometry and computation. Checked 2026-04-24.
- CS168 Lecture 9: The Singular Value Decomposition and Low-Rank Matrix Approximations - Second pass: strong application framing for low-rank reconstruction. Checked 2026-04-24.
- Randomized Numerical Linear Algebra: Foundations & Algorithms - Paper bridge: modern context for why low-rank approximation matters computationally at scale. Checked 2026-04-24.