Exercises: Orthogonality and Least Squares
exercises, least squares, projection
1 Scope and Goals
This set trains four things:
- recognizing orthogonality quickly
- computing least-squares solutions by hand on small problems
- interpreting residual conditions correctly
- moving between geometry, algebra, and regression language
2 Prerequisites
- dot products and orthogonality
- column spaces
- matrix multiplication and transpose
- the normal equations
3 Warm-Up Problems
- Let \(u = (1,1,0)\) and \(v = (1,-1,2)\). Are \(u\) and \(v\) orthogonal?
- Let \(W = \operatorname{span}\{(1,1,1)\}\) and \(b = (2,0,1)\). Find the orthogonal projection of \(b\) onto \(W\).
4 Core Problems
For
\[ A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \]
compute \(A^\top A\), \(A^\top b\), and the least-squares solution \(\hat{x}\).
Using the same \(A\) and \(b\), compute the residual \(r = b - A\hat{x}\) and verify that \(r\) is orthogonal to each column of \(A\).
Suppose \(A\) has orthonormal columns. Show that the least-squares solution is \(\hat{x} = A^\top b\).
5 Proof Problems
- Prove that if \(r = b - A\hat{x}\) is orthogonal to \(\operatorname{col}(A)\), then \(\hat{x}\) is a least-squares minimizer.
- Show that if the columns of \(A\) are linearly independent, then \(A^\top A\) is positive definite.
6 Computational or Applied Problems
- Open Computation Lab: Projection Geometry and Regression Residuals. First perturb only the intercept while keeping the slope at its least-squares value. Then perturb only the slope while keeping the intercept at its least-squares value. Record how the SSE, residual sum, and weighted residual sum change in each case.
- In software, solve a small full-rank least-squares problem twice: once through the normal equations and once through QR. Compare the coefficient vectors and describe why QR is usually the safer route numerically.
- In a regression whose design matrix includes an intercept (all-ones) column, explain why the least-squares residuals sum to zero.
7 Hints
- For a one-dimensional subspace spanned by \(u\), project with \(\frac{u^\top b}{u^\top u}u\).
- In the \(A^\top r = 0\) check, use each column of \(A\) separately.
- If the columns are orthonormal, ask what \(A^\top A\) becomes.
- For positive definiteness, test \(v^\top A^\top A v\).
8 Full Solutions
8.1 Solution 1
\[ u \cdot v = 1 \cdot 1 + 1 \cdot (-1) + 0 \cdot 2 = 0. \]
So \(u\) and \(v\) are orthogonal.
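The same dot product can be checked in one line; this is a minimal sketch assuming NumPy is available.

```python
import numpy as np

u = np.array([1, 1, 0])
v = np.array([1, -1, 2])

# A zero dot product confirms orthogonality
print(np.dot(u, v))  # 0
```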
8.2 Solution 2
Let \(w = (1,1,1)\). Then
\[ \operatorname{proj}_W(b) = \frac{w^\top b}{w^\top w} w = \frac{2+0+1}{3}(1,1,1) = (1,1,1). \]
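A quick numerical version of the same projection formula, assuming NumPy:

```python
import numpy as np

w = np.array([1.0, 1.0, 1.0])
b = np.array([2.0, 0.0, 1.0])

# Projection of b onto span{w}: (w.b / w.w) * w
proj = (w @ b) / (w @ w) * w
print(proj)  # [1. 1. 1.]
```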
8.3 Solution 3
We have
\[ A^\top A = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}, \qquad A^\top b = \begin{bmatrix} 5 \\ 6 \end{bmatrix}. \]
Solving
\[ \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix} \hat{x} = \begin{bmatrix} 5 \\ 6 \end{bmatrix} \]
gives
\[ \hat{x} = \begin{bmatrix} 7/6 \\ 1/2 \end{bmatrix}. \]
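The same computation in NumPy (a sketch, not part of the hand calculation): form the normal equations and solve them.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

AtA = A.T @ A                      # [[3, 3], [3, 5]]
Atb = A.T @ b                      # [5, 6]
x_hat = np.linalg.solve(AtA, Atb)
print(x_hat)                       # [1.16666667 0.5]  i.e. (7/6, 1/2)
```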
8.4 Solution 4
The fitted vector is
\[ A\hat{x} = \begin{bmatrix} 7/6 \\ 5/3 \\ 13/6 \end{bmatrix}, \]
so
\[ r = \begin{bmatrix} -1/6 \\ 1/3 \\ -1/6 \end{bmatrix}. \]
Check against the first column (1,1,1):
\[ (-1/6) + (1/3) + (-1/6) = 0. \]
Check against the second column (0,1,2):
\[ 0(-1/6) + 1(1/3) + 2(-1/6) = 0. \]
Hence \(r\) is orthogonal to \(\operatorname{col}(A)\).
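The column-by-column check collapses into the single condition \(A^\top r = 0\); a short NumPy verification (assuming the same \(A\), \(b\), and \(\hat{x}\) as above):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])
x_hat = np.array([7/6, 1/2])

r = b - A @ x_hat
# Each entry of A.T @ r is the dot product of r with one column of A
print(A.T @ r)  # approximately [0, 0]
```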
8.5 Solution 5
If the columns of \(A\) are orthonormal, then \(A^\top A = I\). The normal equations become
\[ \hat{x} = A^\top b. \]
So the coefficients are obtained by taking dot products with the orthonormal columns.
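To see this numerically, one option is to take \(Q\) from a QR factorization of the example \(A\): its columns are orthonormal and span the same column space, so the fitted vector \(Q(Q^\top b)\) matches \(A\hat{x}\). A sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Q has orthonormal columns spanning col(A), so Q.T @ Q = I
Q, R = np.linalg.qr(A)
print(np.round(Q.T @ Q, 12))

# With orthonormal columns, the least-squares coefficients are just Q.T @ b
coeffs = Q.T @ b
print(np.allclose(Q @ coeffs, A @ np.array([7/6, 1/2])))  # True: same fitted vector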
8.6 Solution 6
For any \(h\),
\[ \|A(\hat{x}+h)-b\|_2^2 = \|(-r)+Ah\|_2^2 = \|r\|_2^2 + \|Ah\|_2^2 - 2r^\top Ah. \]
If \(r \perp \operatorname{col}(A)\), then \(r^\top Ah = 0\), so
\[ \|A(\hat{x}+h)-b\|_2^2 = \|r\|_2^2 + \|Ah\|_2^2 \ge \|r\|_2^2. \]
Thus \(\hat{x}\) minimizes the objective.
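The expansion can also be checked numerically: for any perturbation \(h\), the cross term vanishes and the objective splits exactly into \(\|r\|^2 + \|Ah\|^2\). A small sketch using the earlier example:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
r = b - A @ x_hat

# For random perturbations h: ||A(x_hat + h) - b||^2 == ||r||^2 + ||Ah||^2
for _ in range(3):
    h = rng.standard_normal(2)
    lhs = np.linalg.norm(A @ (x_hat + h) - b) ** 2
    rhs = np.linalg.norm(r) ** 2 + np.linalg.norm(A @ h) ** 2
    print(np.isclose(lhs, rhs))  # True
```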
8.7 Solution 7
For any nonzero \(v\),
\[ v^\top A^\top A v = (Av)^\top (Av) = \|Av\|_2^2. \]
If the columns of \(A\) are linearly independent, then \(Av \neq 0\) whenever \(v \neq 0\), so
\[ v^\top A^\top A v > 0. \]
Therefore \(A^\top A\) is positive definite.
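A numerical illustration with the example \(A\), whose columns are independent (a sketch assuming NumPy): the eigenvalues of \(A^\top A\) are positive, and \(v^\top A^\top A v = \|Av\|^2\) for any test vector.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])   # linearly independent columns

AtA = A.T @ A
# A symmetric matrix is positive definite iff all its eigenvalues are positive
print(np.linalg.eigvalsh(AtA))                  # both eigenvalues > 0

v = np.array([2.0, -3.0])                       # an arbitrary nonzero test vector
print(v @ AtA @ v, np.linalg.norm(A @ v) ** 2)  # equal positive numbers (21.0, 21.0)
```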
8.8 Solution 8
If you perturb only the intercept, the residual sum changes immediately because the intercept column is the all-ones direction. If you perturb only the slope, the weighted residual sum changes immediately because the slope column is the feature direction \((0,1,2)^\top\).
In both cases, the SSE increases once you move away from the optimum. The lesson is not that one condition always fails “first” in every direction, but that each parameter perturbs the orthogonality condition attached to its own model direction.
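A sketch of the kind of experiment the lab asks for, reusing the \(A\) and \(b\) from the core problems (the lab's own data may differ): track the SSE, the residual sum, and the weighted residual sum as each parameter is perturbed.

```python
import numpy as np

# Design matrix with an intercept column and the feature direction (0, 1, 2)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])
x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # (7/6, 1/2)

def report(x):
    r = b - A @ x
    sse = r @ r                  # sum of squared errors
    res_sum = r.sum()            # residual sum (pairs with the intercept column)
    weighted = A[:, 1] @ r       # weighted residual sum (pairs with the slope column)
    return sse, res_sum, weighted

print(report(x_hat))                         # at the optimum: (minimal SSE, ~0, ~0)
print(report(x_hat + np.array([0.1, 0.0])))  # intercept perturbed: SSE rises, sums leave zero
print(report(x_hat + np.array([0.0, 0.1])))  # slope perturbed: SSE rises, sums leave zero
```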
8.9 Solution 9
On a small, well-conditioned problem, the two methods may agree closely. The important point is structural: QR works with orthogonal factors and avoids explicitly squaring the conditioning of the problem, while the normal equations build \(A^\top A\), which can magnify numerical error.
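One possible way to run the comparison, assuming NumPy (a dedicated triangular solver or `np.linalg.lstsq` would also work):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3))     # small full-rank design matrix
b = rng.standard_normal(50)

# Route 1: normal equations
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Route 2: QR factorization, then solve R x = Q.T b
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

print(np.max(np.abs(x_normal - x_qr)))  # tiny on a well-conditioned problem
```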
8.10 Solution 10
An intercept column is the all-ones vector. Since the residual is orthogonal to every column of the design matrix, it is orthogonal to the ones vector. Therefore
\[ \sum_i r_i = 0. \]
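A quick check with the running example, whose first column is the intercept (a sketch assuming NumPy):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])   # first column is the all-ones intercept
b = np.array([1.0, 2.0, 2.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)
r = b - A @ x_hat
print(r.sum())   # approximately 0
```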
9 Common Errors
- forgetting whether the projection target is a vector or a subspace
- checking orthogonality against only one column instead of the whole column space
- mixing up \(A^\top A\) and \(AA^\top\)
- treating \((A^\top A)^{-1}A^\top b\) as valid without checking rank assumptions
10 What To Do Next
11 Sources and Further Reading
- MIT 18.06SC Linear Algebra resource index - First pass - good official problem-solving sequence for projections and least squares. Checked 2026-04-24.
- Hefferon, Linear Algebra - Second pass - especially useful for extra exercises and worked solutions. Checked 2026-04-24.
- A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares - Paper bridge - a reminder that these same exercises become modern large-scale regression questions. Checked 2026-04-24.