Exercises: Orthogonality and Least Squares
exercises, least squares, projection
1 Scope and Goals
This set trains four things:
- recognizing orthogonality quickly
- computing least-squares solutions by hand on small problems
- interpreting residual conditions correctly
- moving between geometry, algebra, and regression language
2 Prerequisites
- dot products and orthogonality
- column spaces
- matrix multiplication and transpose
- the normal equations
3 Warm-Up Problems
- Let \(u = (1,1,0)\) and \(v = (1,-1,2)\). Are \(u\) and \(v\) orthogonal?
- Let \(W = \operatorname{span}\{(1,1,1)\}\) and \(b = (2,0,1)\). Find the orthogonal projection of \(b\) onto \(W\).
4 Core Problems
For
\[ A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \]
compute \(A^\top A\), \(A^\top b\), and the least-squares solution \(\hat{x}\).
Using the same \(A\) and \(b\), compute the residual \(r = b - A\hat{x}\) and verify that \(r\) is orthogonal to each column of \(A\).
Suppose \(A\) has orthonormal columns. Show that the least-squares solution is \(\hat{x} = A^\top b\).
5 Proof Problems
- Prove that if \(r = b - A\hat{x}\) is orthogonal to \(\operatorname{col}(A)\), then \(\hat{x}\) is a least-squares minimizer.
- Show that if the columns of \(A\) are linearly independent, then \(A^\top A\) is positive definite.
6 Computational or Applied Problems
- Open Computation Lab: Projection Geometry and Regression Residuals. First perturb only the intercept while keeping the slope at its least-squares value. Then perturb only the slope while keeping the intercept at its least-squares value. Record how the SSE, residual sum, and weighted residual sum change in each case.
- In software, solve a small full-rank least-squares problem twice: once through the normal equations and once through QR. Compare the coefficient vectors and describe why QR is usually the safer route numerically.
- In a regression whose design matrix includes an intercept (all-ones) column, explain why the least-squares residuals sum to zero.
7 Hints
- For a one-dimensional subspace spanned by \(u\), project with \(\frac{u^\top b}{u^\top u}u\).
- In the \(A^\top r = 0\) check, use each column of \(A\) separately.
- If the columns are orthonormal, ask what \(A^\top A\) becomes.
- For positive definiteness, test \(v^\top A^\top A v\).
8 Full Solutions
8.1 Solution 1
\[ u \cdot v = 1 \cdot 1 + 1 \cdot (-1) + 0 \cdot 2 = 0. \]
So \(u\) and \(v\) are orthogonal.
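The same dot product can be checked in one line; this is a minimal sketch assuming NumPy is available.

```python
import numpy as np

u = np.array([1, 1, 0])
v = np.array([1, -1, 2])

# A zero dot product confirms orthogonality
print(np.dot(u, v))  # 0
```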
8.2 Solution 2
Let \(w = (1,1,1)\). Then
\[ \operatorname{proj}_W(b) = \frac{w^\top b}{w^\top w} w = \frac{2+0+1}{3}(1,1,1) = (1,1,1). \]
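A quick numerical version of the same projection formula, assuming NumPy:

```python
import numpy as np

w = np.array([1.0, 1.0, 1.0])
b = np.array([2.0, 0.0, 1.0])

# Projection of b onto span{w}: (w.b / w.w) * w
proj = (w @ b) / (w @ w) * w
print(proj)  # [1. 1. 1.]
```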
8.3 Solution 3
We have
\[ A^\top A = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}, \qquad A^\top b = \begin{bmatrix} 5 \\ 6 \end{bmatrix}. \]
Solving
\[ \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix} \hat{x} = \begin{bmatrix} 5 \\ 6 \end{bmatrix} \]
gives
\[ \hat{x} = \begin{bmatrix} 7/6 \\ 1/2 \end{bmatrix}. \]
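The same computation in NumPy (a sketch, not part of the hand calculation): form the normal equations and solve them.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

AtA = A.T @ A                      # [[3, 3], [3, 5]]
Atb = A.T @ b                      # [5, 6]
x_hat = np.linalg.solve(AtA, Atb)
print(x_hat)                       # [1.16666667 0.5]  i.e. (7/6, 1/2)
```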
8.4 Solution 4
The fitted vector is
\[ A\hat{x} = \begin{bmatrix} 7/6 \\ 5/3 \\ 13/6 \end{bmatrix}, \]
so
\[ r = \begin{bmatrix} -1/6 \\ 1/3 \\ -1/6 \end{bmatrix}. \]
Check against the first column (1,1,1):
\[ (-1/6) + (1/3) + (-1/6) = 0. \]
Check against the second column (0,1,2):
\[ 0(-1/6) + 1(1/3) + 2(-1/6) = 0. \]
Hence \(r\) is orthogonal to \(\operatorname{col}(A)\).
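The column-by-column check collapses into the single condition \(A^\top r = 0\); a short NumPy verification (assuming the same \(A\), \(b\), and \(\hat{x}\) as above):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])
x_hat = np.array([7/6, 1/2])

r = b - A @ x_hat
# Each entry of A.T @ r is the dot product of r with one column of A
print(A.T @ r)  # approximately [0, 0]
```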
8.5 Solution 5
If the columns of \(A\) are orthonormal, then \(A^\top A = I\). The normal equations become
\[ \hat{x} = A^\top b. \]
So the coefficients are obtained by taking dot products with the orthonormal columns.
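To see this numerically, one option is to take \(Q\) from a QR factorization of the example \(A\): its columns are orthonormal and span the same column space, so the fitted vector \(Q(Q^\top b)\) matches \(A\hat{x}\). A sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Q has orthonormal columns spanning col(A), so Q.T @ Q = I
Q, R = np.linalg.qr(A)
print(np.round(Q.T @ Q, 12))

# With orthonormal columns, the least-squares coefficients are just Q.T @ b
coeffs = Q.T @ b
print(np.allclose(Q @ coeffs, A @ np.array([7/6, 1/2])))  # True: same fitted vector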
8.6 Solution 6
For any \(h\),
\[ \|A(\hat{x}+h)-b\|_2^2 = \|(-r)+Ah\|_2^2 = \|r\|_2^2 + \|Ah\|_2^2 - 2r^\top Ah. \]
If \(r \perp \operatorname{col}(A)\), then \(r^\top Ah = 0\), so
\[ \|A(\hat{x}+h)-b\|_2^2 = \|r\|_2^2 + \|Ah\|_2^2 \ge \|r\|_2^2. \]
Thus \(\hat{x}\) minimizes the objective.
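The expansion can also be checked numerically: for any perturbation \(h\), the cross term vanishes and the objective splits exactly into \(\|r\|^2 + \|Ah\|^2\). A small sketch using the earlier example:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
r = b - A @ x_hat

# For random perturbations h: ||A(x_hat + h) - b||^2 == ||r||^2 + ||Ah||^2
for _ in range(3):
    h = rng.standard_normal(2)
    lhs = np.linalg.norm(A @ (x_hat + h) - b) ** 2
    rhs = np.linalg.norm(r) ** 2 + np.linalg.norm(A @ h) ** 2
    print(np.isclose(lhs, rhs))  # True
```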
8.7 Solution 7
For any nonzero \(v\),
\[ v^\top A^\top A v = (Av)^\top (Av) = \|Av\|_2^2. \]
If the columns of \(A\) are linearly independent, then \(Av \neq 0\) whenever \(v \neq 0\), so
\[ v^\top A^\top A v > 0. \]
Therefore \(A^\top A\) is positive definite.
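A numerical illustration with the example \(A\), whose columns are independent (a sketch assuming NumPy): the eigenvalues of \(A^\top A\) are positive, and \(v^\top A^\top A v = \|Av\|^2\) for any test vector.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])   # linearly independent columns

AtA = A.T @ A
# A symmetric matrix is positive definite iff all its eigenvalues are positive
print(np.linalg.eigvalsh(AtA))                  # both eigenvalues > 0

v = np.array([2.0, -3.0])                       # an arbitrary nonzero test vector
print(v @ AtA @ v, np.linalg.norm(A @ v) ** 2)  # equal positive numbers (21.0, 21.0)
```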
8.8 Solution 8
If you perturb only the intercept, the residual sum changes immediately because the intercept column is the all-ones direction. If you perturb only the slope, the weighted residual sum changes immediately because the slope column is the feature direction \((0,1,2)^\top\).
In both cases, the SSE increases once you move away from the optimum. The lesson is not that one condition always fails “first” in every direction, but that each parameter perturbs the orthogonality condition attached to its own model direction.
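A sketch of the kind of experiment the lab asks for, reusing the \(A\) and \(b\) from the core problems (the lab's own data may differ): track the SSE, the residual sum, and the weighted residual sum as each parameter is perturbed.

```python
import numpy as np

# Design matrix with an intercept column and the feature direction (0, 1, 2)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])
x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # (7/6, 1/2)

def report(x):
    r = b - A @ x
    sse = r @ r                  # sum of squared errors
    res_sum = r.sum()            # residual sum (pairs with the intercept column)
    weighted = A[:, 1] @ r       # weighted residual sum (pairs with the slope column)
    return sse, res_sum, weighted

print(report(x_hat))                         # at the optimum: (minimal SSE, ~0, ~0)
print(report(x_hat + np.array([0.1, 0.0])))  # intercept perturbed: SSE rises, sums leave zero
print(report(x_hat + np.array([0.0, 0.1])))  # slope perturbed: SSE rises, sums leave zero
```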
8.9 Solution 9
On a small, well-conditioned problem, the two methods may agree closely. The important point is structural: QR works with orthogonal factors and avoids explicitly squaring the conditioning of the problem, while the normal equations build \(A^\top A\), which can magnify numerical error.
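One possible way to run the comparison, assuming NumPy (a dedicated triangular solver or `np.linalg.lstsq` would also work):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3))     # small full-rank design matrix
b = rng.standard_normal(50)

# Route 1: normal equations
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Route 2: QR factorization, then solve R x = Q.T b
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

print(np.max(np.abs(x_normal - x_qr)))  # tiny on a well-conditioned problem
```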
8.10 Solution 10
An intercept column is the all-ones vector. Since the residual is orthogonal to every column of the design matrix, it is orthogonal to the ones vector. Therefore
\[ \sum_i r_i = 0. \]
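A quick check with the running example, whose first column is the intercept (a sketch assuming NumPy):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])   # first column is the all-ones intercept
b = np.array([1.0, 2.0, 2.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)
r = b - A @ x_hat
print(r.sum())   # approximately 0
```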
9 Common Errors
- forgetting whether the projection target is a vector or a subspace
- checking orthogonality against only one column instead of the whole column space
- mixing up \(A^\top A\) and \(AA^\top\)
- treating \((A^\top A)^{-1}A^\top b\) as valid without checking rank assumptions
10 What To Do Next
11 Sources and Further Reading
- MIT 18.06SC Linear Algebra resource index - First pass - good official problem-solving sequence for projections and least squares. Checked 2026-04-24.
- Hefferon, Linear Algebra - Second pass - especially useful for extra exercises and worked solutions. Checked 2026-04-24.
- A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares - Paper bridge - a reminder that these same exercises become modern large-scale regression questions. Checked 2026-04-24.