Exercises: Orthogonality and Least Squares

Practice problems and worked solutions for projections, residual orthogonality, and linear least squares.
Modified: April 26, 2026

Keywords

exercises, least squares, projection

1 Scope and Goals

This set trains four things:

  • recognizing orthogonality quickly
  • computing least-squares solutions by hand on small problems
  • interpreting residual conditions correctly
  • moving between geometry, algebra, and regression language

2 Prerequisites

  • dot products and orthogonality
  • column spaces
  • matrix multiplication and transpose
  • the normal equations

3 Warm-Up Problems

  1. Let \(u = (1,1,0)\) and \(v = (1,-1,2)\). Are \(u\) and \(v\) orthogonal?
  2. Let \(W = \operatorname{span}\{(1,1,1)\}\) and \(b = (2,0,1)\). Find the orthogonal projection of \(b\) onto \(W\).

4 Core Problems

  1. For

    \[ A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \]

    compute \(A^\top A\), \(A^\top b\), and the least-squares solution \(\hat{x}\).

  2. Using the same \(A\) and \(b\), compute the residual \(r = b - A\hat{x}\) and verify that \(r\) is orthogonal to each column of \(A\).

  3. Suppose \(A\) has orthonormal columns. Show that the least-squares solution is \(\hat{x} = A^\top b\).

5 Proof Problems

  1. Prove that if \(r = b - A\hat{x}\) is orthogonal to \(\operatorname{col}(A)\), then \(\hat{x}\) is a least-squares minimizer.
  2. Show that if the columns of \(A\) are linearly independent, then \(A^\top A\) is positive definite.

6 Computational or Applied Problems

  1. Open Computation Lab: Projection Geometry and Regression Residuals. First perturb only the intercept while keeping the slope at its least-squares value. Then perturb only the slope while keeping the intercept at its least-squares value. Record how the SSE, residual sum, and weighted residual sum change in each case.
  2. In software, solve a small full-rank least-squares problem twice: once through the normal equations and once through QR. Compare the coefficient vectors and describe why QR is usually the safer route numerically.
  3. Explain why a regression model that includes an intercept column always produces residuals that sum to zero.

7 Hints

  1. For a one-dimensional subspace spanned by \(u\), project with \(\frac{u^\top b}{u^\top u}u\).
  2. In the \(A^\top r = 0\) check, use each column of \(A\) separately.
  3. If the columns are orthonormal, ask what \(A^\top A\) becomes.
  4. For positive definiteness, test \(v^\top A^\top A v\).

8 Full Solutions

8.1 Solution 1

\[ u \cdot v = 1 \cdot 1 + 1 \cdot (-1) + 0 \cdot 2 = 0. \]

So \(u\) and \(v\) are orthogonal.

8.2 Solution 2

Let \(w = (1,1,1)\). Then

\[ \operatorname{proj}_W(b) = \frac{w^\top b}{w^\top w} w = \frac{2+0+1}{3}(1,1,1) = (1,1,1). \]
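The same projection can be checked numerically. This is a sketch using NumPy; the array names are illustrative:

```python
import numpy as np

# Project b onto the line spanned by w, using proj_W(b) = (w·b / w·w) w.
w = np.array([1.0, 1.0, 1.0])
b = np.array([2.0, 0.0, 1.0])

proj = (w @ b) / (w @ w) * w  # (2 + 0 + 1)/3 * (1,1,1) = (1, 1, 1)

# The residual b - proj should be orthogonal to w.
residual = b - proj
```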

8.3 Solution 3

We have

\[ A^\top A = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}, \qquad A^\top b = \begin{bmatrix} 5 \\ 6 \end{bmatrix}. \]

Solving

\[ \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix} \hat{x} = \begin{bmatrix} 5 \\ 6 \end{bmatrix} \]

gives

\[ \hat{x} = \begin{bmatrix} 7/6 \\ 1/2 \end{bmatrix}. \]
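The same computation in NumPy, as a quick check of the hand calculation (a sketch; `np.linalg.solve` handles the \(2 \times 2\) normal equations):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

AtA = A.T @ A                      # [[3, 3], [3, 5]]
Atb = A.T @ b                      # [5, 6]
x_hat = np.linalg.solve(AtA, Atb)  # [7/6, 1/2]
```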

8.4 Solution 4

The fitted vector is

\[ A\hat{x} = \begin{bmatrix} 7/6 \\ 5/3 \\ 13/6 \end{bmatrix}, \]

so

\[ r = \begin{bmatrix} -1/6 \\ 1/3 \\ -1/6 \end{bmatrix}. \]

Check against the first column \((1,1,1)\):

\[ (-1/6) + (1/3) + (-1/6) = 0. \]

Check against the second column \((0,1,2)\):

\[ 0(-1/6) + 1(1/3) + 2(-1/6) = 0. \]

Hence \(r\) is orthogonal to \(\operatorname{col}(A)\).
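The column-by-column check can be done in one step, since \(A^\top r\) collects both column dot products at once (a NumPy sketch):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)
r = b - A @ x_hat   # [-1/6, 1/3, -1/6]

# A.T @ r stacks r·(first column) and r·(second column); both should vanish.
check = A.T @ r
```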

8.5 Solution 5

If the columns of \(A\) are orthonormal, then \(A^\top A = I\), so the normal equations \(A^\top A \hat{x} = A^\top b\) reduce to

\[ \hat{x} = A^\top b. \]

So the coefficients are obtained by taking dot products with the orthonormal columns.
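To see this numerically, take orthonormal columns from a QR factorization; \(Q^\top c\) then matches the generic least-squares solution (a sketch, with illustrative data):

```python
import numpy as np

# Get a matrix with orthonormal columns from the QR factorization of any
# full-rank matrix; here we reuse the 3x2 matrix from the core problems.
M = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
Q, _ = np.linalg.qr(M)   # Q has orthonormal columns
c = np.array([1.0, 2.0, 2.0])

x_direct = Q.T @ c                               # x̂ = Qᵀc, no system to solve
x_lstsq, *_ = np.linalg.lstsq(Q, c, rcond=None)  # generic least squares
```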

8.6 Solution 6

For any \(h\),

\[ \|A(\hat{x}+h)-b\|_2^2 = \|(-r)+Ah\|_2^2 = \|r\|_2^2 + \|Ah\|_2^2 - 2r^\top Ah. \]

If \(r \perp \operatorname{col}(A)\), then \(r^\top Ah = 0\), so

\[ \|A(\hat{x}+h)-b\|_2^2 = \|r\|_2^2 + \|Ah\|_2^2 \ge \|r\|_2^2. \]

Thus \(\hat{x}\) minimizes the objective.

8.7 Solution 7

For any nonzero \(v\),

\[ v^\top A^\top A v = (Av)^\top (Av) = \|Av\|_2^2. \]

If the columns of \(A\) are linearly independent, then \(Av \neq 0\) whenever \(v \neq 0\), so

\[ v^\top A^\top A v > 0. \]

Therefore \(A^\top A\) is positive definite.
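A numeric sanity check: for the matrix \(A\) from the core problems, whose columns are linearly independent, the eigenvalues of \(A^\top A\) are strictly positive (a sketch):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])  # linearly independent columns

# A^T A is symmetric, so eigvalsh (for Hermitian matrices) is appropriate.
eigvals = np.linalg.eigvalsh(A.T @ A)
```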

8.8 Solution 8

If you perturb only the intercept, the residual sum changes immediately because the intercept column is the all-ones direction. If you perturb only the slope, the weighted residual sum changes immediately because the slope column is the feature direction \((0,1,2)^\top\).

In both cases, the SSE increases once you move away from the optimum. The lesson is not that one condition always fails “first” in every direction, but that perturbing a parameter breaks the orthogonality condition attached to that parameter's own model direction.

8.9 Solution 9

On a small, well-conditioned problem, the two methods may agree closely. The important point is structural: QR works with orthogonal factors and a triangular solve, while the normal equations form \(A^\top A\), whose condition number is the square of that of \(A\), so they can magnify numerical error.
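One way to run the comparison (a sketch; the random problem data is illustrative, and the condition-number check makes the squaring concrete):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))   # full-rank design, illustrative data
b = rng.normal(size=50)

# Route 1: normal equations.
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# Route 2: QR, solving the triangular system R x = Q^T b.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Forming A^T A squares the condition number of the problem.
cond_A = np.linalg.cond(A)
cond_AtA = np.linalg.cond(A.T @ A)
```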

8.10 Solution 10

An intercept column is the all-ones vector. Since the residual is orthogonal to every column of the design matrix, it is orthogonal to the ones vector. Therefore

\[ \sum_i r_i = 0. \]
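A quick numerical illustration (a sketch; the small design matrix and response are made up, but any design with an all-ones intercept column behaves the same way):

```python
import numpy as np

# Design with an intercept (all-ones) column plus one feature column.
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
y = np.array([1.0, 3.0, 2.0, 5.0])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta

residual_sum = r.sum()  # ~0, because r is orthogonal to the ones column
```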

9 Common Errors

  • forgetting whether the projection target is a vector or a subspace
  • checking orthogonality against only one column instead of the whole column space
  • mixing up \(A^\top A\) and \(AA^\top\)
  • treating \((A^\top A)^{-1}A^\top b\) as valid without checking rank assumptions
