Iterative Methods and Preconditioning
iterative methods, preconditioning, Jacobi, Gauss-Seidel, conjugate gradient
1 Role
This is the third page of the Numerical Methods module.
Its job is to explain why, once systems become large or sparse, numerical linear algebra often stops trying to solve in one shot and instead improves an approximate answer step by step.
2 First-Pass Promise
Read this page after Numerical Linear Systems and Factorizations.
If you stop here, you should still understand:
- why iterative methods matter for large sparse problems
- how stationary iterations are built from a matrix splitting
- why Krylov methods use matrix-vector products rather than full factorizations
- why preconditioning is really about changing the effective conditioning or geometry of the problem
3 Why It Matters
Direct factorizations are excellent tools, but they are not always the right tools.
For very large systems, especially sparse ones, a full factorization may be:
- too expensive in time
- too expensive in memory
- destructive to sparsity
- more accurate than the application actually needs
In those settings, iterative methods become attractive because they:
- use repeated matrix-vector products
- improve an approximate solution gradually
- can exploit sparsity and structure
- can stop once the answer is good enough for the application
This is why iterative solvers appear everywhere in simulation, inverse problems, optimization, and scientific ML.
4 Prerequisite Recall
- the previous page explained direct factorization methods like LU, QR, and Cholesky
- a residual is \(r=b-A\hat x\)
- conditioning and stability still matter even when the solve is iterative
- symmetric positive definite systems often have especially clean geometry
5 Intuition
5.1 From One Big Solve To Many Cheap Steps
Iterative methods trade a single heavy computation for many lighter updates.
Instead of producing the solution immediately, they generate a sequence
\[ x^{(0)},x^{(1)},x^{(2)},\dots \]
that should move toward the true solution.
5.2 Stationary Iterations
The simplest idea is to split
\[ A=M-N \]
and rewrite
\[ Mx=Nx+b. \]
Then define the iteration
\[ x^{(k+1)} = M^{-1}Nx^{(k)} + M^{-1}b. \]
Jacobi and Gauss-Seidel are standard examples of this pattern.
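A minimal NumPy sketch of this pattern, assuming the splitting is supplied by the caller; the helper name and stopping rule are illustrative choices, not part of the definition:

```python
import numpy as np

def stationary_iteration(A, b, M, x0, tol=1e-10, max_iter=500):
    """Run x^{k+1} = M^{-1}(N x^{k} + b) with N = M - A.

    The update is applied in the equivalent residual form
    x <- x + M^{-1} r, which only needs solves with M.
    """
    x = x0.astype(float)
    for k in range(max_iter):
        r = b - A @ x                          # current residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k                        # converged to tolerance
        x = x + np.linalg.solve(M, r)          # one stationary step
    return x, max_iter

# Jacobi takes M = diag(A); Gauss-Seidel takes M = np.tril(A).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_j, steps = stationary_iteration(A, b, np.diag(np.diag(A)), np.zeros(2))
```

The residual form of the update is algebraically identical to \(x^{(k+1)} = M^{-1}Nx^{(k)} + M^{-1}b\), but it makes explicit that each step costs one matrix-vector product with A and one solve with M.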
5.3 Krylov Methods
More modern iterative methods build approximations from the space generated by
\[ r_0,\; Ar_0,\; A^2r_0,\dots \]
They use matrix action to explore increasingly useful directions without explicitly computing a full factorization.
That is why methods like conjugate gradient are central in large-scale numerical work.
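A minimal sketch of plain conjugate gradient for symmetric positive definite systems, assuming NumPy; note that the solver touches A only through a caller-supplied matrix-vector product, never through a factorization:

```python
import numpy as np

def conjugate_gradient(A_mv, b, tol=1e-10, max_iter=None):
    """Plain CG for SPD systems; A enters only through the product A_mv(v)."""
    n = b.shape[0]
    max_iter = max_iter if max_iter is not None else n
    x = np.zeros(n)
    r = b.copy()                    # residual b - A x for x = 0
    p = r.copy()                    # first search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A_mv(p)
        alpha = rs / (p @ Ap)       # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p   # next direction, A-conjugate to the old ones
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
x = conjugate_gradient(lambda v: A @ v, np.array([1.0, 2.0]))  # approx [1/11, 7/11]
```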
5.4 Preconditioning
If the original system is hard, we try to solve a nearby system with better numerical geometry.
A preconditioner is an operator that is easier to solve with than A, but close enough to A to improve convergence when used inside the iteration.
The slogan is:
make the iterations see a better-conditioned problem
6 Formal Core
Definition 1 (Definition: Stationary Iteration) Given a splitting
\[ A=M-N, \]
the stationary iteration is
\[ x^{(k+1)} = M^{-1}Nx^{(k)} + M^{-1}b. \]
The matrix
\[ T=M^{-1}N \]
is the iteration matrix.
Theorem 1 (Theorem Idea: Convergence Through The Iteration Matrix) If the spectral radius satisfies
\[ \rho(T)<1, \]
then the stationary iteration converges.
This makes convergence a spectral question, not just a coding question.
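A quick way to test the criterion numerically (the helper name spectral_radius is an illustrative choice):

```python
import numpy as np

def spectral_radius(T):
    """Largest eigenvalue magnitude of the iteration matrix."""
    return np.max(np.abs(np.linalg.eigvals(T)))

# For a splitting A = M - N, the iteration matrix is T = M^{-1} N.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
M = np.diag(np.diag(A))            # Jacobi choice of M
T = np.linalg.solve(M, M - A)      # M^{-1} N, since N = M - A
print(spectral_radius(T) < 1)      # True: rho(T) = sqrt(1/12), about 0.289
```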
Definition 2 (Definition: Krylov Subspace) Given a matrix \(A\) and a starting vector \(r_0\), the \(m\)th Krylov subspace is
\[ \mathcal K_m(A,r_0)=\operatorname{span}\{r_0,Ar_0,A^2r_0,\dots,A^{m-1}r_0\}. \]
Krylov methods search for good approximations inside this growing space.
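A small sketch of the space itself, assuming NumPy; the helper krylov_basis is hypothetical, and practical Krylov methods (Arnoldi, Lanczos) orthogonalize as they go rather than forming raw powers first:

```python
import numpy as np

def krylov_basis(A, r0, m):
    """Orthonormal basis for K_m(A, r0), built naively for illustration.

    Raw powers A^k r0 align quickly with a dominant eigendirection, so
    practical methods orthogonalize vector by vector instead of forming
    all powers first and calling QR at the end.
    """
    K = np.empty((r0.shape[0], m))
    K[:, 0] = r0
    for j in range(1, m):
        K[:, j] = A @ K[:, j - 1]  # next Krylov vector A^j r0
    Q, _ = np.linalg.qr(K)         # columns of Q span K_m(A, r0)
    return Q
```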
Definition 3 (Definition: Preconditioner) A preconditioner is an auxiliary operator, usually written \(M\), that is easier to solve with than A and is used to transform the system into one with better convergence behavior.
Common transformed forms include
\[ M^{-1}Ax=M^{-1}b \]
or, in symmetric settings, a geometry-preserving equivalent problem built from \(M^{-1/2}AM^{-1/2}\).
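A tiny illustration of the symmetric form with the simplest choice, a diagonal (Jacobi) preconditioner; the 2-by-2 matrix here is an assumed toy example chosen to exaggerate the scaling problem:

```python
import numpy as np

# Jacobi (diagonal) preconditioning: M = diag(A). The symmetrically
# preconditioned matrix M^{-1/2} A M^{-1/2} stays symmetric, so SPD
# methods like CG still apply to the transformed problem.
A = np.array([[100.0, 1.0], [1.0, 1.0]])
M_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(A)))
A_prec = M_inv_sqrt @ A @ M_inv_sqrt

print(np.linalg.cond(A))       # about 101: badly mixed scales
print(np.linalg.cond(A_prec))  # about 1.22 after diagonal rescaling
```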
Theorem 2 (Theorem Idea: Preconditioning Changes The Effective Geometry) Good preconditioning does not solve the original problem exactly in one step.
Instead, it changes the spectral spread or geometry seen by the iteration so that convergence is faster and more uniform.
7 A Small Worked Example
Consider the system
\[ A= \begin{bmatrix} 4 & 1 \\ 1 & 3 \end{bmatrix}, \qquad b= \begin{bmatrix} 1 \\ 2 \end{bmatrix}. \]
For Jacobi iteration, take
\[ M=D= \begin{bmatrix} 4 & 0 \\ 0 & 3 \end{bmatrix}, \qquad N= \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}. \]
Writing \(A=D+L+U\), where \(L\) and \(U\) are the strictly lower and strictly upper triangular parts of \(A\) (so \(N=-(L+U)\)), the update becomes
\[ x^{(k+1)} = D^{-1}\left(b-(L+U)x^{(k)}\right) \]
and the iteration matrix is
\[ T=D^{-1}N= \begin{bmatrix} 0 & -1/4 \\ -1/3 & 0 \end{bmatrix}. \]
Its eigenvalues are
\[ \pm \sqrt{1/12}, \]
so
\[ \rho(T)=\sqrt{1/12}<1. \]
That tells us the iteration converges.
Starting from
\[ x^{(0)}= \begin{bmatrix} 0\\ 0 \end{bmatrix}, \]
the first two iterates are
\[ x^{(1)}= \begin{bmatrix} 1/4\\ 2/3 \end{bmatrix}, \qquad x^{(2)}= \begin{bmatrix} 1/12\\ 7/12 \end{bmatrix}. \]
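These iterates are easy to verify numerically; a short NumPy check, with the true solution printed for comparison:

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
D = np.diag(np.diag(A))
N = D - A                          # off-diagonal part with flipped sign

x = np.zeros(2)
for k in range(2):
    x = np.linalg.solve(D, N @ x + b)
    print(k + 1, x)                # [1/4, 2/3], then [1/12, 7/12]

print("exact:", np.linalg.solve(A, b))   # [1/11, 7/11], about [0.091, 0.636]
```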
The point of the example is not that Jacobi is always best. It is that convergence becomes a spectral property of the iteration, and better geometry usually means fewer steps.
8 Computation Lens
When you face a large linear system, ask:
- is a direct factorization affordable in memory and time?
- is the matrix sparse or structured enough that matrix-vector products are cheap?
- do I need a very accurate answer, or only one good enough for the outer algorithm?
- is the bottleneck really the method, or is the real issue poor conditioning that calls for preconditioning?
Those questions often decide whether direct or iterative computation is the right language.
9 Application Lens
9.1 Scientific Computing And PDEs
Large discretized systems often cannot afford dense direct methods, so iterative solvers and preconditioners become part of the model pipeline itself.
9.2 Optimization
Conjugate-gradient ideas, preconditioned linear solves, and Hessian-free methods appear naturally inside large optimization routines.
9.3 Statistics And ML
Kernel methods, large least-squares problems, and covariance-based computations often rely on iterative linear algebra when matrix size blocks direct approaches.
10 Stop Here For First Pass
If you can now explain:
- why iterative methods become attractive for large sparse systems
- why convergence of stationary methods is tied to the iteration matrix
- what makes Krylov methods different from simple stationary updates
- why preconditioning changes convergence by changing the effective problem geometry
then this page has done its job.
11 Go Deeper
After this page, the natural next step is the deeper reading collected below.
12 Optional Deeper Reading After First Pass
The strongest current references connected to this page are:
- MIT 18.335J week 8 - official MIT OCW week page covering the course’s preconditioning block. Checked 2026-04-25.
- MIT 18.086: Iterative Methods and Preconditioners - official MIT lecture resource focused directly on iterative methods and preconditioners. Checked 2026-04-25.
- Cornell CS4220 syllabus - official current syllabus showing the course shift from factorization methods to stationary and Krylov iterations. Checked 2026-04-25.
- Cornell CS4220 schedule - official current schedule showing the dedicated Krylov-iteration week. Checked 2026-04-25.
- Cornell CS4220: Gauss-Seidel convergence - official current notes illustrating stationary-iteration convergence and the SPD energy viewpoint. Checked 2026-04-25.
- Stanford CS137 syllabus - official current syllabus showing the standard path from Jacobi and Gauss-Seidel to conjugate gradient. Checked 2026-04-25.
13 Sources and Further Reading
- MIT 18.335J week 8 (first pass) - official MIT course page for the transition into preconditioning. Checked 2026-04-25.
- MIT 18.086: Iterative Methods and Preconditioners (first pass) - official lecture resource with a direct iterative-methods viewpoint. Checked 2026-04-25.
- Cornell CS4220 syllabus (first pass) - official current syllabus showing stationary and Krylov iterations as a central course block. Checked 2026-04-25.
- Cornell CS4220 schedule (second pass) - official current schedule that helps place iterative and Krylov methods in context. Checked 2026-04-25.
- Cornell CS4220: Gauss-Seidel convergence (second pass) - official current notes for the stationary-iteration convergence viewpoint. Checked 2026-04-25.
- Stanford CS137 syllabus (second pass) - official current syllabus connecting Jacobi, Gauss-Seidel, and conjugate gradient in a scientific-computing sequence. Checked 2026-04-25.