Iterative Methods and Preconditioning
iterative methods, preconditioning, Jacobi, Gauss-Seidel, conjugate gradient
1 Role
This is the third page of the Numerical Methods module.
Its job is to explain why, once systems become large or sparse, numerical linear algebra often stops trying to solve in one shot and instead improves an approximate answer step by step.
2 First-Pass Promise
Read this page after Numerical Linear Systems and Factorizations.
If you stop here, you should still understand:
- why iterative methods matter for large sparse problems
- how stationary iterations are built from a matrix splitting
- why Krylov methods use matrix-vector products rather than full factorizations
- why preconditioning is really about changing the effective conditioning or geometry of the problem
3 Why It Matters
Direct factorizations are excellent tools, but they are not always the right tools.
For very large systems, especially sparse ones, a full factorization may be:
- too expensive in time
- too expensive in memory
- destructive to sparsity
- more accurate than the application actually needs
In those settings, iterative methods become attractive because they:
- use repeated matrix-vector products
- improve an approximate solution gradually
- can exploit sparsity and structure
- can stop once the answer is good enough for the application
This is why iterative solvers appear everywhere in simulation, inverse problems, optimization, and scientific ML.
4 Prerequisite Recall
- the previous page explained direct factorization methods like LU, QR, and Cholesky
- a residual is \(r=b-A\hat x\)
- conditioning and stability still matter even when the solve is iterative
- symmetric positive definite systems often have especially clean geometry
5 Intuition
5.1 From One Big Solve To Many Cheap Steps
Iterative methods trade a single heavy computation for many lighter updates.
Instead of producing the solution immediately, they generate a sequence
\[ x^{(0)},x^{(1)},x^{(2)},\dots \]
that should move toward the true solution.
5.2 Stationary Iterations
The simplest idea is to split
\[ A=M-N \]
and rewrite
\[ Mx=Nx+b. \]
Then define the iteration
\[ x^{(k+1)} = M^{-1}Nx^{(k)} + M^{-1}b. \]
Jacobi and Gauss-Seidel are standard examples of this pattern.
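A minimal NumPy sketch of this pattern, assuming the splitting is supplied by the caller; the helper name and stopping rule are illustrative choices, not part of the definition:

```python
import numpy as np

def stationary_iteration(A, b, M, x0, tol=1e-10, max_iter=500):
    """Run x^{k+1} = M^{-1}(N x^{k} + b) with N = M - A.

    The update is applied in the equivalent residual form
    x <- x + M^{-1} r, which only needs solves with M.
    """
    x = x0.astype(float)
    for k in range(max_iter):
        r = b - A @ x                          # current residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k                        # converged to tolerance
        x = x + np.linalg.solve(M, r)          # one stationary step
    return x, max_iter

# Jacobi takes M = diag(A); Gauss-Seidel takes M = np.tril(A).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_j, steps = stationary_iteration(A, b, np.diag(np.diag(A)), np.zeros(2))
```

The residual form of the update is algebraically identical to \(x^{(k+1)} = M^{-1}Nx^{(k)} + M^{-1}b\), but it makes explicit that each step costs one matrix-vector product with A and one solve with M.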
5.3 Krylov Methods
More modern iterative methods build approximations from the space generated by
\[ r_0,\; Ar_0,\; A^2r_0,\dots \]
They use matrix action to explore increasingly useful directions without explicitly computing a full factorization.
That is why methods like conjugate gradient are central in large-scale numerical work.
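A minimal sketch of plain conjugate gradient for symmetric positive definite systems, assuming NumPy; note that the solver touches A only through a caller-supplied matrix-vector product, never through a factorization:

```python
import numpy as np

def conjugate_gradient(A_mv, b, tol=1e-10, max_iter=None):
    """Plain CG for SPD systems; A enters only through the product A_mv(v)."""
    n = b.shape[0]
    max_iter = max_iter if max_iter is not None else n
    x = np.zeros(n)
    r = b.copy()                    # residual b - A x for x = 0
    p = r.copy()                    # first search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A_mv(p)
        alpha = rs / (p @ Ap)       # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p   # next direction, A-conjugate to the old ones
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
x = conjugate_gradient(lambda v: A @ v, np.array([1.0, 2.0]))  # approx [1/11, 7/11]
```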
5.4 Preconditioning
If the original system is hard, we try to solve a nearby system with better numerical geometry.
A preconditioner is an operator that is easier to solve with than A, but close enough to A to improve convergence when used inside the iteration.
The slogan is:
make the iterations see a better-conditioned problem
6 Formal Core
Definition 1 (Definition: Stationary Iteration) Given a splitting
\[ A=M-N, \]
the stationary iteration is
\[ x^{(k+1)} = M^{-1}Nx^{(k)} + M^{-1}b. \]
The matrix
\[ T=M^{-1}N \]
is the iteration matrix.
Theorem 1 (Theorem Idea: Convergence Through The Iteration Matrix) If the spectral radius satisfies
\[ \rho(T)<1, \]
then the stationary iteration converges.
This makes convergence a spectral question, not just a coding question.
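A quick way to test the criterion numerically (the helper name spectral_radius is an illustrative choice):

```python
import numpy as np

def spectral_radius(T):
    """Largest eigenvalue magnitude of the iteration matrix."""
    return np.max(np.abs(np.linalg.eigvals(T)))

# For a splitting A = M - N, the iteration matrix is T = M^{-1} N.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
M = np.diag(np.diag(A))            # Jacobi choice of M
T = np.linalg.solve(M, M - A)      # M^{-1} N, since N = M - A
print(spectral_radius(T) < 1)      # True: rho(T) = sqrt(1/12), about 0.289
```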
Definition 2 (Definition: Krylov Subspace) Given a matrix \(A\) and a starting vector \(r_0\), the \(m\)th Krylov subspace is
\[ \mathcal K_m(A,r_0)=\operatorname{span}\{r_0,Ar_0,A^2r_0,\dots,A^{m-1}r_0\}. \]
Krylov methods search for good approximations inside this growing space.
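A small sketch of the space itself, assuming NumPy; the helper krylov_basis is hypothetical, and practical Krylov methods (Arnoldi, Lanczos) orthogonalize as they go rather than forming raw powers first:

```python
import numpy as np

def krylov_basis(A, r0, m):
    """Orthonormal basis for K_m(A, r0), built naively for illustration.

    Raw powers A^k r0 align quickly with a dominant eigendirection, so
    practical methods orthogonalize vector by vector instead of forming
    all powers first and calling QR at the end.
    """
    K = np.empty((r0.shape[0], m))
    K[:, 0] = r0
    for j in range(1, m):
        K[:, j] = A @ K[:, j - 1]  # next Krylov vector A^j r0
    Q, _ = np.linalg.qr(K)         # columns of Q span K_m(A, r0)
    return Q
```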
Definition 3 (Definition: Preconditioner) A preconditioner is an auxiliary operator, usually written \(M\), that is easier to solve with than A and is used to transform the system into one with better convergence behavior.
Common transformed forms include
\[ M^{-1}Ax=M^{-1}b \]
or, in symmetric settings, a geometry-preserving equivalent problem built from \(M^{-1/2}AM^{-1/2}\).
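A tiny illustration of the symmetric form with the simplest choice, a diagonal (Jacobi) preconditioner; the 2-by-2 matrix here is an assumed toy example chosen to exaggerate the scaling problem:

```python
import numpy as np

# Jacobi (diagonal) preconditioning: M = diag(A). The symmetrically
# preconditioned matrix M^{-1/2} A M^{-1/2} stays symmetric, so SPD
# methods like CG still apply to the transformed problem.
A = np.array([[100.0, 1.0], [1.0, 1.0]])
M_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(A)))
A_prec = M_inv_sqrt @ A @ M_inv_sqrt

print(np.linalg.cond(A))       # about 101: badly mixed scales
print(np.linalg.cond(A_prec))  # about 1.22 after diagonal rescaling
```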
Theorem 2 (Theorem Idea: Preconditioning Changes The Effective Geometry) Good preconditioning does not solve the original problem exactly in one step.
Instead, it changes the spectral spread or geometry seen by the iteration so that convergence is faster and more uniform.
7 A Small Worked Example
Consider the system
\[ A= \begin{bmatrix} 4 & 1 \\ 1 & 3 \end{bmatrix}, \qquad b= \begin{bmatrix} 1 \\ 2 \end{bmatrix}. \]
For Jacobi iteration, take
\[ M=D= \begin{bmatrix} 4 & 0 \\ 0 & 3 \end{bmatrix}, \qquad N= \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}. \]
Writing \(A=D+L+U\), where \(L\) and \(U\) are the strictly lower and strictly upper triangular parts of \(A\) (so \(N=-(L+U)\)), the update becomes
\[ x^{(k+1)} = D^{-1}\left(b-(L+U)x^{(k)}\right) \]
and the iteration matrix is
\[ T=D^{-1}N= \begin{bmatrix} 0 & -1/4 \\ -1/3 & 0 \end{bmatrix}. \]
Its eigenvalues are
\[ \pm \sqrt{1/12}, \]
so
\[ \rho(T)=\sqrt{1/12}<1. \]
That tells us the iteration converges.
Starting from
\[ x^{(0)}= \begin{bmatrix} 0\\ 0 \end{bmatrix}, \]
the first two iterates are
\[ x^{(1)}= \begin{bmatrix} 1/4\\ 2/3 \end{bmatrix}, \qquad x^{(2)}= \begin{bmatrix} 1/12\\ 7/12 \end{bmatrix}. \]
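These iterates are easy to verify numerically; a short NumPy check, with the true solution printed for comparison:

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
D = np.diag(np.diag(A))
N = D - A                          # off-diagonal part with flipped sign

x = np.zeros(2)
for k in range(2):
    x = np.linalg.solve(D, N @ x + b)
    print(k + 1, x)                # [1/4, 2/3], then [1/12, 7/12]

print("exact:", np.linalg.solve(A, b))   # [1/11, 7/11], about [0.091, 0.636]
```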
The point of the example is not that Jacobi is always best. It is that convergence becomes a spectral property of the iteration, and better geometry usually means fewer steps.
8 Computation Lens
When you face a large linear system, ask:
- is a direct factorization affordable in memory and time?
- is the matrix sparse or structured enough that matrix-vector products are cheap?
- do I need a very accurate answer, or only one good enough for the outer algorithm?
- is the bottleneck really the method, or is the real issue poor conditioning that calls for preconditioning?
Those questions often decide whether direct or iterative computation is the right language.
9 Application Lens
9.1 Scientific Computing And PDEs
Large discretized systems often cannot afford dense direct methods, so iterative solvers and preconditioners become part of the model pipeline itself.
9.2 Optimization
Conjugate-gradient ideas, preconditioned linear solves, and Hessian-free methods appear naturally inside large optimization routines.
9.3 Statistics And ML
Kernel methods, large least-squares problems, and covariance-based computations often rely on iterative linear algebra when matrix size blocks direct approaches.
10 Stop Here For First Pass
If you can now explain:
- why iterative methods become attractive for large sparse systems
- why convergence of stationary methods is tied to the iteration matrix
- what makes Krylov methods different from simple stationary updates
- why preconditioning changes convergence by changing the effective problem geometry
then this page has done its job.
11 Go Deeper
After this page, the natural next step is the deeper reading collected below.
12 Optional Deeper Reading After First Pass
The strongest current references connected to this page are:
- MIT 18.335J week 8 - official MIT OCW week page covering the course’s preconditioning block. Checked 2026-04-25.
- MIT 18.086: Iterative Methods and Preconditioners - official MIT lecture resource focused directly on iterative methods and preconditioners. Checked 2026-04-25.
- Cornell CS4220 syllabus - official current syllabus showing the course shift from factorization methods to stationary and Krylov iterations. Checked 2026-04-25.
- Cornell CS4220 schedule - official current schedule showing the dedicated Krylov-iteration week. Checked 2026-04-25.
- Cornell CS4220: Gauss-Seidel convergence - official current notes illustrating stationary-iteration convergence and the SPD energy viewpoint. Checked 2026-04-25.
- Stanford CS137 syllabus - official current syllabus showing the standard path from Jacobi and Gauss-Seidel to conjugate gradient. Checked 2026-04-25.
13 Sources and Further Reading
- MIT 18.335J week 8 (first pass) - official MIT course page for the transition into preconditioning. Checked 2026-04-25.
- MIT 18.086: Iterative Methods and Preconditioners (first pass) - official lecture resource with a direct iterative-methods viewpoint. Checked 2026-04-25.
- Cornell CS4220 syllabus (first pass) - official current syllabus showing stationary and Krylov iterations as a central course block. Checked 2026-04-25.
- Cornell CS4220 schedule (second pass) - official current schedule that helps place iterative and Krylov methods in context. Checked 2026-04-25.
- Cornell CS4220: Gauss-Seidel convergence (second pass) - official current notes for the stationary-iteration convergence viewpoint. Checked 2026-04-25.
- Stanford CS137 syllabus (second pass) - official current syllabus connecting Jacobi, Gauss-Seidel, and conjugate gradient in a scientific-computing sequence. Checked 2026-04-25.