Chain Rule and Linearization
chain rule, linearization, tangent plane, total differential, local linear map
1 Role
This page is the composition page of multivariable calculus.
Its job is to explain how local changes propagate through nested maps, and why the right multivariable local approximation is a linear map rather than just a scalar slope.
2 First-Pass Promise
Read this page after Partial Derivatives and Gradients.
If you stop here, you should still understand:
- why the multivariable chain rule sums contributions along dependency paths
- how local linearization generalizes tangent-line approximation
- why Jacobian-style thinking is already hiding inside ordinary chain-rule calculations
- why this page is the mathematical core behind backpropagation
3 Why It Matters
A multivariable model is almost never a single formula in one layer.
Usually it is built from pieces:
- inputs feed intermediate variables
- intermediate variables feed a loss or objective
- local changes flow through the whole composition
That is exactly what the chain rule describes.
The other key idea is linearization. Near a point, a differentiable multivariable function behaves like a linear map plus a small error. This is the several-variable upgrade of tangent-line approximation.
These two ideas are central because:
- optimization uses local linear and quadratic models
- ML uses backpropagation, which is repeated chain-rule bookkeeping
- engineering sensitivity analysis depends on how perturbations propagate through systems
4 Prerequisite Recall
- partial derivatives measure local change in coordinate directions
- the gradient packages first-order local information into one vector
- one-variable Taylor and linear approximation already taught that smooth functions look simpler at a small enough scale
5 Intuition
Suppose
\[ z = f(x,y), \qquad x = g(u,v), \qquad y = h(u,v). \]
Then \(z\) depends on \(u\) and \(v\) only through the intermediate variables \(x\) and \(y\).
If you nudge \(u\), that perturbation changes \(x\) and \(y\), and those changes then affect \(z\).
So the total change in \(z\) from changing \(u\) is the sum of:
\[ \left(\text{effect of }x\text{ on }z\right)\times\left(\text{effect of }u\text{ on }x\right) \;+\; \left(\text{effect of }y\text{ on }z\right)\times\left(\text{effect of }u\text{ on }y\right). \]
That is the chain rule.
Linearization says that near a point, all of this complicated behavior is approximated by one linear map. So the chain rule is really the rule for composing local linear approximations.
6 Formal Core
Definition 1 (Chain Rule For Two Independent Variables) If \(g,h\) are differentiable at \((u,v)\), \(f\) is differentiable at the corresponding point \((x,y)=(g(u,v),h(u,v))\), and
\[ z=f(x,y), \qquad x=g(u,v), \qquad y=h(u,v), \]
then \(z\) is a function of \((u,v)\), and
\[ \frac{\partial z}{\partial u} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial u}, \]
\[ \frac{\partial z}{\partial v} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial v}. \]
The rule says: multiply local derivatives along each dependency path, then add the contributions from all paths.
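As a quick sanity check, the two formulas can be verified numerically with central finite differences. The specific maps below (\(f(x,y)=\sin(x)\,y\), \(g(u,v)=uv\), \(h(u,v)=u+v\)) and the evaluation point are illustrative choices, not taken from the text:

```python
import math

# Illustrative maps (hypothetical choices, not from the text):
# z = f(x, y) = sin(x) * y, with x = g(u, v) = u * v and y = h(u, v) = u + v.
def f(x, y): return math.sin(x) * y
def g(u, v): return u * v
def h(u, v): return u + v

def z(u, v):
    return f(g(u, v), h(u, v))

def partial(fn, args, i, eps=1e-6):
    """Central-difference estimate of the i-th partial derivative of fn."""
    lo, hi = list(args), list(args)
    lo[i] -= eps
    hi[i] += eps
    return (fn(*hi) - fn(*lo)) / (2 * eps)

u0, v0 = 0.7, 1.3
x0, y0 = g(u0, v0), h(u0, v0)

# Chain rule: dz/du = (dz/dx)(dx/du) + (dz/dy)(dy/du)
dz_dx = partial(f, (x0, y0), 0)
dz_dy = partial(f, (x0, y0), 1)
dx_du = partial(g, (u0, v0), 0)
dy_du = partial(h, (u0, v0), 0)
chain_value = dz_dx * dx_du + dz_dy * dy_du

direct_value = partial(z, (u0, v0), 0)  # differentiate the composition directly
print(abs(chain_value - direct_value))  # small: the two computations agree
```

The path-by-path sum and direct differentiation of the composition agree up to finite-difference error.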
Definition 2 (Linearization) If \(f(x,y)\) is differentiable at \((a,b)\), then near that point
\[ f(x,y)\approx f(a,b)+f_x(a,b)(x-a)+f_y(a,b)(y-b). \]
This is the multivariable linearization of \(f\) at \((a,b)\).
It is the several-variable analog of the tangent-line approximation from one-variable calculus.
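A minimal numerical sketch of the definition, using a hypothetical function \(f(x,y)=e^x\cos y\) and base point \((0,0)\) chosen purely for illustration:

```python
import math

# Hypothetical example: f(x, y) = exp(x) * cos(y), linearized at (a, b) = (0, 0).
def f(x, y):
    return math.exp(x) * math.cos(y)

a, b = 0.0, 0.0
fx = math.exp(a) * math.cos(b)    # f_x(0, 0) = 1
fy = -math.exp(a) * math.sin(b)   # f_y(0, 0) = 0

def L(x, y):
    # L(x, y) = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)
    return f(a, b) + fx * (x - a) + fy * (y - b)

# Moving along x only: the error shrinks roughly quadratically with the step.
errs = [abs(f(a + s, b) - L(a + s, b)) for s in (0.1, 0.01)]
print(errs)
```

Shrinking the step by a factor of 10 shrinks the error by roughly a factor of 100, which is exactly the "first-order accurate" behavior the definition promises.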
Proposition 1 (Tangent Plane View) For a surface \(z=f(x,y)\), the linearization can be viewed as the tangent plane:
\[ z \approx f(a,b)+f_x(a,b)(x-a)+f_y(a,b)(y-b). \]
So first-order multivariable approximation is geometric as well as algebraic.
Proposition 2 (Chain Rule As Composition Of Local Linear Maps) At a first-pass level, the cleanest idea is:
- each differentiable map is locally linear
- composing the maps means composing those local linear approximations
- the chain rule is the coordinate formula for that composition
This is why the matrix form of the chain rule later becomes so natural.
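A small sketch of the matrix viewpoint, with hypothetical numeric values for the local derivatives at one point; the entries are invented for illustration:

```python
# Hypothetical numeric values for the local derivatives at one point.
dz_dx, dz_dy = 3.0, -2.0        # row Jacobian of the outer map z = f(x, y)
J_inner = [[1.0, 4.0],          # [[dx/du, dx/dv],
           [2.0, -1.0]]         #  [dy/du, dy/dv]]  inner-map Jacobian

# Row-by-matrix product: [dz/du, dz/dv] = [dz/dx, dz/dy] @ J_inner.
dz_du = dz_dx * J_inner[0][0] + dz_dy * J_inner[1][0]
dz_dv = dz_dx * J_inner[0][1] + dz_dy * J_inner[1][1]

# Each entry is exactly one of the two chain-rule formulas above.
print(dz_du, dz_dv)  # -1.0 14.0
```

The two chain-rule formulas are the two entries of a single row-by-matrix product, which is the coordinate form of composing local linear maps.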
7 Worked Example
Let
\[ z = f(x,y)=x^2+y^2, \qquad x=u+v, \qquad y=u-v. \]
We want \(\partial z/\partial u\) and the linearization of \(z(u,v)\) at \((u,v)=(1,0)\).
First compute the needed derivatives:
\[ \frac{\partial z}{\partial x}=2x, \qquad \frac{\partial z}{\partial y}=2y, \]
\[ \frac{\partial x}{\partial u}=1, \qquad \frac{\partial y}{\partial u}=1. \]
So by the chain rule,
\[ \frac{\partial z}{\partial u} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial u} =2x+2y. \]
Substitute \(x=u+v\) and \(y=u-v\):
\[ \frac{\partial z}{\partial u}=2(u+v)+2(u-v)=4u. \]
Likewise,
\[ \frac{\partial z}{\partial v}=4v. \]
At \((u,v)=(1,0)\), the value of \(z\) is
\[ z=(1+0)^2+(1-0)^2=2. \]
The gradient in \((u,v)\) coordinates there is
\[ \nabla z(1,0)=(4,0). \]
So the linearization at \((1,0)\) is
\[ L(u,v)=2+4(u-1)+0(v-0)=4u-2. \]
This example shows the two main ideas together:
- the chain rule rewrites local sensitivity through intermediate variables
- linearization turns that first-order information into a usable local model
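The worked example above can be checked in a few lines of code, comparing the chain-rule answer \(\partial z/\partial u = 4u\) and the linearization \(L(u,v)=4u-2\) against direct numerical evaluation:

```python
# Check the worked example: z = x^2 + y^2 with x = u + v, y = u - v.
def z(u, v):
    x, y = u + v, u - v
    return x**2 + y**2

def dz_du_numeric(u, v, eps=1e-6):
    return (z(u + eps, v) - z(u - eps, v)) / (2 * eps)

# The chain rule predicted dz/du = 4u at every point.
for u, v in [(1.0, 0.0), (0.5, 2.0), (-1.5, 0.3)]:
    assert abs(dz_du_numeric(u, v) - 4 * u) < 1e-4

# Linearization at (u, v) = (1, 0): L(u, v) = 4u - 2.
def L(u, v):
    return 4 * u - 2

gap = abs(z(1.01, 0.02) - L(1.01, 0.02))  # small near the base point
print(gap)
```

Near \((1,0)\) the linear model tracks \(z\) closely; farther away the gap grows, as a first-order model should.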
8 Computation Lens
A practical first-pass workflow for chain rule and linearization is:
- draw the dependency structure: which variables depend on which
- compute local derivatives one layer at a time
- multiply along dependency paths and add contributions
- after you have first-order data at a point, write the linearization
- interpret the approximation locally, not globally
This is the cleanest route from symbolic formulas to backprop-style thinking.
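The workflow above can be sketched as a tiny forward/backward pass; the two-layer composition here is a hypothetical example, chosen only to show the bookkeeping:

```python
import math

# Forward pass: record intermediate values layer by layer.
u, v = 0.5, -0.3            # hypothetical inputs
x = u * v                   # layer 1: intermediates
y = u + v
z = math.sin(x) + x * y     # layer 2: the objective

# Backward pass: local derivatives, one layer at a time.
dz_dx = math.cos(x) + y
dz_dy = x
dx_du, dx_dv = v, u
dy_du, dy_dv = 1.0, 1.0

# Multiply along each dependency path and add the contributions.
dz_du = dz_dx * dx_du + dz_dy * dy_du
dz_dv = dz_dx * dx_dv + dz_dy * dy_dv
print(dz_du, dz_dv)
```

This is the same layer-local-then-combine discipline backpropagation uses, just written by hand for a two-layer graph.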
9 Application Lens
This page is one of the most important bridges on the whole site.
- in optimization, line search and local models depend on linearization
- in ML, backpropagation is repeated chain rule through a computation graph
- in sensitivity analysis, the question is exactly how perturbations propagate through composed maps
So if the previous page taught you what the gradient is, this page teaches you how gradients move through systems.
10 Stop Here For First Pass
If you can now explain:
- why the chain rule adds contributions from multiple dependency paths
- how to compute a simple multivariable chain-rule example
- what the linearization formula means geometrically
- why linearization is the multivariable tangent approximation
then this page has done its main job.
11 Go Deeper
The strongest next steps after this page are:
- Jacobians and Hessians, because the linear-map and second-order viewpoints become explicit there
- Optimization, to see local models become gradient-based algorithms and constrained reasoning
- Backpropagation and Computation Graphs, to see chain-rule bookkeeping in modern ML language
12 Optional Deeper Reading
- MIT 18.02SC Syllabus - First pass: official MIT overview explicitly listing chain rule, total differentials, and linear approximation as core outcomes. Checked 2026-04-25.
- MIT 18.02SC Recitation: The Chain Rule with More Variables - Second pass: compact official material emphasizing dependency-graph intuition. Checked 2026-04-25.
- OpenStax Calculus Volume 3: The Chain Rule - Second pass: free text section on the generalized multivariable chain rule. Checked 2026-04-25.
- Paul’s Online Math Notes: Chain Rule - Second pass: worked-example companion for multivariable chain-rule practice. Checked 2026-04-25.
- Paul’s Online Math Notes: Tangent Planes and Linear Approximations - Second pass: practice-heavy bridge from formulas to local linear approximation. Checked 2026-04-25.
13 Optional After First Pass
If you want more practice before moving on:
- draw a dependency graph for a nested function before differentiating
- compare a one-variable tangent line with a two-variable tangent plane
- compute a linearization and test how accurate it is near and far from the base point
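The third practice item can be automated; the function and base point below are arbitrary choices made for this sketch:

```python
import math

# Arbitrary choice: f(x, y) = sqrt(x^2 + y^2 + 1), linearized at (a, b) = (1, 2).
def f(x, y):
    return math.sqrt(x**2 + y**2 + 1)

a, b = 1.0, 2.0
r = f(a, b)
fx, fy = a / r, b / r        # partial derivatives at the base point

def L(x, y):
    return f(a, b) + fx * (x - a) + fy * (y - b)

# Error grows as we move away from the base point.
offsets = (0.01, 0.1, 1.0)
errs = [abs(f(a + d, b + d) - L(a + d, b + d)) for d in offsets]
for d, e in zip(offsets, errs):
    print(f"offset {d}: error {e:.2e}")
```

Watching the error grow with distance makes the "local, not global" warning from the Common Mistakes list concrete.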
14 Common Mistakes
- differentiating the outer function but forgetting how inner variables depend on the base variables
- multiplying along one path and forgetting other dependency paths
- treating linearization as globally accurate rather than local
- confusing gradient information in one coordinate system with gradient information after a change of variables
- writing a tangent plane without evaluating derivatives at the base point
15 Sources and Further Reading
- MIT 18.02SC Syllabus - First pass: official MIT course outcomes showing where chain rule and linear approximation sit in the module. Checked 2026-04-25.
- MIT 18.02SC Recitation: The Chain Rule with More Variables - Second pass: concise official chain-rule material with dependency-graph intuition. Checked 2026-04-25.
- OpenStax Calculus Volume 3: The Chain Rule - Second pass: free text section on chain rule in several variables. Checked 2026-04-25.
- Paul’s Online Math Notes: Chain Rule - Second pass: worked examples for multivariable chain-rule structure. Checked 2026-04-25.
- Paul’s Online Math Notes: Tangent Planes and Linear Approximations - Second pass: worked examples for tangent planes and local linear models. Checked 2026-04-25.