Jacobians and Hessians

How first-order local behavior becomes a matrix through the Jacobian, how second-order curvature becomes a matrix through the Hessian, and why both matter for optimization and approximation.
Modified

April 26, 2026

Keywords

Jacobian, Hessian, linearization, curvature, multivariable calculus

1 Role

This page is the matrix-language page of multivariable calculus.

Its job is to make the local linear-map and local curvature viewpoints explicit, so that first-order and second-order reasoning stop looking like disconnected formulas.

2 First-Pass Promise

Read this page after Chain Rule and Linearization.

If you stop here, you should still understand:

  • why the Jacobian is the matrix form of first-order local behavior
  • why the Hessian packages second derivatives into one curvature object
  • how Jacobian and Hessian viewpoints organize optimization and approximation
  • why these matrices matter more than any one partial derivative on its own

3 Why It Matters

Once you accept that a differentiable multivariable function is locally a linear map, the natural question is:

what matrix represents that map?

That answer is the Jacobian.

Then the next question is:

how does that first-order linear model itself change from point to point?

That is where second derivatives and the Hessian enter.

These two matrices matter because:

  • Jacobians control local sensitivity, coordinate changes, and derivatives through composed, multistage maps
  • Hessians control curvature, local quadratic models, and second-order optimization intuition
  • together they make multivariable calculus look like a clean conversation between calculus and linear algebra

4 Prerequisite Recall

  • linearization says a differentiable map is locally linear
  • the chain rule says these local linear maps compose
  • gradients capture first-order information for scalar-valued functions

5 Intuition

For a map

\[ F:\mathbb{R}^n \to \mathbb{R}^m, \]

the Jacobian is the matrix that best describes the local input-output transformation near a point.

So if you make a small change \(\Delta x\), the output changes approximately by

\[ F(x+\Delta x)\approx F(x)+J_F(x)\,\Delta x. \]

That is the multivariable analog of the one-variable approximation \(f(x+\Delta x)\approx f(x)+f'(x)\,\Delta x\).
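To see this approximation behave numerically, here is a minimal sketch in Python; the particular map, the base point, and the step sizes are illustrative choices, not taken from this page. It builds a forward-difference Jacobian and checks that the linearization error shrinks as the perturbation shrinks.

```python
import numpy as np

def F(v):
    # Illustrative map from R^2 to R^2 (chosen only for this example).
    x, y = v
    return np.array([np.sin(x) + y**2, x * y])

def numerical_jacobian(F, v, eps=1e-6):
    # Build the Jacobian one column per input coordinate with forward differences.
    v = np.asarray(v, dtype=float)
    base = F(v)
    J = np.zeros((base.size, v.size))
    for j in range(v.size):
        step = np.zeros_like(v)
        step[j] = eps
        J[:, j] = (F(v + step) - base) / eps
    return J

a = np.array([1.0, 2.0])
J = numerical_jacobian(F, a)
for scale in [1e-1, 1e-2, 1e-3]:
    h = scale * np.array([0.3, -0.7])
    error = np.linalg.norm(F(a + h) - (F(a) + J @ h))
    print(scale, error)  # the error shrinks roughly like the square of the step size
```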

For a scalar-valued function \(f\), the Hessian collects all second partial derivatives:

\[ H_f(x)= \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n\partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}. \]

This matrix tells you how the gradient itself changes, which is why it captures curvature.
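A small sketch of that statement, again with an illustrative function chosen only for this example: the Hessian predicts how the gradient moves under a small step, \(\nabla f(x+h)\approx \nabla f(x)+H_f(x)\,h\).

```python
import numpy as np

# Illustrative scalar function (not from this page): f(x, y) = x^2 y + y^3.
def grad_f(v):
    x, y = v
    return np.array([2 * x * y, x**2 + 3 * y**2])

def hess_f(v):
    x, y = v
    return np.array([[2 * y, 2 * x],
                     [2 * x, 6 * y]])

a = np.array([1.0, 0.5])
h = np.array([1e-3, -2e-3])

# The Hessian acts as the "Jacobian of the gradient": grad(a + h) ≈ grad(a) + H(a) @ h.
print(grad_f(a + h))
print(grad_f(a) + hess_f(a) @ h)  # the two vectors should be nearly identical
```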

6 Formal Core

Definition 1 (Jacobian Matrix) If

\[ F(x_1,\dots,x_n)= \begin{bmatrix} f_1(x_1,\dots,x_n) \\ \vdots \\ f_m(x_1,\dots,x_n) \end{bmatrix}, \]

then the Jacobian matrix of \(F\) is

\[ J_F(x)= \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}. \]

It is the matrix of first partial derivatives for the vector-valued map.

Proposition 1 (Jacobian As Linearization Matrix) At a differentiable point \(a\),

\[ F(a+h)\approx F(a)+J_F(a)\,h. \]

So the Jacobian is the matrix representation of the best first-order local linear map.

Definition 2 (Hessian Matrix) For a scalar-valued function \(f:\mathbb{R}^n\to\mathbb{R}\), the Hessian is the matrix of second partial derivatives:

\[ H_f(x)= \left[ \frac{\partial^2 f}{\partial x_i\partial x_j} \right]_{i,j=1}^n. \]

When mixed partial derivatives agree, which they do whenever \(f\) is twice continuously differentiable (Clairaut's theorem), the Hessian is symmetric.

Proposition 2 (Hessian And Curvature) The Hessian describes how the gradient changes and therefore how the local first-order model bends.

That is why second-order optimization, local maxima/minima tests, and quadratic approximation all depend on the Hessian.
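Concretely, the quadratic approximation mentioned here is the standard second-order Taylor model built from the gradient and the Hessian:

\[ f(a+h)\approx f(a)+\nabla f(a)^{\top}h+\tfrac{1}{2}\,h^{\top}H_f(a)\,h. \]

The sign behavior of the quadratic term \(h^{\top}H_f(a)\,h\) is exactly what local maximum/minimum tests read off.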

7 Worked Example

Consider the map

\[ F(x,y)= \begin{bmatrix} x^2+y \\ xy \end{bmatrix}. \]

Its Jacobian is

\[ J_F(x,y)= \begin{bmatrix} 2x & 1 \\ y & x \end{bmatrix}. \]

At the point \((1,2)\), this becomes

\[ J_F(1,2)= \begin{bmatrix} 2 & 1 \\ 2 & 1 \end{bmatrix}. \]

This matrix is the local linear map sending small input perturbations \((\Delta x,\Delta y)\) to approximate output perturbations.

Now consider the scalar function

\[ f(x,y)=x^2+3xy+y^2. \]

Its gradient is

\[ \nabla f(x,y)= \begin{bmatrix} 2x+3y \\ 3x+2y \end{bmatrix}. \]

Its Hessian is

\[ H_f(x,y)= \begin{bmatrix} 2 & 3 \\ 3 & 2 \end{bmatrix}. \]

Notice that the Hessian is constant here. That means the quadratic curvature structure of this function is the same everywhere.

This is exactly why quadratic models are so important in optimization: their Hessians capture the whole second-order story in one matrix.
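Here is a minimal numerical check of this worked example; the finite-difference step size and the test perturbation are illustrative choices.

```python
import numpy as np

def F(v):  # the vector-valued map from the worked example
    x, y = v
    return np.array([x**2 + y, x * y])

def f(v):  # the scalar function from the worked example
    x, y = v
    return x**2 + 3 * x * y + y**2

# Finite-difference Jacobian of F at (1, 2); should be close to [[2, 1], [2, 1]].
a, eps = np.array([1.0, 2.0]), 1e-6
J = np.column_stack([(F(a + eps * e) - F(a)) / eps for e in np.eye(2)])
print(np.round(J, 3))

# Because f is quadratic and its Hessian is constant, the second-order model is exact.
grad = np.array([2 * a[0] + 3 * a[1], 3 * a[0] + 2 * a[1]])  # gradient from the text
H = np.array([[2.0, 3.0], [3.0, 2.0]])                       # Hessian from the text
h = np.array([0.4, -0.2])
print(f(a + h), f(a) + grad @ h + 0.5 * h @ H @ h)           # the two values agree
```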

8 Computation Lens

A practical first-pass workflow is:

  1. if the function is vector-valued, think Jacobian
  2. if the function is scalar-valued and you care about curvature, think Hessian
  3. compute partial derivatives row by row for a Jacobian
  4. compute second partial derivatives entry by entry for a Hessian
  5. interpret the matrix, not just the individual entries

That last point matters most. The real object is the map or curvature encoded by the matrix.
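To make step 4 of this workflow concrete, here is a sketch of an entry-by-entry finite-difference Hessian; the helper name and step size are illustrative, and the check reuses the quadratic from the worked example.

```python
import numpy as np

def hessian_entries(f, x, eps=1e-4):
    # Fill in one second partial derivative per entry with a forward second difference.
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = eps
            ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei) - f(x + ej) + f(x)) / eps**2
    return H

# f(x, y) = x^2 + 3xy + y^2 has the constant Hessian [[2, 3], [3, 2]].
f = lambda v: v[0]**2 + 3 * v[0] * v[1] + v[1]**2
print(np.round(hessian_entries(f, [1.0, 2.0]), 3))
```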

9 Application Lens

These matrices show up all over the later site:

  • Jacobians appear in backpropagation, sensitivity analysis, and change of variables
  • Hessians appear in second-order optimization, curvature reasoning, and local minima tests
  • optimization papers often talk about conditioning, curvature, or smoothness in language that is really Hessian language under the hood

So this page is where multivariable calculus starts to feel fully compatible with linear algebra and optimization.
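One concrete instance of that compatibility: the chain rule in matrix form says \(J_{g\circ f}(x)=J_g(f(x))\,J_f(x)\), and backpropagation is organized around products of exactly this kind. The sketch below checks the identity numerically for two illustrative maps (both chosen only for this example).

```python
import numpy as np

def f(v):  # illustrative inner map, R^2 -> R^2
    x, y = v
    return np.array([x * y, x + y**2])

def g(u):  # illustrative outer map, R^2 -> R^1
    a, b = u
    return np.array([np.tanh(a) + b**2])

def num_jac(F, x, eps=1e-6):
    # Forward-difference Jacobian, one column per input coordinate.
    x = np.asarray(x, dtype=float)
    Fx = F(x)
    return np.column_stack([(F(x + eps * e) - Fx) / eps for e in np.eye(x.size)])

x0 = np.array([0.5, -1.0])
print(num_jac(lambda v: g(f(v)), x0))      # Jacobian of the composition
print(num_jac(g, f(x0)) @ num_jac(f, x0))  # product of the two Jacobians; they match
```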

10 Stop Here For First Pass

If you can now explain:

  • why the Jacobian is the matrix of first-order local behavior
  • why the Hessian packages second-order information
  • why the Jacobian matters for composed maps
  • why the Hessian matters for curvature and optimization

then this page has done its main job.

11 Go Deeper

The strongest next steps after this page are:

  1. Multiple Integrals, because the accumulation branch of multivariable calculus is now ready too
  2. Constrained Optimization, because local gradient and Hessian information become actionable under constraints
  3. Optimization, to see first-order and second-order models become algorithms and certificates

12 Optional After First Pass

If you want more practice before moving on:

  • write the Jacobian of a two-output function and evaluate it at a point
  • write the Hessian of a quadratic and interpret what its entries say
  • compare a gradient vector with the Jacobian of a scalar-valued map

13 Common Mistakes

  • using the gradient where a full Jacobian is needed
  • treating the Jacobian as only a table of derivatives instead of a linear map
  • forgetting that the Hessian, in this common setup, is defined only for scalar-valued functions
  • reading Hessian entries without asking what they say about the whole quadratic form
  • mixing coordinate formulas with geometric meaning and losing both
