Partial Derivatives and Gradients

How to measure local change of a function of several variables, compute coordinate-wise rates, and interpret the gradient as the direction of steepest local increase.
Modified: April 26, 2026

Keywords

partial derivative, gradient, directional change, level sets, multivariable calculus

1 Role

This page is the entry point to multivariable calculus.

Its job is to extend the derivative idea from one-variable local change to many-variable local change, then package those rates together into the gradient.

2 First-Pass Promise

Read this page first in the multivariable calculus module.

If you stop here, you should still understand:

  • what a partial derivative means conceptually
  • how the gradient bundles partial derivatives into one vector
  • why the gradient points normal to level sets
  • why the gradient gives the direction of steepest local increase

3 Why It Matters

Once a function depends on several variables, there is no longer just one way to move.

That changes calculus completely.

Instead of one derivative, you now need to ask:

  • what happens if I vary only \(x\)
  • what happens if I vary only \(y\)
  • what happens if I move in some combined direction
  • what linear object best describes local change in all directions at once

The answers begin with partial derivatives and the gradient.

This matters directly for:

  • optimization, where gradients drive updates
  • ML, where losses depend on many parameters
  • geometry, where gradients describe level sets and normal directions
  • engineering, where sensitivity depends on many inputs simultaneously

4 Prerequisite Recall

  • a one-variable derivative measures instantaneous local change
  • linear approximation gives the best local line in one variable
  • vectors from linear algebra package magnitude and direction together

5 Intuition

For a function \(f(x,y)\), you can freeze \(y\) and watch how \(f\) changes as \(x\) moves. That gives the partial derivative with respect to \(x\).

You can also freeze \(x\) and watch how \(f\) changes as \(y\) moves. That gives the partial derivative with respect to \(y\).

So partial derivatives are still ordinary derivatives in spirit. The only new idea is that all other variables are temporarily held fixed.

The gradient then collects these coordinate-wise rates:

\[ \nabla f(x,y)=\left(f_x(x,y),f_y(x,y)\right). \]

This vector tells you:

  • which way the function increases most rapidly
  • how strongly it increases in that direction
  • which directions are tangent to a level curve or level surface

So the gradient is the first truly multivariable local model object.
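
To make the freezing idea concrete, here is a minimal numerical sketch in Python. It uses the quadratic from the worked example below; the base point and the step size \(h\) are illustrative choices.

    # A partial derivative is an ordinary derivative of a one-variable slice.
    def f(x, y):
        return x**2 + 3*x*y + y**2  # the quadratic from the worked example below

    y0 = -1.0  # freeze y at an illustrative value

    def slice_along_x(x):
        return f(x, y0)  # with y frozen, this is a plain one-variable function

    # Ordinary central difference applied to the slice:
    h = 1e-6
    x0 = 1.0
    fx = (slice_along_x(x0 + h) - slice_along_x(x0 - h)) / (2 * h)
    print(fx)  # approximately f_x(1, -1) = 2*1 + 3*(-1) = -1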

6 Formal Core

Definition 1 (Partial Derivative) For a function \(f(x,y)\), the partial derivative with respect to \(x\) is

\[ \frac{\partial f}{\partial x}(x,y) = \lim_{h\to 0}\frac{f(x+h,y)-f(x,y)}{h}, \]

when the limit exists.

Likewise,

\[ \frac{\partial f}{\partial y}(x,y) = \lim_{h\to 0}\frac{f(x,y+h)-f(x,y)}{h}. \]

Each partial derivative measures local change along one coordinate direction while the others are held fixed.
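
You can watch the limit numerically. The sketch below shrinks \(h\) and prints the difference quotient for the same illustrative quadratic; the particular values of \(h\) are arbitrary.

    # Difference quotients approach the partial derivative as h shrinks.
    def f(x, y):
        return x**2 + 3*x*y + y**2

    x0, y0 = 1.0, -1.0
    for h in (1e-1, 1e-2, 1e-3, 1e-4):
        quotient = (f(x0 + h, y0) - f(x0, y0)) / h
        print(h, quotient)  # tends to f_x(1, -1) = -1 as h -> 0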

Definition 2 (Gradient) For a scalar-valued function \(f\) of several variables, the gradient is the vector of partial derivatives:

\[ \nabla f(x_1,\dots,x_n)= \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right). \]

This vector is the multivariable analog of the derivative.
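
In code, Definition 2 becomes a loop: one partial per coordinate, collected into a vector. Here is a minimal forward-difference sketch; the helper name numerical_gradient, the test function, and \(h\) are illustrative.

    # Sketch: assemble a numerical gradient, one partial per coordinate.
    def numerical_gradient(f, point, h=1e-6):
        grad = []
        for i in range(len(point)):
            bumped = list(point)
            bumped[i] += h  # vary coordinate i, hold the rest fixed
            grad.append((f(bumped) - f(point)) / h)
        return grad

    def f(v):
        x, y = v  # f(x, y) = x^2 + 3xy + y^2, written over a coordinate list
        return x**2 + 3*x*y + y**2

    print(numerical_gradient(f, [1.0, -1.0]))  # close to [-1.0, 1.0]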

Proposition 1 (Gradient And Level Sets) If \(f\) is differentiable at a point, \(\nabla f \ne 0\) there, and the level set is smooth near that point, then the gradient is perpendicular to the level set

\[ f(x_1,\dots,x_n)=c \]

through that point.

So the gradient points normal to the contour or level surface.
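
You can test this claim numerically: step along a direction perpendicular to the gradient and watch \(f\) change only at second order in the step size. The sketch reuses the function and point from the worked example below.

    # Moving tangent to the level set changes f only to second order.
    import math

    def f(x, y):
        return x**2 + 3*x*y + y**2

    px, py = 1.0, -1.0
    gx, gy = -1.0, 1.0  # nabla f(1, -1), computed in the worked example
    tx, ty = 1 / math.sqrt(2), 1 / math.sqrt(2)  # unit vector perpendicular to (gx, gy)

    print(gx * tx + gy * ty)  # dot product is 0: this direction is tangent

    for t in (1e-1, 1e-2, 1e-3):
        drift = f(px + t * tx, py + t * ty) - f(px, py)
        print(t, drift)  # shrinks like t**2, so the first-order change is zero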

Proposition 2 (Steepest Increase) If \(f\) is differentiable at a point and \(\nabla f \ne 0\) there, then the gradient points in the direction of steepest local increase of \(f\).

Its magnitude gives the maximal directional rate of increase.
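
Proposition 2 can also be checked by brute force: scan unit directions and record each directional rate of change. For the quadratic and point from the worked example below, the maximum should land on the direction of \((-1,1)\) with rate \(\sqrt{2}\); the one-degree grid and \(h\) are arbitrary choices.

    # Scan unit directions: the directional rate peaks along the gradient.
    import math

    def f(x, y):
        return x**2 + 3*x*y + y**2

    x0, y0, h = 1.0, -1.0, 1e-6
    best_rate, best_deg = float("-inf"), None
    for deg in range(360):
        theta = math.radians(deg)
        ux, uy = math.cos(theta), math.sin(theta)
        rate = (f(x0 + h * ux, y0 + h * uy) - f(x0, y0)) / h
        if rate > best_rate:
            best_rate, best_deg = rate, deg

    print(best_rate)  # about sqrt(2) = 1.414..., the magnitude of (-1, 1)
    print(best_deg)   # 135 degrees, the direction of (-1, 1)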

7 Worked Example

Let

\[ f(x,y)=x^2+3xy+y^2. \]

Compute its partial derivatives.

Holding \(y\) fixed:

\[ f_x(x,y)=2x+3y. \]

Holding \(x\) fixed:

\[ f_y(x,y)=3x+2y. \]

So the gradient is

\[ \nabla f(x,y)=(2x+3y,\;3x+2y). \]

Evaluate at the point \((1,-1)\):

\[ \nabla f(1,-1)=(-1,1). \]

What does this mean?

  • near \((1,-1)\), the function increases fastest in the direction \((-1,1)\)
  • directions tangent to the level curve there are perpendicular to \((-1,1)\)
  • a gradient-based optimization method would use this local vector information directly

This is the exact point where single-variable “slope” becomes multivariable “best local direction.”
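
If you want to double-check the algebra, a short symbolic script reproduces the whole example (this assumes the SymPy library is available):

    # Symbolic check of the worked example (assumes SymPy is installed).
    import sympy as sp

    x, y = sp.symbols("x y")
    f = x**2 + 3*x*y + y**2

    fx = sp.diff(f, x)  # 2*x + 3*y
    fy = sp.diff(f, y)  # 3*x + 2*y

    print(fx, fy)
    print(fx.subs({x: 1, y: -1}), fy.subs({x: 1, y: -1}))  # -1 1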

8 Computation Lens

A practical first-pass workflow for gradient problems is:

  1. identify the independent variables
  2. compute each partial derivative by holding the others fixed
  3. assemble the gradient vector from those partials
  4. evaluate the gradient at the point of interest
  5. interpret the result geometrically: steepest increase, sensitivity, and level-set normal direction

This turns symbolic differentiation into usable multivariable intuition.
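
As one way the five steps might look in code, the routine below runs them with SymPy; the helper name gradient_report is an illustrative choice, not a library function.

    # The five-step workflow as one small routine (assumes SymPy is installed).
    import sympy as sp

    def gradient_report(expr, variables, point):
        partials = [sp.diff(expr, v) for v in variables]    # steps 1-2
        gradient = sp.Matrix(partials)                      # step 3
        value = gradient.subs(dict(zip(variables, point)))  # step 4
        magnitude = value.norm()                            # step 5: max rate of increase
        return partials, value, magnitude

    x, y = sp.symbols("x y")
    partials, value, magnitude = gradient_report(x**2 + 3*x*y + y**2, (x, y), (1, -1))
    print(partials)   # [2*x + 3*y, 3*x + 2*y]
    print(value.T)    # Matrix([[-1, 1]])
    print(magnitude)  # sqrt(2)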

9 Application Lens

In optimization and ML, the gradient is everywhere:

  • gradient descent steps against it
  • backpropagation computes it through a chain of local derivatives
  • saliency and sensitivity ideas often start from it
  • stationarity conditions are often phrased using it

So if one-variable calculus teaches you what slope means, this page teaches you what gradient means before you ever see the full chain rule or Hessian machinery.
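
As a minimal illustration of "steps against it", here is a gradient descent sketch. The bowl-shaped function \(f(x,y)=(x-2)^2+(y+3)^2\), the starting point, the step size, and the iteration count are all illustrative choices.

    # Minimal gradient descent on f(x, y) = (x - 2)**2 + (y + 3)**2,
    # an illustrative convex bowl whose minimum sits at (2, -3).
    def grad_f(x, y):
        return 2 * (x - 2), 2 * (y + 3)

    x, y = 0.0, 0.0  # illustrative starting point
    lr = 0.1         # illustrative step size
    for _ in range(200):
        gx, gy = grad_f(x, y)
        x, y = x - lr * gx, y - lr * gy  # step against the gradient
    print(x, y)  # converges toward (2.0, -3.0)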

10 Stop Here For First Pass

If you can now explain:

  • how a partial derivative differs from an ordinary derivative
  • how to build the gradient from partial derivatives
  • why the gradient is normal to a level set
  • why the gradient points in the direction of steepest local increase

then this page has done its main job.

11 Go Deeper

The strongest next steps after this page are:

  1. Chain Rule and Linearization, because gradients become truly powerful once local effects compose
  2. Optimization, to see gradients turn into algorithms and stationarity conditions
  3. Backpropagation and Computation Graphs, to see multivariable derivative flow in modern ML language

12 Optional After First Pass

If you want more practice before moving on:

  • compute a gradient for a quadratic function and evaluate it at several points
  • sketch a few level curves and decide which direction the gradient should point
  • compare the one-variable derivative of a slice with the full multivariable gradient at the same point

13 Common Mistakes

  • forgetting to hold the other variables fixed during a partial derivative
  • treating the gradient like a scalar instead of a vector
  • mixing up “direction of steepest increase” with “direction of a level curve”
  • assuming the partial derivatives alone give every local fact automatically, with no geometric interpretation
  • using gradient language before specifying the point where it is evaluated
