Continuous-Time Stochastic Control and Hamilton-Jacobi-Bellman Intuition

How Bellman reasoning turns continuous-time stochastic control into a value-function PDE with drift, diffusion, and optimization terms.
Modified: April 26, 2026

Keywords

stochastic control, HJB, Hamilton-Jacobi-Bellman, controlled diffusion, value function

1 Role

This is the sixth page of the Stochastic Control and Dynamic Programming module.

Its job is to show what Bellman reasoning becomes in continuous time.

In discrete time, dynamic programming gave:

  • Bellman recursions
  • value iteration
  • policy iteration

In continuous time, the same logic turns into a partial differential equation for the value function.

That equation is the Hamilton-Jacobi-Bellman equation.

2 First-Pass Promise

Read this page after Stochastic Linear Systems, LQG, and the Separation Principle.

If you stop here, you should still understand:

  • what a continuous-time stochastic control model looks like
  • why the value function now depends on time and state continuously
  • why Bellman reasoning becomes an HJB equation
  • what the drift term, diffusion term, and minimization mean

3 Why It Matters

Discrete-time MDPs are mathematically clean, but many systems are naturally continuous:

  • motion
  • diffusion
  • finance-style uncertainty
  • controlled physical processes
  • continuous-time generative and transport models

In those settings, it is awkward to reason one discrete time step at a time.

Instead, we ask for a value function V(t,x) that tells us:

if the current time is t and the state is x, what is the best achievable expected future cost?

The dynamic-programming principle still applies.

But because time is continuous, the one-step Bellman update is replaced by a local-in-time expansion.

That is why the outcome is not a recursion but a PDE.

4 Prerequisite Recall

  • Bellman equations summarize optimal future cost through value functions
  • in stochastic linear systems, noise enters through random disturbances and may still preserve strong structure
  • ODEs and dynamical systems already gave the language of flow, drift, and continuous-time evolution
  • real analysis matters here because the value function is treated through limits, derivatives, and PDE-style reasoning

5 Intuition

5.1 Continuous Time Means Infinitesimal Bellman Updates

In discrete time, Bellman compares:

  • immediate cost over one step
  • plus future value at the next state

In continuous time, the time step becomes very small.

So Bellman reasoning becomes:

  • immediate cost over an interval of length dt
  • plus future value after an infinitesimal state change

That is why derivatives appear.
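To make this concrete, here is a minimal numerical sketch (an illustration, not part of the page's formal development): for a test value function V(x) = x^2/2 and the scalar dynamics dX = u dt + \sigma dW used in the worked example below, the minimizer of the one-step Bellman objective approaches the HJB rule u^\ast = -V_x as dt shrinks.

```python
import numpy as np

# A sketch of the infinitesimal Bellman update. Assumptions: test value
# function V(x) = x^2 / 2, dynamics dX = u dt + sigma dW, running cost
# u^2 / 2 (the scalar example developed later on this page).

sigma, x = 0.5, 1.0
u_grid = np.linspace(-3.0, 3.0, 2001)

for dt in [0.5, 0.1, 0.01, 0.001]:
    # For this quadratic V, E[V(x + u dt + sigma sqrt(dt) Z)] is available
    # in closed form: V(x + u dt) + (sigma^2 / 2) dt.
    expected_next = 0.5 * (x + u_grid * dt) ** 2 + 0.5 * sigma**2 * dt
    bellman = 0.5 * u_grid**2 * dt + expected_next   # cost over dt + future value
    u_star = u_grid[np.argmin(bellman)]
    print(f"dt={dt:6.3f}  one-step minimizer u* = {u_star:+.3f}  (HJB rule -V_x = {-x:+.3f})")
```

As dt shrinks, the printed minimizer moves from -x/(1+dt) toward -V_x(x) = -x, which is exactly the greedy rule the HJB minimization produces in the worked example.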

5.2 Drift Tells Us The Deterministic Local Motion

If the state obeys a controlled diffusion

\[ dX_t = f(X_t,u_t,t)\,dt + \sigma(X_t,u_t,t)\,dW_t, \]

then f is the local drift.

It is the deterministic directional part of the motion.

5.3 Diffusion Adds A Second-Order Effect

Noise does not only perturb trajectories.

At the value-function level, it creates a second-order term.

That is why the HJB equation contains not only first derivatives like \nabla V, but also second derivatives through the Hessian.
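A minimal Monte Carlo sketch of this local expansion, assuming a test function V(x) = cos(x) and constant drift f and noise scale \sigma: the expected change of V over a short interval splits into a first-order drift part V_x f and the second-order diffusion part \frac{\sigma^2}{2} V_{xx}.

```python
import numpy as np

# Assumptions: test function V(x) = cos(x), constant drift f, constant
# noise scale sigma. The empirical change rate of E[V] over a short dt
# should match the first-order drift term plus the second-order diffusion term.

rng = np.random.default_rng(0)
f, sigma, x, dt, n = 0.7, 0.8, 0.3, 1e-3, 2_000_000

V    = np.cos
V_x  = lambda y: -np.sin(y)
V_xx = lambda y: -np.cos(y)

z = rng.standard_normal(n)
x_next = x + f * dt + sigma * np.sqrt(dt) * z         # one noisy step
mc  = (V(x_next).mean() - V(x)) / dt                  # empirical change rate
ito = V_x(x) * f + 0.5 * sigma**2 * V_xx(x)           # drift + diffusion prediction

print(f"Monte Carlo estimate     : {mc:+.4f}")
print(f"V_x f + (s^2/2) V_xx     : {ito:+.4f}")
```

Setting f = 0 isolates the point of this subsection: noise alone still moves the expected value, and it does so through the second derivative V_{xx}.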

5.4 HJB Is Bellman In PDE Form

At first pass, the HJB equation is simply:

Bellman equation + continuous time + stochastic local expansion

So the equation packages:

  • running cost
  • best control choice
  • deterministic drift effect
  • diffusion effect

all in one place.

6 Formal Core

Definition 1 (Definition: Controlled Diffusion) At a first pass, a continuous-time stochastic control model can be written as

\[ dX_t = f(X_t,u_t,t)\,dt + \sigma(X_t,u_t,t)\,dW_t, \]

where:

  • X_t is the state
  • u_t is the control
  • W_t is Brownian motion

This is the continuous-time stochastic analog of a controlled state-space system.
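A minimal simulation sketch. Euler-Maruyama is the standard first-order scheme for sampling paths of such a diffusion; the drift, noise scale, and feedback policy below are illustrative choices, not part of the page's formal development.

```python
import numpy as np

# Euler-Maruyama simulation of dX = f(X,u,t) dt + sigma(X,u,t) dW under a
# feedback policy u = pi(t, x). All three functions below are assumptions
# chosen only to make the loop concrete.

rng = np.random.default_rng(1)

f   = lambda x, u, t: u          # drift: control enters directly
sig = lambda x, u, t: 0.5        # constant diffusion coefficient
pi  = lambda t, x: -x            # linear feedback policy

T, n_steps = 1.0, 1000
dt = T / n_steps
x = 1.0
for k in range(n_steps):
    t = k * dt
    u = pi(t, x)
    dw = np.sqrt(dt) * rng.standard_normal()          # Brownian increment
    x = x + f(x, u, t) * dt + sig(x, u, t) * dw       # Euler-Maruyama step

print(f"X_T = {x:+.3f}")
```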

Definition 2 (Definition: Continuous-Time Value Function) For horizon T, define

\[ V(t,x)=\inf_u \mathbb{E}\!\left[\int_t^T \ell(X_s,u_s,s)\,ds + g(X_T)\,\middle|\, X_t=x\right]. \]

This value function records the best expected future cost starting from time t and state x.
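Because V(t,x) is an infimum over controls, simulating any single feedback policy and averaging its cost gives an upper bound on V(t,x). A minimal Monte Carlo sketch, with assumed values q = 2, \sigma = 0.5, and the (not necessarily optimal) policy u = -x:

```python
import numpy as np

# Monte Carlo policy evaluation: average cost of one fixed feedback law.
# Assumptions: dynamics dX = u dt + sigma dW, running cost u^2 / 2,
# terminal cost q x^2 / 2, policy u = -x. The estimate upper-bounds V(t0, x0).

rng = np.random.default_rng(2)
sigma, q, T, t0, x0 = 0.5, 2.0, 1.0, 0.0, 1.0
n_paths, n_steps = 20_000, 200
dt = (T - t0) / n_steps

x = np.full(n_paths, x0)
cost = np.zeros(n_paths)
for _ in range(n_steps):
    u = -x                                            # fixed feedback law
    cost += 0.5 * u**2 * dt                           # accumulate running cost
    x += u * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
cost += 0.5 * q * x**2                                # add terminal cost g(X_T)

print(f"E[cost] under u = -x  ~=  {cost.mean():.4f}   (upper bound on V(t0, x0))")
```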

Theorem 1 (Theorem Idea: Dynamic Programming Principle) For a very short time increment h>0, the value at (t,x) equals:

  • the best expected cost accumulated over [t,t+h]
  • plus the optimal continuation value starting from the random state at time t+h

This is the continuous-time form of the same principle of optimality used earlier in the module.
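A minimal numerical check of this principle in its simplest special case: a single fixed policy (here u = 0), where the expected cost has a closed form. Assumptions: dynamics dX = \sigma dW, running cost u^2/2 (zero along this policy), and terminal cost q x^2/2, so the policy's value is V_0(t,x) = \frac{q}{2}(x^2 + \sigma^2 (T-t)).

```python
import numpy as np

# Check: value now = expected cost over [t, t+h] + expected continuation
# value at the random state X_{t+h}. For the fixed policy u = 0 the running
# cost vanishes and V0 is known in closed form (see lead-in assumptions).

rng = np.random.default_rng(3)
sigma, q, T = 0.5, 2.0, 1.0
t, x, h = 0.3, 1.2, 0.1

V0 = lambda s, y: 0.5 * q * (y**2 + sigma**2 * (T - s))

z = rng.standard_normal(1_000_000)
x_next = x + sigma * np.sqrt(h) * z                   # state after the interval
lhs = V0(t, x)                                        # value at (t, x)
rhs = V0(t + h, x_next).mean()                        # zero running cost + continuation

print(f"V0(t, x)                = {lhs:.4f}")
print(f"E[V0(t + h, X_(t+h))]   = {rhs:.4f}")
```

The full principle adds an optimization over the control on [t, t+h]; this fixed-policy version is just the tower property of conditional expectation.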

Theorem 2 (Theorem Idea: Hamilton-Jacobi-Bellman Equation) Under smoothness assumptions, the value function satisfies

\[ -\partial_t V(t,x) = \inf_u \left\{ \ell(x,u,t) + \nabla V(t,x)\cdot f(x,u,t) + \frac{1}{2}\operatorname{tr}\!\left(\sigma(x,u,t)\sigma(x,u,t)^T \nabla^2 V(t,x)\right) \right\}, \]

with terminal condition

\[ V(T,x)=g(x). \]

At first pass, read the terms as follows (a small numeric illustration appears after the list):

  • \ell: running cost
  • \nabla V \cdot f: how deterministic drift changes future value
  • \frac12 \operatorname{tr}(\sigma \sigma^T \nabla^2 V): how diffusion changes future value
  • \inf_u: choose the best local control action
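The sketch below (an illustration, not part of the formal statement) evaluates each bracket term at one point, for the scalar dynamics and costs of the worked example in the next section, with assumed stand-in derivative values V_x = V_{xx} = 1:

```python
# Term-by-term reading of the HJB bracket for f = u, constant sigma,
# running cost u^2 / 2. V_x and V_xx are assumed stand-in values.

sigma = 0.5
V_x, V_xx = 1.0, 1.0

for u in [-2.0, -1.0, 0.0, 1.0]:
    running   = 0.5 * u**2                  # l(x, u, t): running cost
    drift     = V_x * u                     # grad V . f: drift effect
    diffusion = 0.5 * sigma**2 * V_xx       # (1/2) tr(sigma sigma^T Hess V)
    total     = running + drift + diffusion
    print(f"u={u:+.1f}  running={running:.3f}  drift={drift:+.3f}  "
          f"diffusion={diffusion:.3f}  bracket={total:+.3f}")

# The diffusion term does not depend on u here, so it shifts the bracket
# without moving the minimizer; the minimum sits at u = -V_x.
```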

Theorem 3 (Theorem Idea: HJB Is The Continuous-Time Bellman Equation) The HJB equation plays the same conceptual role in continuous time that the Bellman equation plays in discrete time:

  • it characterizes the optimal value function
  • and a control that attains the minimization is the candidate optimal feedback law

7 Worked Example

Consider the scalar controlled diffusion

\[ dX_t = u_t\,dt + \sigma\,dW_t, \]

with running cost

\[ \ell(x,u)=\frac12 u^2 \]

and terminal cost

\[ g(x)=\frac12 qx^2. \]

Then the HJB equation has the first-pass form

\[ -V_t(t,x) = \inf_u \left\{ \frac{1}{2}u^2 + V_x(t,x)\,u + \frac{\sigma^2}{2}\,V_{xx}(t,x) \right\}. \]

The point of the example is not to solve the PDE fully.

It is to see the three ingredients clearly:

  • \frac12 u^2: control effort cost
  • V_x u: drift-controlled first-order effect
  • \frac{\sigma^2}{2}V_{xx}: noise-induced second-order effect

If you minimize the bracket with respect to u, you get the candidate rule

\[ u^\ast = -V_x(t,x). \]

That is the continuous-time analog of “act greedily with respect to the value function.”
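For this example the value function has a standard LQ closed form, quadratic in x. Taking that ansatz as an assumption (it is not derived on this page), a finite-difference check confirms that it satisfies the HJB equation:

```python
import numpy as np

# Assumed LQ ansatz: V(t,x) = p(t) x^2 / 2 + r(t), with
#   p' = p^2,              p(T) = q  =>  p(t) = q / (1 + q (T - t)),
#   r' = -(sigma^2/2) p,   r(T) = 0  =>  r(t) = (sigma^2/2) ln(1 + q (T - t)).
# With u* = -V_x plugged in, the HJB reads -V_t = -V_x^2/2 + (sigma^2/2) V_xx,
# so the residual below should vanish (up to finite-difference error).

sigma, q, T = 0.5, 2.0, 1.0

p = lambda t: q / (1.0 + q * (T - t))
r = lambda t: 0.5 * sigma**2 * np.log(1.0 + q * (T - t))
V = lambda t, x: 0.5 * p(t) * x**2 + r(t)

eps = 1e-4
for t, x in [(0.0, 1.0), (0.4, -0.7), (0.9, 2.0)]:
    V_t  = (V(t + eps, x) - V(t - eps, x)) / (2 * eps)
    V_x  = (V(t, x + eps) - V(t, x - eps)) / (2 * eps)
    V_xx = (V(t, x + eps) - 2 * V(t, x) + V(t, x - eps)) / eps**2
    residual = -V_t + 0.5 * V_x**2 - 0.5 * sigma**2 * V_xx
    print(f"t={t:.1f}  x={x:+.1f}  HJB residual = {residual:+.2e}")
```

Note that the ansatz also meets the terminal condition: p(T) = q and r(T) = 0 give V(T,x) = \frac12 qx^2 = g(x).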

8 Computation Lens

When you meet an HJB-style statement, ask:

  1. what are the state, control, drift, and diffusion?
  2. what is the running cost and what is the terminal cost?
  3. is the value function finite-horizon V(t,x) or stationary V(x)?
  4. where is the first-order drift term?
  5. where is the second-order diffusion term?

Those questions usually make the PDE readable even if the full analysis is hard.
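As one way to operationalize this checklist, here is a minimal container (field names are illustrative, not a standard API) instantiated for the worked example above:

```python
from dataclasses import dataclass
from typing import Callable

# Answers to the five reading questions, packaged as data. All names are
# illustrative assumptions; the numbers match the sketches above
# (q = 2, sigma = 0.5).

@dataclass
class HJBProblem:
    drift: Callable          # f(x, u, t): source of the first-order term
    diffusion: Callable      # sigma(x, u, t): source of the second-order term
    running_cost: Callable   # l(x, u, t)
    terminal_cost: Callable  # g(x)
    horizon: float           # finite horizon => value function is V(t, x), not V(x)

scalar_example = HJBProblem(
    drift=lambda x, u, t: u,
    diffusion=lambda x, u, t: 0.5,
    running_cost=lambda x, u, t: 0.5 * u**2,
    terminal_cost=lambda x: 0.5 * 2.0 * x**2,
    horizon=1.0,
)
```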

9 Application Lens

9.1 Continuous-Time Optimal Control

HJB is the natural value-function language for continuous-time decision-making under uncertainty.

9.2 Finance, Diffusions, And Sequential Optimization

Many stochastic optimization problems on diffusions are best understood through controlled SDEs and HJB equations.

9.3 Bridge To Modern ML

Continuous-time generative models, transport dynamics, and some control-flavored learning problems reuse the same vocabulary of drift, diffusion, and value-based reasoning.

10 Stop Here For First Pass

If you stop here, retain these five ideas:

  • a continuous-time stochastic control problem is naturally modeled by a controlled diffusion
  • the value function becomes a function of continuous time and state
  • the dynamic-programming principle still holds, but locally in time
  • the HJB equation is the continuous-time Bellman equation
  • noise produces the second-order diffusion term in the value-function PDE
