Stochastic Control and Dynamic Programming
stochastic control, dynamic programming, Markov decision process, Bellman equation, reinforcement learning
1 Why This Module Matters
The Control and Dynamics module already gave the site:
- state-space models
- feedback
- estimation
- optimal control for structured linear systems
But many sequential decision problems are noisier and more abstract than that.
They ask:
- what if the next state is random?
- what if decisions are made repeatedly under uncertainty?
- what if value must be computed recursively over time?
- what if the language of control and the language of RL are really describing the same underlying objects?
This module is where those questions get one clean answer.
It is the bridge from deterministic or structured control into Markov decision processes, Bellman equations, stochastic optimal control, and RL-facing sequential decision-making.
2 First Pass Through This Module
The intended first-pass spine for this module is:
- Controlled Markov Models, Policies, and Cost Functionals
- Finite-Horizon Dynamic Programming and Backward Induction
- Infinite-Horizon Value Functions, Bellman Equations, and Contractions
- Value Iteration, Policy Iteration, and Approximate Dynamic Programming
- Stochastic Linear Systems, LQG, and the Separation Principle
- Continuous-Time Stochastic Control and Hamilton-Jacobi-Bellman Intuition
- Partial Observability, Belief States, and RL/Control Bridges
The module has a full seven-page first-pass spine. Together these pages introduce:
- the MDP viewpoint itself: state, action, transition law, policy, reward or cost
- the principle of optimality
- the Bellman recursion (written out just after this list)
- backward induction as the first dynamic-programming algorithm
- infinite-horizon value functions
- Bellman fixed points and contraction intuition
- value iteration and policy iteration as the first exact algorithmic layer
- approximate dynamic programming as the bridge to larger-scale problems
- the structured linear-Gaussian route where dynamic programming closes into Riccati and Kalman objects
- the continuous-time route where Bellman reasoning becomes an HJB equation
- the partially observed route where planning happens over beliefs instead of directly observed state
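Since everything on this list hangs off the Bellman recursion, it is worth writing down once. This is a standard generic statement, not tied to any one page's conventions, for a stage cost g_t, terminal cost g_T, and discount factor gamma:

```latex
% Finite-horizon Bellman recursion, solved backward from t = T:
V_T(x) = g_T(x), \qquad
V_t(x) = \min_{u} \; \mathbb{E}\!\left[\, g_t(x, u) + V_{t+1}(x_{t+1}) \;\middle|\; x_t = x,\ u_t = u \,\right].

% Infinite-horizon discounted Bellman equation, a fixed-point condition on V:
V(x) = \min_{u} \; \mathbb{E}\!\left[\, g(x, u) + \gamma\, V(x_{t+1}) \;\middle|\; x_t = x,\ u_t = u \,\right],
\qquad 0 < \gamma < 1.
```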
3 How To Use This Module
The best first-pass path is:
- work through the seven spine pages in order, starting with Controlled Markov Models, Policies, and Cost Functionals and finishing with Partial Observability, Belief States, and RL/Control Bridges
- keep Control and Dynamics nearby so the stochastic version stays connected to state-space intuition
- keep Probability nearby whenever transition laws, expectation, or conditional randomness need re-grounding
- use Information Theory as a nearby payoff zone once the module reaches uncertainty, partial observability, or communication constraints
- use Applications > Machine Learning as the downstream bridge into RL-flavored sequential decision-making
The design goal is to make Bellman-style sequential reasoning feel natural before the module branches into finite-horizon DP, infinite-horizon fixed points, stochastic linear control, and RL bridges.
4 Core Concepts
- Controlled Markov Models, Policies, and Cost Functionals: the opening page, which explains the MDP viewpoint, how actions shape stochastic transitions, and how objectives turn trajectories into optimization problems; the transition and cost arrays in the sketches after this list are exactly these ingredients in miniature.
- Finite-Horizon Dynamic Programming and Backward Induction: the second page, covering the Bellman recursion, the principle of optimality, and the basic backward algorithm (see the backward-induction sketch after this list).
- Infinite-Horizon Value Functions, Bellman Equations, and Contractions: the third page, which recasts the Bellman recursion in fixed-point language and explains why contraction matters (the shrinking sup-norm steps in the value-iteration sketch below show that property in action).
- Value Iteration, Policy Iteration, and Approximate Dynamic Programming: the fourth page, which turns Bellman fixed points into exact algorithms and then into the approximation bridge (both exact algorithms are sketched below).
- Stochastic Linear Systems, LQG, and the Separation Principle: the fifth page, showing the structured linear-Gaussian case where stochastic dynamic programming reconnects to LQR and Kalman filtering (see the Riccati sketch below).
- Continuous-Time Stochastic Control and Hamilton-Jacobi-Bellman Intuition: the sixth page, which turns Bellman reasoning into a continuous-time value-function PDE with drift and diffusion terms (the HJB equation is written out below).
- Partial Observability, Belief States, and RL/Control Bridges: the seventh page, which explains how hidden state forces belief-state planning and creates the cleanest bridge from stochastic control into POMDPs and RL (see the belief-update sketch below).
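The following is a minimal sketch of backward induction, assuming an illustrative two-state, two-action MDP. All numbers here (transition probabilities, costs, horizon) are made up for demonstration; the arrays P and g play the role of the transition law and cost functional from the first page, and the backward loop is the algorithm from the second.

```python
# A minimal backward-induction sketch on a made-up two-state, two-action MDP.
import numpy as np

n_states, n_actions, horizon = 2, 2, 5

# Transition law: P[a][s, s'] = probability of landing in s' from s under a.
P = np.array([
    [[0.9, 0.1],
     [0.4, 0.6]],   # action 0
    [[0.2, 0.8],
     [0.7, 0.3]],   # action 1
])

# Cost functional: g[s, a] is the stage cost, g_terminal the terminal cost.
g = np.array([[1.0, 2.0],
              [0.5, 0.3]])
g_terminal = np.array([0.0, 10.0])

V = g_terminal.copy()                      # V_T(x) = g_T(x)
policy = np.zeros((horizon, n_states), dtype=int)
for t in reversed(range(horizon)):
    # Q[s, a] = g(s, a) + E[V_{t+1}(s') | s, a]: one Bellman step backward.
    Q = g + np.stack([P[a] @ V for a in range(n_actions)], axis=1)
    policy[t] = Q.argmin(axis=1)           # greedy action at stage t
    V = Q.min(axis=1)                      # V_t(x)

print("optimal cost-to-go at t=0:", V)
print("first-stage policy:", policy[0])
```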
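For the third and fourth pages, here is a sketch of value iteration and policy iteration on the same kind of made-up two-state discounted model. The sup-norm step size in value iteration shrinks by at most a factor of gamma per sweep, which is the contraction property at work; policy iteration instead alternates exact evaluation with greedy improvement.

```python
# Value iteration and policy iteration on an illustrative discounted MDP.
import numpy as np

gamma = 0.9
P = np.array([  # P[a][s, s'] = transition probability under action a
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.2, 0.8], [0.7, 0.3]],
])
g = np.array([[1.0, 2.0],    # g[s, a] = stage cost
              [0.5, 0.3]])
n_states, n_actions = g.shape

def bellman_q(V):
    """Q[s, a] = g(s, a) + gamma * E[V(s') | s, a]."""
    return g + gamma * np.stack([P[a] @ V for a in range(n_actions)], axis=1)

# Value iteration: iterate the Bellman operator toward its unique fixed point.
V = np.zeros(n_states)
for k in range(500):
    V_new = bellman_q(V).min(axis=1)
    err = np.abs(V_new - V).max()   # sup-norm step; contraction makes each
    V = V_new                       # step at most gamma times the previous one
    if err < 1e-10:
        break
print("value iteration:", V)

# Policy iteration: exact policy evaluation, then greedy improvement.
policy = np.zeros(n_states, dtype=int)
while True:
    # Evaluation: solve (I - gamma * P_pi) V = g_pi as a linear system.
    P_pi = np.array([P[policy[s], s] for s in range(n_states)])
    g_pi = g[np.arange(n_states), policy]
    V_pi = np.linalg.solve(np.eye(n_states) - gamma * P_pi, g_pi)
    # Improvement: act greedily with respect to V_pi.
    improved = bellman_q(V_pi).argmin(axis=1)
    if np.array_equal(improved, policy):
        break
    policy = improved
print("policy iteration:", V_pi, "policy:", policy)
```

Both loops should agree on the optimal value function up to numerical tolerance, which is a useful sanity check when experimenting.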
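For the fifth page, here is a sketch of the backward Riccati recursion that finite-horizon LQR reduces to. The A, B, Q, R matrices below are illustrative assumptions; under the separation principle the same gains would be applied to a Kalman state estimate when the state is only partially observed, but only the control side is shown here.

```python
# Finite-horizon discrete-time Riccati recursion for LQR gains, with
# illustrative double-integrator-style dynamics and made-up cost weights.
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                   # state cost x' Q x
R = np.array([[0.1]])           # control cost u' R u
horizon = 20

P = Q.copy()                    # terminal cost-to-go matrix P_T
gains = []
for t in reversed(range(horizon)):
    # K_t = (R + B' P B)^{-1} B' P A, then one Riccati step backward:
    # P_t = Q + A' P_{t+1} (A - B K_t)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
    gains.append(K)
gains.reverse()

print("first-stage gain K_0 =", gains[0])   # control law u_t = -K_t x_t
```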
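For the sixth page, the continuous-time analogue of the Bellman recursion is a PDE. One standard statement, for a controlled diffusion with drift f, diffusion coefficient sigma, running cost g, and terminal cost g_T (generic notation, not tied to any one page's conventions):

```latex
% HJB equation for the controlled diffusion
% dx_t = f(x_t, u_t)\,dt + \sigma(x_t)\,dW_t on horizon [0, T]:
-\partial_t V(t, x)
  = \min_{u}\Bigl[\, g(x, u)
    + f(x, u)^{\top} \nabla_x V(t, x)
    + \tfrac{1}{2}\,\operatorname{tr}\!\bigl(\sigma(x)\sigma(x)^{\top} \nabla_x^2 V(t, x)\bigr) \Bigr],
\qquad V(T, x) = g_T(x).
```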
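For the seventh page, the basic computational move is the Bayes belief update: push the current belief through the dynamics, then reweight by the observation likelihood. The two-state transition and observation models below are illustrative assumptions, with action dependence suppressed to keep the sketch short.

```python
# A minimal discrete Bayes belief update, the core move behind
# belief-state planning in POMDPs. All model numbers are made up.
import numpy as np

P = np.array([[0.8, 0.2],      # P[s, s'] = transition probability (action fixed)
              [0.3, 0.7]])
O = np.array([[0.9, 0.1],      # O[s', z] = probability of observation z in s'
              [0.2, 0.8]])

def belief_update(b, z):
    """Predict through the dynamics, then condition on observation z."""
    predicted = b @ P                    # prior over the next state
    unnormalized = predicted * O[:, z]   # reweight by observation likelihood
    return unnormalized / unnormalized.sum()

b = np.array([0.5, 0.5])                 # start maximally uncertain
for z in [0, 0, 1]:                      # an assumed observation sequence
    b = belief_update(b, z)
    print("belief:", b)
```

Planning then happens over these belief vectors rather than over the hidden state itself, which is what turns a POMDP into a (continuous-state) MDP.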
5 After First Pass
Once this first-pass spine feels comfortable, the strongest next directions are:
- exact and approximate POMDP algorithms
- risk-sensitive and robust stochastic control
- decentralized and distributed control under uncertainty
- modern RL papers that hide Bellman and filtering structure under learned function approximation
6 Applications
6.1 Sequential Decision-Making
Markov decision processes are the native language for repeated decisions under uncertainty.
6.2 Control Under Uncertainty
The module extends deterministic and linear control into stochastic systems, random disturbances, and observation-limited settings.
6.3 RL And Planning
Modern RL keeps rediscovering the same core objects through policies, value functions, Bellman recursions, exploration, and approximate planning.
7 Optional Deeper Reading After First Pass
The strongest current references connected to this module are:
- MIT 6.231: Dynamic Programming and Stochastic Control - official lecture-slide index spanning finite-horizon DP, stochastic shortest path, value iteration, policy iteration, and approximate dynamic programming. Checked 2026-04-25.
- Stanford MS&E 235A / EE 283: Markov Decision Processes - official current course page for sequential decision-making under uncertainty with MDP modeling and algorithms. Checked 2026-04-25.
- Stanford MS&E 235A lecture 1 - official current notes for MDP specification, transition probabilities, and examples. Checked 2026-04-25.
- Stanford MS&E 235A lecture 3 - official current notes for reward functions and decision objectives. Checked 2026-04-25.
- Stanford EE365: Stochastic Control - official course page for Bellman-style stochastic control, continuous-time formulations, and structured optimal control. Checked 2026-04-25.
- Stanford AA228 / CS238 - official current course page for decision making under uncertainty, dynamic programming, and POMDP/RL-facing applications. Checked 2026-04-25.
8 Sources and Further Reading
- MIT 6.231: Dynamic Programming and Stochastic Control - first pass - official lecture-slide index for the full stochastic-control and dynamic-programming arc. Checked 2026-04-25.
- Stanford MS&E 235A / EE 283: Markov Decision Processes - first pass - official current course page emphasizing formulation, objectives, and algorithms for MDPs. Checked 2026-04-25.
- Stanford MS&E 235A lecture 1 - first pass - official current notes for MDP specification and transition-law thinking. Checked 2026-04-25.
- Stanford MS&E 235A lecture 3 - first pass - official current notes for reward and objective design. Checked 2026-04-25.
- Stanford EE365: Stochastic Control - second pass - official course page for continuous-time and structured stochastic-control viewpoints. Checked 2026-04-25.
- Stanford AA228 / CS238 - second pass - official current decision-making-under-uncertainty course page with applications and computational framing. Checked 2026-04-25.