Stochastic Control and Dynamic Programming
stochastic control, dynamic programming, Markov decision process, Bellman equation, reinforcement learning
1 Why This Module Matters
The Control and Dynamics module already gave the site:
- state-space models
- feedback
- estimation
- optimal control for structured linear systems
But many sequential decision problems are noisier and more abstract than that.
They ask:
- what if the next state is random?
- what if decisions are made repeatedly under uncertainty?
- what if value must be computed recursively over time?
- what if the language of control and the language of RL are really describing the same underlying objects?
This module is where those questions get one clean answer.
It is the bridge from deterministic or structured control into Markov decision processes, Bellman equations, stochastic optimal control, and RL-facing sequential decision-making.
2 First Pass Through This Module
The intended first-pass spine for this module is:
- Controlled Markov Models, Policies, and Cost Functionals
- Finite-Horizon Dynamic Programming and Backward Induction
- Infinite-Horizon Value Functions, Bellman Equations, and Contractions
- Value Iteration, Policy Iteration, and Approximate Dynamic Programming
- Stochastic Linear Systems, LQG, and the Separation Principle
- Continuous-Time Stochastic Control and Hamilton-Jacobi-Bellman Intuition
- Partial Observability, Belief States, and RL/Control Bridges
The module has a full seven-page first-pass spine. Together these pages introduce:
- the MDP viewpoint itself: state, action, transition law, policy, reward or cost
- the principle of optimality
- the Bellman recursion (written out just after this list)
- backward induction as the first dynamic-programming algorithm
- infinite-horizon value functions
- Bellman fixed points and contraction intuition
- value iteration and policy iteration as the first exact algorithmic layer
- approximate dynamic programming as the bridge to larger-scale problems
- the structured linear-Gaussian route where dynamic programming closes into Riccati and Kalman objects
- the continuous-time route where Bellman reasoning becomes an HJB equation
- the partially observed route where planning happens over beliefs instead of directly observed state
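Since everything on this list hangs off the Bellman recursion, it is worth writing down once. This is a standard generic statement, not tied to any one page's conventions, for a stage cost g_t, terminal cost g_T, and discount factor gamma:

```latex
% Finite-horizon Bellman recursion, solved backward from t = T:
V_T(x) = g_T(x), \qquad
V_t(x) = \min_{u} \; \mathbb{E}\!\left[\, g_t(x, u) + V_{t+1}(x_{t+1}) \;\middle|\; x_t = x,\ u_t = u \,\right].

% Infinite-horizon discounted Bellman equation, a fixed-point condition on V:
V(x) = \min_{u} \; \mathbb{E}\!\left[\, g(x, u) + \gamma\, V(x_{t+1}) \;\middle|\; x_t = x,\ u_t = u \,\right],
\qquad 0 < \gamma < 1.
```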
3 How To Use This Module
The best first-pass path is:
- work through the seven spine pages in order, starting with Controlled Markov Models, Policies, and Cost Functionals and finishing with Partial Observability, Belief States, and RL/Control Bridges
- keep Control and Dynamics nearby so the stochastic version stays connected to state-space intuition
- keep Probability nearby whenever transition laws, expectation, or conditional randomness need re-grounding
- use Information Theory as a nearby payoff zone once the module reaches uncertainty, partial observability, or communication constraints
- use Applications > Machine Learning as the downstream bridge into RL-flavored sequential decision-making
The design goal is to make Bellman-style sequential reasoning feel natural before the module branches into finite-horizon DP, infinite-horizon fixed points, stochastic linear control, and RL bridges.
4 Core Concepts
- Controlled Markov Models, Policies, and Cost Functionals: the opening page, which explains the MDP viewpoint, how actions shape stochastic transitions, and how objectives turn trajectories into optimization problems; the transition and cost arrays in the sketches after this list are exactly these ingredients in miniature.
- Finite-Horizon Dynamic Programming and Backward Induction: the second page, covering the Bellman recursion, the principle of optimality, and the basic backward algorithm (see the backward-induction sketch after this list).
- Infinite-Horizon Value Functions, Bellman Equations, and Contractions: the third page, which recasts the Bellman recursion in fixed-point language and explains why contraction matters (the shrinking sup-norm steps in the value-iteration sketch below show that property in action).
- Value Iteration, Policy Iteration, and Approximate Dynamic Programming: the fourth page, which turns Bellman fixed points into exact algorithms and then into the approximation bridge (both exact algorithms are sketched below).
- Stochastic Linear Systems, LQG, and the Separation Principle: the fifth page, showing the structured linear-Gaussian case where stochastic dynamic programming reconnects to LQR and Kalman filtering (see the Riccati sketch below).
- Continuous-Time Stochastic Control and Hamilton-Jacobi-Bellman Intuition: the sixth page, which turns Bellman reasoning into a continuous-time value-function PDE with drift and diffusion terms (the HJB equation is written out below).
- Partial Observability, Belief States, and RL/Control Bridges: the seventh page, which explains how hidden state forces belief-state planning and creates the cleanest bridge from stochastic control into POMDPs and RL (see the belief-update sketch below).
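The following is a minimal sketch of backward induction, assuming an illustrative two-state, two-action MDP. All numbers here (transition probabilities, costs, horizon) are made up for demonstration; the arrays P and g play the role of the transition law and cost functional from the first page, and the backward loop is the algorithm from the second.

```python
# A minimal backward-induction sketch on a made-up two-state, two-action MDP.
import numpy as np

n_states, n_actions, horizon = 2, 2, 5

# Transition law: P[a][s, s'] = probability of landing in s' from s under a.
P = np.array([
    [[0.9, 0.1],
     [0.4, 0.6]],   # action 0
    [[0.2, 0.8],
     [0.7, 0.3]],   # action 1
])

# Cost functional: g[s, a] is the stage cost, g_terminal the terminal cost.
g = np.array([[1.0, 2.0],
              [0.5, 0.3]])
g_terminal = np.array([0.0, 10.0])

V = g_terminal.copy()                      # V_T(x) = g_T(x)
policy = np.zeros((horizon, n_states), dtype=int)
for t in reversed(range(horizon)):
    # Q[s, a] = g(s, a) + E[V_{t+1}(s') | s, a]: one Bellman step backward.
    Q = g + np.stack([P[a] @ V for a in range(n_actions)], axis=1)
    policy[t] = Q.argmin(axis=1)           # greedy action at stage t
    V = Q.min(axis=1)                      # V_t(x)

print("optimal cost-to-go at t=0:", V)
print("first-stage policy:", policy[0])
```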
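For the third and fourth pages, here is a sketch of value iteration and policy iteration on the same kind of made-up two-state discounted model. The sup-norm step size in value iteration shrinks by at most a factor of gamma per sweep, which is the contraction property at work; policy iteration instead alternates exact evaluation with greedy improvement.

```python
# Value iteration and policy iteration on an illustrative discounted MDP.
import numpy as np

gamma = 0.9
P = np.array([  # P[a][s, s'] = transition probability under action a
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.2, 0.8], [0.7, 0.3]],
])
g = np.array([[1.0, 2.0],    # g[s, a] = stage cost
              [0.5, 0.3]])
n_states, n_actions = g.shape

def bellman_q(V):
    """Q[s, a] = g(s, a) + gamma * E[V(s') | s, a]."""
    return g + gamma * np.stack([P[a] @ V for a in range(n_actions)], axis=1)

# Value iteration: iterate the Bellman operator toward its unique fixed point.
V = np.zeros(n_states)
for k in range(500):
    V_new = bellman_q(V).min(axis=1)
    err = np.abs(V_new - V).max()   # sup-norm step; contraction makes each
    V = V_new                       # step at most gamma times the previous one
    if err < 1e-10:
        break
print("value iteration:", V)

# Policy iteration: exact policy evaluation, then greedy improvement.
policy = np.zeros(n_states, dtype=int)
while True:
    # Evaluation: solve (I - gamma * P_pi) V = g_pi as a linear system.
    P_pi = np.array([P[policy[s], s] for s in range(n_states)])
    g_pi = g[np.arange(n_states), policy]
    V_pi = np.linalg.solve(np.eye(n_states) - gamma * P_pi, g_pi)
    # Improvement: act greedily with respect to V_pi.
    improved = bellman_q(V_pi).argmin(axis=1)
    if np.array_equal(improved, policy):
        break
    policy = improved
print("policy iteration:", V_pi, "policy:", policy)
```

Both loops should agree on the optimal value function up to numerical tolerance, which is a useful sanity check when experimenting.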
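For the fifth page, here is a sketch of the backward Riccati recursion that finite-horizon LQR reduces to. The A, B, Q, R matrices below are illustrative assumptions; under the separation principle the same gains would be applied to a Kalman state estimate when the state is only partially observed, but only the control side is shown here.

```python
# Finite-horizon discrete-time Riccati recursion for LQR gains, with
# illustrative double-integrator-style dynamics and made-up cost weights.
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                   # state cost x' Q x
R = np.array([[0.1]])           # control cost u' R u
horizon = 20

P = Q.copy()                    # terminal cost-to-go matrix P_T
gains = []
for t in reversed(range(horizon)):
    # K_t = (R + B' P B)^{-1} B' P A, then one Riccati step backward:
    # P_t = Q + A' P_{t+1} (A - B K_t)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
    gains.append(K)
gains.reverse()

print("first-stage gain K_0 =", gains[0])   # control law u_t = -K_t x_t
```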
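For the sixth page, the continuous-time analogue of the Bellman recursion is a PDE. One standard statement, for a controlled diffusion with drift f, diffusion coefficient sigma, running cost g, and terminal cost g_T (generic notation, not tied to any one page's conventions):

```latex
% HJB equation for the controlled diffusion
% dx_t = f(x_t, u_t)\,dt + \sigma(x_t)\,dW_t on horizon [0, T]:
-\partial_t V(t, x)
  = \min_{u}\Bigl[\, g(x, u)
    + f(x, u)^{\top} \nabla_x V(t, x)
    + \tfrac{1}{2}\,\operatorname{tr}\!\bigl(\sigma(x)\sigma(x)^{\top} \nabla_x^2 V(t, x)\bigr) \Bigr],
\qquad V(T, x) = g_T(x).
```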
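For the seventh page, the basic computational move is the Bayes belief update: push the current belief through the dynamics, then reweight by the observation likelihood. The two-state transition and observation models below are illustrative assumptions, with action dependence suppressed to keep the sketch short.

```python
# A minimal discrete Bayes belief update, the core move behind
# belief-state planning in POMDPs. All model numbers are made up.
import numpy as np

P = np.array([[0.8, 0.2],      # P[s, s'] = transition probability (action fixed)
              [0.3, 0.7]])
O = np.array([[0.9, 0.1],      # O[s', z] = probability of observation z in s'
              [0.2, 0.8]])

def belief_update(b, z):
    """Predict through the dynamics, then condition on observation z."""
    predicted = b @ P                    # prior over the next state
    unnormalized = predicted * O[:, z]   # reweight by observation likelihood
    return unnormalized / unnormalized.sum()

b = np.array([0.5, 0.5])                 # start maximally uncertain
for z in [0, 0, 1]:                      # an assumed observation sequence
    b = belief_update(b, z)
    print("belief:", b)
```

Planning then happens over these belief vectors rather than over the hidden state itself, which is what turns a POMDP into a (continuous-state) MDP.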
5 After First Pass
Once this first-pass spine feels comfortable, the strongest next directions are:
- exact and approximate POMDP algorithms
- risk-sensitive and robust stochastic control
- decentralized and distributed control under uncertainty
- modern RL papers that hide Bellman and filtering structure under learned function approximation
6 Applications
6.1 Sequential Decision-Making
Markov decision processes are the native language for repeated decisions under uncertainty.
6.2 Control Under Uncertainty
The module extends deterministic and linear control into stochastic systems, random disturbances, and observation-limited settings.
6.3 RL And Planning
Modern RL keeps rediscovering the same core objects through policies, value functions, Bellman recursions, exploration, and approximate planning.
7 Optional Deeper Reading After First Pass
The strongest current references connected to this module are:
- MIT 6.231: Dynamic Programming and Stochastic Control - official lecture-slide index spanning finite-horizon DP, stochastic shortest path, value iteration, policy iteration, and approximate dynamic programming. Checked 2026-04-25.
- Stanford MS&E 235A / EE 283: Markov Decision Processes - official current course page for sequential decision-making under uncertainty with MDP modeling and algorithms. Checked 2026-04-25.
- Stanford MS&E 235A lecture 1 - official current notes for MDP specification, transition probabilities, and examples. Checked 2026-04-25.
- Stanford MS&E 235A lecture 3 - official current notes for reward functions and decision objectives. Checked 2026-04-25.
- Stanford EE365: Stochastic Control - official course page for Bellman-style stochastic control, continuous-time formulations, and structured optimal control. Checked 2026-04-25.
- Stanford AA228 / CS238 - official current course page for decision making under uncertainty, dynamic programming, and POMDP/RL-facing applications. Checked 2026-04-25.
8 Sources and Further Reading
- MIT 6.231: Dynamic Programming and Stochastic Control - first pass - official lecture-slide index for the full stochastic-control and dynamic-programming arc. Checked 2026-04-25.
- Stanford MS&E 235A / EE 283: Markov Decision Processes - first pass - official current course page emphasizing formulation, objectives, and algorithms for MDPs. Checked 2026-04-25.
- Stanford MS&E 235A lecture 1 - first pass - official current notes for MDP specification and transition-law thinking. Checked 2026-04-25.
- Stanford MS&E 235A lecture 3 - first pass - official current notes for reward and objective design. Checked 2026-04-25.
- Stanford EE365: Stochastic Control - second pass - official course page for continuous-time and structured stochastic-control viewpoints. Checked 2026-04-25.
- Stanford AA228 / CS238 - second pass - official current decision-making-under-uncertainty course page with applications and computational framing. Checked 2026-04-25.