Stochastic Control and Dynamic Programming

How controlled Markov models, Bellman equations, dynamic programming, and stochastic optimal control connect sequential decision-making to control, information, and reinforcement learning.
Modified: April 26, 2026

Keywords: stochastic control, dynamic programming, Markov decision process, Bellman equation, reinforcement learning

1 Why This Module Matters

Control and Dynamics already gave the site:

  • state-space models
  • feedback
  • estimation
  • optimal control for structured linear systems

But many sequential decision problems are noisier and more abstract than that.

They ask:

  • what if the next state is random?
  • what if decisions are made repeatedly under uncertainty?
  • what if value must be computed recursively over time?
  • what if the language of control and the language of RL are really describing the same underlying objects?

This module is where those questions get one clean answer.

It is the bridge from deterministic or structured control into Markov decision processes, Bellman equations, stochastic optimal control, and RL-facing sequential decision-making.

Prerequisites: Control and Dynamics should come first. Probability matters because randomness lives in transition laws and observations. Optimization helps because policies, value functions, and cost-to-go quantities are optimization objects.

Unlocks: Dynamic programming, MDPs, value iteration, policy iteration, stochastic optimal control, POMDP intuition, RL bridges

Research Use: Reading papers or courses on stochastic control, sequential decision-making, planning under uncertainty, approximate dynamic programming, and reinforcement learning

2 First Pass Through This Module

The intended first-pass spine for this module is:

  1. Controlled Markov Models, Policies, and Cost Functionals
  2. Finite-Horizon Dynamic Programming and Backward Induction
  3. Infinite-Horizon Value Functions, Bellman Equations, and Contractions
  4. Value Iteration, Policy Iteration, and Approximate Dynamic Programming
  5. Stochastic Linear Systems, LQG, and the Separation Principle
  6. Continuous-Time Stochastic Control and Hamilton-Jacobi-Bellman Intuition
  7. Partial Observability, Belief States, and RL/Control Bridges

Together, these seven pages introduce:

  • the MDP viewpoint itself: state, action, transition law, policy, reward or cost
  • the principle of optimality
  • the Bellman recursion
  • backward induction as the first dynamic-programming algorithm (sketched in code after this list)
  • infinite-horizon value functions
  • Bellman fixed points and contraction intuition
  • value iteration and policy iteration as the first exact algorithmic layer
  • approximate dynamic programming as the bridge to larger-scale problems
  • the structured linear-Gaussian route where dynamic programming closes into Riccati and Kalman objects
  • the continuous-time route where Bellman reasoning becomes an HJB equation
  • the partially observed route where planning happens over beliefs instead of directly observed state
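
To make the finite-horizon part of that list concrete, here is a minimal backward-induction sketch. Everything in it is invented for illustration: a two-state, two-action MDP with transition array P[a, s, s'] and cost array c[s, a], solved by the Bellman recursion V_t(s) = min_a [c(s, a) + E[V_{t+1}(s') | s, a]] with terminal condition V_T = 0.

```python
import numpy as np

# Invented toy MDP: 2 states, 2 actions, horizon T = 5.
# P[a, s, s2] = probability of moving s -> s2 under action a.
P = np.array([
    [[0.9, 0.1],   # action 0 from state 0
     [0.2, 0.8]],  # action 0 from state 1
    [[0.5, 0.5],   # action 1 from state 0
     [0.6, 0.4]],  # action 1 from state 1
])
# c[s, a] = one-step cost of taking action a in state s.
c = np.array([[1.0, 2.0],
              [0.5, 0.3]])
T = 5

# Backward induction: V_T = 0, then for t = T-1, ..., 0
#   V_t(s) = min_a [ c(s, a) + E[ V_{t+1}(s') | s, a ] ].
V = np.zeros(2)                        # terminal cost-to-go
policy = np.zeros((T, 2), dtype=int)   # policy[t, s] = optimal action
for t in reversed(range(T)):
    Q = c + np.einsum("asn,n->sa", P, V)  # Q_t(s, a)
    policy[t] = Q.argmin(axis=1)
    V = Q.min(axis=1)

print("V_0 per state:", V)
print("optimal actions per (t, s):\n", policy)
```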

3 How To Use This Module

The best first-pass path is:

  1. work through the seven spine pages in order, from Controlled Markov Models, Policies, and Cost Functionals through Partial Observability, Belief States, and RL/Control Bridges
  2. keep Control and Dynamics nearby so the stochastic version stays connected to state-space intuition
  3. keep Probability nearby whenever transition laws, expectations, or conditional randomness need re-grounding
  4. use Information Theory as a nearby payoff zone once the module reaches uncertainty, partial observability, or communication constraints
  5. use Applications > Machine Learning as the downstream bridge into RL-flavored sequential decision-making
The design goal is to make Bellman-style sequential reasoning feel natural before the module branches into finite-horizon DP, infinite-horizon fixed points, stochastic linear control, and RL bridges.
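
As a concrete instance of the fixed-point layer, here is a minimal value-iteration sketch on an invented random MDP. With discount gamma < 1 the Bellman operator is a gamma-contraction in the sup norm, which is what justifies the stopping rule inside the loop.

```python
import numpy as np

# Invented random MDP: transition law P[a, s, s'] and cost c[s, a],
# with discount gamma < 1 so the Bellman operator
#   (T V)(s) = min_a [ c(s, a) + gamma * E[ V(s') | s, a ] ]
# is a gamma-contraction in the sup norm.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.95
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
c = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

V = np.zeros(n_states)
tol = 1e-8
while True:
    Q = c + gamma * np.einsum("asn,n->sa", P, V)
    V_new = Q.min(axis=1)
    # Contraction gives the a-posteriori bound
    #   ||V_new - V*||_inf <= gamma / (1 - gamma) * ||V_new - V||_inf,
    # so this stopping rule bounds the sup-norm error by tol.
    if np.max(np.abs(V_new - V)) <= tol * (1 - gamma) / gamma:
        V = V_new
        break
    V = V_new

print("Bellman fixed point V*:", V)
print("greedy policy:", Q.argmin(axis=1))
```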

4 After First Pass

Once this first-pass spine feels comfortable, the strongest next directions are:

  • exact and approximate POMDP algorithms (built on the belief update sketched after this list)
  • risk-sensitive and robust stochastic control
  • decentralized and distributed control under uncertainty
  • modern RL papers that hide Bellman and filtering structure under learned function approximation
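
The common primitive behind the POMDP directions above is the belief state: the posterior over the hidden state, updated by a predict step through the transition law and a correct step through the observation likelihood. A minimal discrete sketch, with all arrays invented for illustration:

```python
import numpy as np

def belief_update(b, a, obs, P, O):
    """One discrete Bayes-filter step: predict through P[a], then
    reweight by the likelihood of the observation in the new state."""
    predicted = b @ P[a]                  # sum_s b(s) P(s' | s, a)
    unnormalized = predicted * O[:, obs]  # times O(obs | s')
    return unnormalized / unnormalized.sum()

# Invented example: 2 hidden states, 1 action, 2 observations.
P = np.array([[[0.7, 0.3],
               [0.4, 0.6]]])             # P[a, s, s']
O = np.array([[0.9, 0.1],
              [0.2, 0.8]])               # O[s', obs]
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, obs=1, P=P, O=O))  # posterior belief
```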

5 Applications

5.1 Sequential Decision-Making

The MDP formalism is the native language for repeated decisions under uncertainty.

5.2 Control Under Uncertainty

The module extends deterministic and linear control into stochastic systems, random disturbances, and observation-limited settings.
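
On the linear-quadratic route, the dynamic-programming recursion collapses to a backward Riccati recursion, and additive Gaussian noise leaves the optimal gains unchanged (certainty equivalence, the seed of the separation principle). A minimal sketch on an invented two-state system:

```python
import numpy as np

# Invented double-integrator-like system: x_{t+1} = A x_t + B u_t + w_t,
# stage cost x' Q x + u' R u, horizon T.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[0.01]])
T = 50

# Backward Riccati recursion. Additive Gaussian noise only adds a
# state-independent constant to the value function, so the optimal
# gains K_t are the certainty-equivalent (noise-free) ones.
S = Q.copy()                 # terminal cost weight
gains = []
for _ in range(T):
    K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
    S = Q + A.T @ S @ (A - B @ K)
    gains.append(K)
gains.reverse()              # gains[t] is K_t; the control is u_t = -K_t x_t

print("K_0:", gains[0])
```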

5.3 RL And Planning

Modern RL keeps rediscovering the same core objects through policies, value functions, Bellman recursions, exploration, and approximate planning.
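
As a small illustration of that rediscovery, tabular Q-learning is the Bellman backup applied to sampled transitions instead of a known model. A minimal sketch, again on an invented random MDP:

```python
import numpy as np

# Invented random MDP, now with rewards; Q-learning never touches the
# model arrays except to sample transitions from them.
rng = np.random.default_rng(1)
n_states, n_actions, gamma, alpha, eps = 4, 2, 0.95, 0.1, 0.1
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

Q = np.zeros((n_states, n_actions))
s = 0
for _ in range(50_000):
    # epsilon-greedy exploration
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    s_next = int(rng.choice(n_states, p=P[a, s]))
    # sampled Bellman backup in place of the exact expectation
    Q[s, a] += alpha * (r[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print("greedy policy from learned Q:", Q.argmax(axis=1))
```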


