Infinite-Horizon Value Functions, Bellman Equations, and Contractions
infinite horizon, Bellman equation, contraction mapping, discounted cost, value function
1 Role
This is the third page of the Stochastic Control and Dynamic Programming module.
Its job is to make one major shift:
- finite horizon solved by backward induction
- infinite horizon solved by a fixed-point equation
This is where Bellman recursion stops being just a stage-by-stage algorithm and becomes an operator equation on value functions.
2 First-Pass Promise
Read this page after Finite-Horizon Dynamic Programming and Backward Induction.
If you stop here, you should still understand:
- what changes when the horizon is infinite
- why discounted value functions stay finite
- what the Bellman optimality equation means
- why contraction is the key theorem idea behind uniqueness and iteration
3 Why It Matters
Many real decision problems do not have a clean terminal time.
They repeat:
- every day
- every control cycle
- every customer arrival
- every environment step in RL
So there is no final stage where we can start a backward recursion.
Instead, the environment is often stationary:
- same state space
- same action space
- same transition law
- same one-step cost rule
That means the problem at the next step looks like the same problem again.
So the unknown object is no longer a sequence of functions J_0^*,J_1^*,...,J_T^*.
It is one value function that must be consistent with itself under one-step optimization and future continuation.
That consistency condition is the Bellman equation.
4 Prerequisite Recall
- in finite horizon, the value function at time t summarizes the optimal future cost from (t,s)
- Bellman recursion replaces a long policy search with repeated one-step optimization plus expected future value
- expectation matters because one action induces a distribution over next states
- a discount factor \gamma with 0<\gamma<1 shrinks the effect of far-future costs
5 Intuition
5.1 The Horizon No Longer Tells Us Where To Start
For a finite-horizon problem, the terminal time gave us an anchor.
For an infinite-horizon problem, there is no last step.
So we cannot solve by going backward from the end.
5.2 Discounting Keeps The Future Under Control
If future costs are discounted by \gamma^t, then events very far ahead matter less and less.
At first pass, this does two useful things:
- it keeps the total value finite under bounded one-step costs
- it makes the Bellman update pull two candidate value functions closer together
That second point is where contraction comes from.
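A quick bound makes the first point concrete. If every one-step cost has magnitude at most g_max (notation introduced only for this bound), then the discounted total is controlled by a convergent geometric series,
\[ \sum_{t=0}^{\infty} \gamma^{t} g_{\max} = \frac{g_{\max}}{1-\gamma} < \infty, \]
so no policy can accumulate infinite discounted cost from any state.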
5.3 The Same Problem Reappears After One Step
In a stationary discounted problem, once we move one step ahead, we face the same kind of optimization problem again.
So the future can be summarized by the same value function V^*, not by a different function for each time index.
5.4 Bellman Now Means Fixed Point
The Bellman update takes a candidate value function and produces a refined one.
The correct optimal value function is the one that is unchanged by this update.
That is why infinite-horizon dynamic programming is naturally a fixed-point problem.
5.5 Contraction Explains Why Iteration Works
If the Bellman operator shrinks sup-norm distance by a factor at most \gamma, then:
- there is a unique fixed point
- repeated Bellman updates converge to it
This is the mathematical backbone behind value iteration.
6 Formal Core
At first pass, work with a discounted infinite-horizon cost-minimization problem.
Definition 1 (Definition: Infinite-Horizon Policy Value) For a policy \pi, define
\[ V^\pi(s)=\mathbb{E}^\pi\!\left[\sum_{t=0}^\infty \gamma^t g(S_t,A_t,S_{t+1}) \,\middle|\, S_0=s\right], \]
where 0<\gamma<1.
This is the expected discounted total future cost when the current state is s.
Definition 2 (Definition: Optimal Value Function) The optimal value function is
\[ V^*(s)=\inf_\pi V^\pi(s). \]
It records the best achievable expected discounted cost from each state.
Theorem 1 (Theorem Idea: Bellman Optimality Equation) For a discounted stationary control problem,
\[ V^*(s)=\min_{a\in A(s)} \mathbb{E}\!\left[g(s,a,S') + \gamma V^*(S') \mid S=s, A=a\right]. \]
This equation says:
- choose the action that minimizes
- immediate expected cost
- plus discounted optimal future value
Definition 3 (Definition: Bellman Operator) Define the Bellman optimality operator T by
\[ (TV)(s)=\min_{a\in A(s)} \mathbb{E}\!\left[g(s,a,S') + \gamma V(S') \mid S=s, A=a\right]. \]
Then the Bellman equation is just
\[ V^*=TV^*. \]
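A minimal sketch of T in code, assuming a small finite problem whose transition probabilities and one-step costs are stored as arrays. The arrays, names, and random numbers below are illustrative placeholders, not data from any particular system.

```python
import numpy as np

# A tiny hypothetical finite MDP stored as plain arrays (illustrative only).
# P[a, s, s2] = probability of moving from s to s2 under action a
# g[a, s, s2] = one-step cost of that transition
n_states, n_actions = 3, 2
rng = np.random.default_rng(0)
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)          # make each row a probability distribution
g = rng.random((n_actions, n_states, n_states))
gamma = 0.9                                # assumed discount factor, 0 < gamma < 1

def bellman_operator(V):
    """One application of T: (TV)(s) = min_a E[g(s, a, S') + gamma * V(S')]."""
    # Q[a, s] = expected one-step cost plus discounted continuation value
    Q = (P * (g + gamma * V[None, None, :])).sum(axis=2)
    return Q.min(axis=0)

V0 = np.zeros(n_states)        # any bounded starting guess
print(bellman_operator(V0))    # a refined candidate value function
```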
Theorem 2 (Theorem Idea: Contraction) If one-step costs are bounded and 0<\gamma<1, then T is a contraction in the sup norm:
\[ \|TV-TW\|_\infty \le \gamma \|V-W\|_\infty. \]
At first pass, the meaning is more important than the proof:
- Bellman updates cannot separate two candidate value functions by more than a factor \gamma
- repeated updates become more and more stable
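One consequence is worth writing down explicitly. If V^* is a fixed point of T, then for any starting candidate V and any number of updates k,
\[ \|T^{k}V - V^{*}\|_\infty = \|T^{k}V - T^{k}V^{*}\|_\infty \le \gamma^{k}\,\|V - V^{*}\|_\infty, \]
so the error decays at least geometrically, and two fixed points V^* and W^* would have to satisfy \|V^*-W^*\|_\infty \le \gamma\|V^*-W^*\|_\infty, which forces them to coincide.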
Theorem 3 (Theorem Idea: Uniqueness And Greedy Policy Extraction) Because T is a contraction, it has a unique fixed point, and that fixed point is V^*. Any stationary policy that, in each state, chooses an action attaining the minimum in the Bellman equation is optimal.
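A minimal value-iteration sketch, reusing the same illustrative random arrays as the snippet above (again placeholder data, not a real system). The sup-norm gap between successive iterates should shrink by roughly a factor \gamma per sweep, and the last lines extract a greedy policy from the converged values.

```python
import numpy as np

# Same illustrative random MDP as in the previous snippet.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
g = rng.random((n_actions, n_states, n_states))

def q_values(V):
    """Q[a, s] = E[g(s, a, S') + gamma * V(S')]."""
    return (P * (g + gamma * V[None, None, :])).sum(axis=2)

# Value iteration: repeat V <- TV until successive iterates stop moving.
V = np.zeros(n_states)
for sweep in range(1000):
    V_new = q_values(V).min(axis=0)      # one Bellman update
    gap = np.max(np.abs(V_new - V))      # sup-norm distance between iterates
    V = V_new
    if gap < 1e-10:                      # contraction: gap shrinks by about gamma per sweep
        break

# Greedy policy extraction: in each state, pick an action attaining the minimum.
policy = q_values(V).argmin(axis=0)
print("approximate V*:", np.round(V, 4))
print("greedy policy: ", policy)
```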
7 Worked Example
Consider a simple discounted machine-maintenance problem with states:
- G = machine is good
- B = machine is bad
Suppose:
- in state G, the only action is to keep running
- in state B, choose either repair or wait
- the discount factor is \gamma
Costs and transitions:
- from G, the current cost is 0, and the next state is G with probability 0.8 and B with probability 0.2
- from B, action repair costs 4 and moves to G
- from B, action wait costs 1 and keeps the machine in B
Then the Bellman equations are
\[ V^*(G)=\gamma\left(0.8V^*(G)+0.2V^*(B)\right), \]
and
\[ V^*(B)=\min\left\{4+\gamma V^*(G),\ 1+\gamma V^*(B)\right\}. \]
The main first-pass lesson is not solving the algebra perfectly.
It is seeing the structure:
- one equation per state
- each equation compares one-step action choices
- future consequences re-enter only through the same unknown value function
This is the infinite-horizon Bellman viewpoint in miniature.
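To make that structure concrete, here is a short iteration on the two equations above with an assumed discount factor of 0.9 (chosen only to produce numbers; the page itself leaves \gamma symbolic).

```python
# Machine-maintenance example from above, with an assumed gamma = 0.9.
gamma = 0.9
V_G, V_B = 0.0, 0.0                                   # any starting guess

for sweep in range(2000):                             # repeat the Bellman update
    new_G = gamma * (0.8 * V_G + 0.2 * V_B)           # G: only action is keep running
    new_B = min(4 + gamma * V_G,                      # B: repair, cost 4, move to G
                1 + gamma * V_B)                      # B: wait, cost 1, stay in B
    done = max(abs(new_G - V_G), abs(new_B - V_B)) < 1e-10
    V_G, V_B = new_G, new_B
    if done:
        break

print(f"V*(G) = {V_G:.3f}   V*(B) = {V_B:.3f}")
print("better action in B:",
      "repair" if 4 + gamma * V_G <= 1 + gamma * V_B else "wait")
```

With this choice of \gamma, the iteration reports repair as the better action in state B; a smaller discount factor can flip the comparison, because waiting is cheaper now while repair only pays off later.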
8 Computation Lens
When you meet an infinite-horizon Bellman problem, ask:
- is the problem discounted, average-cost, or something else?
- what is the state?
- what is the one-step cost?
- what operator maps one candidate value function to the next?
- what theorem guarantees this operator is well behaved?
For first-pass work, discounted problems with bounded costs are the cleanest because contraction gives a full stability story.
9 Application Lens
9.1 Repeated Decision Problems
Inventory control, maintenance, queueing, and online service systems often look stationary and effectively infinite-horizon.
9.2 RL Foundations
Discounted MDPs are the standard mathematical backbone for value functions, Bellman equations, and many RL algorithms.
9.3 Planning Under Uncertainty
The fixed-point viewpoint is what makes long-run stochastic planning computationally meaningful rather than combinatorially impossible.
10 Stop Here For First Pass
If you stop here, retain these five ideas:
- finite horizon gives a sequence of value functions; infinite horizon often gives one stationary value function
- discounting makes long-run cost finite and mathematically stable
- the Bellman equation says optimal value must equal one-step optimal cost plus discounted future value
- the Bellman operator is the right object, not just the equation written once
- contraction is the theorem idea that explains uniqueness and convergence of iteration
11 Go Deeper
The strongest next page is:
The strongest adjacent live pages are:
12 Optional Deeper Reading After First Pass
- MIT 6.231: Dynamic Programming and Stochastic Control - official lecture-slide index for finite and infinite-horizon dynamic programming. Checked 2026-04-25.
- MIT 6.231 lecture 3 - official notes page covering discounted infinite-horizon dynamic programming. Checked 2026-04-25.
- Stanford MS&E 235A notes index - official notes index for MDP formulation, Bellman equations, and value-iteration foundations. Checked 2026-04-25.
- Stanford MS&E 235A lecture 4 - official current notes for discounted dynamic programming and Bellman fixed-point reasoning. Checked 2026-04-25.
- Stanford AA228 / CS238 - official current course page connecting MDP value functions to planning and RL. Checked 2026-04-25.
13 Sources and Further Reading
- MIT 6.231: Dynamic Programming and Stochastic Control - First pass - official lecture-slide index for the stochastic-control and dynamic-programming arc. Checked 2026-04-25.
- MIT 6.231 lecture 3 - First pass - official notes page for discounted infinite-horizon Bellman equations and contraction-style reasoning. Checked 2026-04-25.
- Stanford MS&E 235A / EE 283: Markov Decision Processes - First pass - official current course page for stochastic decision-making and dynamic programming. Checked 2026-04-25.
- Stanford MS&E 235A notes index - First pass - official notes index for the course's finite-horizon and infinite-horizon development. Checked 2026-04-25.
- Stanford MS&E 235A lecture 4 - First pass - official current notes for discounted value functions and Bellman fixed points. Checked 2026-04-25.
- Stanford AA228 / CS238 - Second pass - official current decision-making-under-uncertainty course page with planning and RL-facing framing. Checked 2026-04-25.