Infinite-Horizon Value Functions, Bellman Equations, and Contractions

How discounted infinite-horizon control problems turn the Bellman equation into a fixed-point equation with contraction structure.
Modified April 26, 2026

Keywords

infinite horizon, Bellman equation, contraction mapping, discounted cost, value function

1 Role

This is the third page of the Stochastic Control and Dynamic Programming module.

Its job is to make one major shift:

  • a finite horizon is solved by backward induction
  • an infinite horizon is solved by a fixed-point equation

This is where Bellman recursion stops being just a stage-by-stage algorithm and becomes an operator equation on value functions.

2 First-Pass Promise

Read this page after Finite-Horizon Dynamic Programming and Backward Induction.

If you stop here, you should still understand:

  • what changes when the horizon is infinite
  • why discounted value functions stay finite
  • what the Bellman optimality equation means
  • why contraction is the key theorem idea behind uniqueness and iteration

3 Why It Matters

Many real decision problems do not have a clean terminal time.

They repeat:

  • every day
  • every control cycle
  • every customer arrival
  • every environment step in RL

So there is no final stage where we can start a backward recursion.

Instead, the environment is often stationary:

  • same state space
  • same action space
  • same transition law
  • same one-step cost rule

That means the problem at the next step looks like the same problem again.

So the unknown object is no longer a sequence of functions J_0^*,J_1^*,...,J_T^*.

It is one value function that must be consistent with itself under one-step optimization and future continuation.

That consistency condition is the Bellman equation.

4 Prerequisite Recall

  • in finite horizon, the value function at time t summarizes the optimal future cost from (t,s)
  • Bellman recursion replaces a long policy search with repeated one-step optimization plus expected future value
  • expectation matters because one action induces a distribution over next states
  • a discount factor \gamma with 0<\gamma<1 shrinks the effect of far-future costs

5 Intuition

5.1 The Horizon No Longer Tells Us Where To Start

For a finite-horizon problem, the terminal time gave us an anchor.

For an infinite-horizon problem, there is no last step.

So we cannot solve by going backward from the end.

5.2 Discounting Keeps The Future Under Control

If future costs are discounted by \gamma^t, then events very far ahead matter less and less.

At first pass, this does two useful things:

  • it keeps the total value finite under bounded one-step costs
  • it makes the Bellman update pull two candidate value functions closer together

That second point is where contraction comes from.
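In symbols, the first point is a one-line geometric-series bound: if every one-step cost satisfies |g| \le \bar g for some constant \bar g (a symbol introduced here just for illustration), then for any policy \pi and state s,

```latex
\[ |V^\pi(s)| \;\le\; \sum_{t=0}^{\infty} \gamma^t\,\bar g \;=\; \frac{\bar g}{1-\gamma} \;<\; \infty. \]
```

Bounded one-step costs plus 0<\gamma<1 are exactly what keep the infinite sum finite.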

5.3 The Same Problem Reappears After One Step

In a stationary discounted problem, once we move one step ahead, we face the same kind of optimization problem again.

So the future can be summarized by the same value function V^*, not by a different function for each time index.

5.4 Bellman Now Means Fixed Point

The Bellman update takes a candidate value function and produces a refined one.

The correct optimal value function is the one that is unchanged by this update.

That is why infinite-horizon dynamic programming is naturally a fixed-point problem.

5.5 Contraction Explains Why Iteration Works

If the Bellman operator shrinks sup-norm distance by a factor at most \gamma, then:

  • there is a unique fixed point
  • repeated Bellman updates converge to it

This is the mathematical backbone behind value iteration.

6 Formal Core

At first pass, work with a discounted infinite-horizon cost-minimization problem.

Definition 1 (Infinite-Horizon Policy Value) For a policy \pi, define

\[ V^\pi(s)=\mathbb{E}^\pi\!\left[\sum_{t=0}^\infty \gamma^t g(S_t,A_t,S_{t+1}) \,\middle|\, S_0=s\right], \]

where 0<\gamma<1.

This is the expected discounted total future cost when the current state is s.

Definition 2 (Optimal Value Function) The optimal value function is

\[ V^*(s)=\inf_\pi V^\pi(s). \]

It records the best achievable expected discounted cost from each state.

Theorem 1 (Theorem Idea: Bellman Optimality Equation) For a discounted stationary control problem,

\[ V^*(s)=\min_{a\in A(s)} \mathbb{E}\!\left[g(s,a,S') + \gamma V^*(S') \mid S=s, A=a\right]. \]

This equation says: choose the action that minimizes

  • the immediate expected cost
  • plus the discounted optimal future value
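For a finite state space, the expectation unpacks into an explicit sum. Writing P(s' \mid s, a) for the transition law (a notation assumed here, consistent with the expectations above):

```latex
\[ V^*(s)=\min_{a\in A(s)} \sum_{s'} P(s'\mid s,a)\,\bigl[g(s,a,s') + \gamma V^*(s')\bigr]. \]
```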

Definition 3 (Bellman Operator) Define the Bellman optimality operator T by

\[ (TV)(s)=\min_{a\in A(s)} \mathbb{E}\!\left[g(s,a,S') + \gamma V(S') \mid S=s, A=a\right]. \]

Then the Bellman equation is just

\[ V^*=TV^*. \]
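For a tabular MDP, the operator T is a few lines of code. A minimal sketch, assuming transition and cost arrays `P[a][s, s']` and `g[a][s, s']` indexed by action (these names are illustrative, not from this page):

```python
import numpy as np

def bellman_operator(V, P, g, gamma):
    """Apply (TV)(s) = min_a E[g(s, a, S') + gamma * V(S') | S=s, A=a].

    V: value vector over states; P, g: per-action matrices over (s, s').
    """
    # Q[a, s] = expected one-step cost plus discounted continuation value
    Q = np.array([(P_a * (g_a + gamma * V)).sum(axis=1)
                  for P_a, g_a in zip(P, g)])
    # minimize over actions for each state
    return Q.min(axis=0)
```

Iterating `V = bellman_operator(V, P, g, gamma)` from any starting V is exactly value iteration; by the contraction theorem below, it converges to the unique fixed point V^* = TV^*.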

Theorem 2 (Theorem Idea: Contraction) If one-step costs are bounded and 0<\gamma<1, then T is a contraction in the sup norm:

\[ \|TV-TW\|_\infty \le \gamma \|V-W\|_\infty. \]

At first pass, the meaning is more important than the proof:

  • after one Bellman update, two candidate value functions are at most a factor \gamma as far apart as they were before
  • repeated updates become more and more stable
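The contraction bound is easy to check numerically. A sketch on a randomly generated tabular MDP (an illustrative setup, not the page's example), applying T to random pairs of candidate value functions:

```python
import numpy as np

def apply_T(V, P, g, gamma):
    """Bellman optimality operator on a tabular MDP."""
    Q = np.array([(P_a * (g_a + gamma * V)).sum(axis=1)
                  for P_a, g_a in zip(P, g)])
    return Q.min(axis=0)

rng = np.random.default_rng(1)
n_states, n_actions, gamma = 4, 3, 0.8
# random transition matrices (rows normalized) and bounded costs
P = [r / r.sum(axis=1, keepdims=True)
     for r in rng.random((n_actions, n_states, n_states))]
g = list(rng.random((n_actions, n_states, n_states)))

for _ in range(100):
    V, W = rng.normal(size=n_states), rng.normal(size=n_states)
    # ||TV - TW||_inf <= gamma * ||V - W||_inf, up to floating point
    lhs = np.abs(apply_T(V, P, g, gamma) - apply_T(W, P, g, gamma)).max()
    assert lhs <= gamma * np.abs(V - W).max() + 1e-12
```

The bound holds for every pair tried, as the theorem predicts; no random search can find a counterexample.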

Theorem 3 (Theorem Idea: Uniqueness And Greedy Policy Extraction) Because T is a contraction, it has a unique fixed point V^*. Any policy that chooses an action attaining the minimum in the Bellman equation is optimal.

7 Worked Example

Consider a simple discounted machine-maintenance problem with states:

  • G = machine is good
  • B = machine is bad

Suppose:

  • in state G, the only action is to keep running
  • in state B, choose either repair or wait
  • discount factor is \gamma

Costs and transitions:

  • from G, current cost is 0, and the next state is G with probability 0.8 and B with probability 0.2
  • from B, action repair costs 4 and moves to G
  • from B, action wait costs 1 and keeps the machine in B

Then the Bellman equations are

\[ V^*(G)=\gamma\left(0.8V^*(G)+0.2V^*(B)\right), \]

and

\[ V^*(B)=\min\left\{4+\gamma V^*(G),\ 1+\gamma V^*(B)\right\}. \]

The main first-pass lesson is not solving the algebra perfectly.

It is seeing the structure:

  • one equation per state
  • each equation compares one-step action choices
  • future consequences re-enter only through the same unknown value function

This is the infinite-horizon Bellman viewpoint in miniature.
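The two Bellman equations above can be solved by direct iteration. A sketch assuming \gamma = 0.9 (the page leaves \gamma symbolic):

```python
# Value iteration on the two-state maintenance problem with gamma = 0.9.
gamma = 0.9
V_G, V_B = 0.0, 0.0
for _ in range(1000):
    V_G = gamma * (0.8 * V_G + 0.2 * V_B)   # state G: only action is run
    V_B = min(4 + gamma * V_G,              # state B: repair -> G
              1 + gamma * V_B)              # state B: wait   -> B

# Greedy policy extraction in state B: pick the minimizing branch.
repair_cost = 4 + gamma * V_G
wait_cost = 1 + gamma * V_B
best = "repair" if repair_cost <= wait_cost else "wait"
# converges to V_G ~ 6.10, V_B ~ 9.49; repair is greedy-optimal
```

At \gamma = 0.9, waiting forever would cost 1/(1-\gamma) = 10 from B, while repairing costs 4 + \gamma V^*(G) \approx 9.49, so the greedy policy repairs. This is Theorem 3 in action: the optimal policy falls out of the fixed point by one-step minimization.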

8 Computation Lens

When you meet an infinite-horizon Bellman problem, ask:

  1. is the problem discounted, average-cost, or something else?
  2. what is the state?
  3. what is the one-step cost?
  4. what operator maps one candidate value function to the next?
  5. what theorem guarantees this operator is well behaved?

For first-pass work, discounted problems with bounded costs are the cleanest because contraction gives a full stability story.

9 Application Lens

9.1 Repeated Decision Problems

Inventory control, maintenance, queueing, and online service systems often look stationary and effectively infinite-horizon.

9.2 RL Foundations

Discounted MDPs are the standard mathematical backbone for value functions, Bellman equations, and many RL algorithms.

9.3 Planning Under Uncertainty

The fixed-point viewpoint is what makes long-run stochastic planning computationally meaningful rather than combinatorially impossible.

10 Stop Here For First Pass

If you stop here, retain these five ideas:

  • finite horizon gives a sequence of value functions; infinite horizon often gives one stationary value function
  • discounting makes long-run cost finite and mathematically stable
  • the Bellman equation says optimal value must equal one-step optimal cost plus discounted future value
  • the Bellman operator is the right object, not just the equation written once
  • contraction is the theorem idea that explains uniqueness and convergence of iteration
