Markov Chains and Stationary Distributions

How discrete-time Markov chains compress memory into the present state, and how stationary distributions capture long-run stochastic balance.
Modified: April 26, 2026

Keywords

Markov chain, stationary distribution, transition matrix, irreducible, mixing

1 Role

This is the first page of the Stochastic Processes module.

Its job is to introduce the cleanest first stochastic process:

  • a system whose future evolution depends on the present state
  • but not on the full earlier history once that present state is known

This is the point where ordinary probability becomes probability over time.

2 First-Pass Promise

Read this page first in the module.

If you stop here, you should still understand:

  • what the Markov property is
  • how a transition matrix describes one-step evolution
  • what a stationary distribution means
  • why some chains forget their starting point and others do not

3 Why It Matters

Many random systems are too complicated to model by remembering the whole past.

The Markov idea is the first major simplification:

  • keep exactly the information the future still needs
  • compress the rest of the past into the current state

That is why Markov chains keep reappearing in:

  • queues and inventory systems
  • hidden-state models
  • random walks on graphs
  • stochastic control and RL
  • MCMC and sampling algorithms

Once a reader sees that, the next natural question is not just what happens in one step. It is:

what happens after many steps?

That is where stationary distributions enter.

4 Prerequisite Recall

  • from Probability, conditional probability tells us how to update uncertainty once current information is known
  • from Linear Algebra, a matrix can act on a vector of probabilities and move a distribution forward in time
  • from the language of stochastic control, a state is useful when it stores exactly the memory the future still needs from the past

5 Intuition

5.1 The Present State Is A Sufficient Summary

The Markov property does not say the past is irrelevant.

It says the state has been chosen so that the relevant effect of the past is already encoded in the present.

So if the current state is known, earlier history gives no extra predictive power for the next step.

5.2 The Transition Matrix Moves Distributions

For a finite-state chain, the transition matrix P tells us one-step transition probabilities.

If the current distribution of the chain is a row vector \mu, then after one step the distribution becomes

\[ \mu P. \]

So Markov-chain evolution can be read both:

  • pathwise, as random state-to-state motion
  • distributionally, as repeated application of an operator
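
The distributional reading is a one-line computation. A minimal NumPy sketch, using the two-state matrix from the worked example below:

```python
import numpy as np

# Two-state transition matrix (rows: current state, columns: next state).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Each row sums to 1: from any state, the next state must land somewhere.
assert np.allclose(P.sum(axis=1), 1.0)

# A distribution over states is a row vector; one step of evolution is mu @ P.
mu = np.array([1.0, 0.0])   # all mass on the first state
mu_next = mu @ P            # distribution after one step
print(mu_next)              # [0.9 0.1]
```

Repeating the product moves the distribution forward step by step, which is exactly the operator view of the chain.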

5.3 A Stationary Distribution Is Not A Frozen State

A stationary distribution \pi satisfies

\[ \pi P = \pi. \]

This does not mean the chain stops moving.

It means:

  • if the chain starts distributed as \pi
  • then every later time has the same marginal distribution \pi

So stationarity is balance in distribution, not immobility of individual sample paths.

5.4 Long-Run Convergence Needs Structural Conditions

Some chains have many closed classes. Some oscillate periodically. Some have a unique long-run distribution and forget where they started.

That is why words like:

  • irreducible
  • aperiodic
  • recurrent

matter.

They separate chains with clean long-run behavior from chains where stationarity is more subtle.

6 Formal Core

Definition 1 (Discrete-Time Markov Chain) A discrete-time stochastic process (X_0, X_1, X_2, \dots) on a finite or countable state space S is a Markov chain if

\[ \mathbb{P}(X_{t+1}=j \mid X_t=i, X_{t-1}, \dots, X_0) = \mathbb{P}(X_{t+1}=j \mid X_t=i) \]

for all times t and states i,j.

In words:

  • once the present state is known, the next state no longer depends on the earlier path

Definition 2 (Transition Matrix) For a finite-state Markov chain, the transition matrix P has entries

\[ P_{ij} = \mathbb{P}(X_{t+1}=j \mid X_t=i). \]

Each row sums to 1, because from the current state the next state must land somewhere.

Definition 3 (Stationary Distribution) A probability vector \pi is stationary if

\[ \pi P = \pi. \]

This means the distribution is invariant under one-step evolution.

Theorem 1 (Irreducible Aperiodic Finite Chains Forget The Start) For a finite-state Markov chain, if the chain is irreducible and aperiodic, then there is a unique stationary distribution \pi, and the distribution of X_t converges to \pi as t \to \infty, regardless of the initial distribution.

At first pass, the main point is:

  • good connectivity plus no forced oscillation often yields clean long-run behavior
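
This forgetting can be checked numerically. A short sketch using the two-state matrix from the worked example below: two opposite starting distributions are pushed through fifty steps and end at the same marginal.

```python
import numpy as np

# The two-state chain from the worked example; it is irreducible and aperiodic.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Start from the two extreme initial distributions and evolve both.
mu_a = np.array([1.0, 0.0])
mu_b = np.array([0.0, 1.0])
for _ in range(50):
    mu_a = mu_a @ P
    mu_b = mu_b @ P

# Both marginals approach the same stationary distribution, forgetting the start.
print(mu_a)  # ≈ [0.8 0.2]
print(mu_b)  # ≈ [0.8 0.2]
```

The second eigenvalue of this P is 0.5, so the distance to \pi shrinks by half each step, which is why fifty steps are more than enough here.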

7 Worked Example

Consider a two-state chain with states A and B and transition matrix

\[ P = \begin{pmatrix} 0.9 & 0.1 \\ 0.4 & 0.6 \end{pmatrix}. \]

Interpretation:

  • from A, the chain stays in A with probability 0.9
  • from B, the chain jumps to A with probability 0.4

Let the stationary distribution be

\[ \pi = (\pi_A, \pi_B). \]

The stationarity equation \pi P = \pi gives

\[ \pi_A = 0.9\pi_A + 0.4\pi_B. \]

So

\[ 0.1\pi_A = 0.4\pi_B \quad\Rightarrow\quad \pi_A = 4\pi_B. \]

Using \pi_A + \pi_B = 1, we get

\[ \pi = (0.8, 0.2). \]

So in the long run, the chain spends about 80% of its time in A and 20% in B.

This does not mean that exactly four out of every five steps are in A.

It means the long-run distribution settles around that balance.
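
The same \pi can be recovered numerically. This sketch solves \pi P = \pi together with the normalization constraint \pi_A + \pi_B = 1 as one small linear system:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# pi P = pi means pi (P - I) = 0; stacking the normalization row
# sum(pi) = 1 on top gives an overdetermined but consistent system.
A = np.vstack([(P - np.eye(2)).T, np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)  # ≈ [0.8 0.2]
```

This matches the hand computation \pi = (0.8, 0.2), and the same code works unchanged for any finite irreducible chain once P is replaced.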

8 Computation Lens

When you meet a Markov-chain model, ask:

  1. what exactly is the state?
  2. is the chain finite-state, countable-state, or already drifting toward continuous-time modeling?
  3. do we care about one-step prediction, hitting probabilities, or long-run behavior?
  4. is there a unique stationary distribution, or can the chain split into different closed classes?
  5. is the chain mixing quickly, slowly, or not at all from some starting states?

Those questions usually reveal whether the real object is:

  • short-term prediction
  • equilibrium behavior
  • rare-event timing
  • or sampling from a target distribution

9 Application Lens

9.1 Queueing And Networks

A queue length, congestion state, or node occupancy can often be modeled as a Markov chain.

9.2 Hidden-State Models

Latent-state models and HMM-style systems often build their temporal backbone from Markov transitions.

9.3 Stochastic Control And RL

Markov chains become controlled once actions begin changing the transition law.

9.4 MCMC And Sampling

Many sampling algorithms deliberately build a Markov chain whose stationary distribution is the distribution we want to sample from.
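
As a minimal illustration of that idea, here is a Metropolis-style sampler on a three-state space. The target weights are hypothetical, chosen only so the stationary distribution is easy to check by eye; the proposal is uniform over states, and acceptance uses the ratio of target weights.

```python
import random

# Hypothetical unnormalized target weights for states {0, 1, 2}.
# The normalized target is (1/7, 2/7, 4/7).
weights = [1.0, 2.0, 4.0]

def step(state):
    # Propose a uniformly random state (a symmetric proposal),
    # accept with probability min(1, target ratio).
    proposal = random.randrange(3)
    if random.random() < min(1.0, weights[proposal] / weights[state]):
        return proposal
    return state

# Long-run visit frequencies approximate the normalized target.
random.seed(0)
counts = [0, 0, 0]
state = 0
for _ in range(100_000):
    state = step(state)
    counts[state] += 1
freqs = [c / 100_000 for c in counts]
print(freqs)  # roughly [0.14, 0.29, 0.57]
```

The chain is built so that detailed balance holds with respect to the target, which is precisely the "stationary distribution by design" trick that MCMC methods exploit.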

10 Stop Here For First Pass

If you stop here, retain these five ideas:

  • a Markov chain is a random process whose next step depends on the present state, not the full earlier history
  • the transition matrix moves probability distributions forward in time
  • a stationary distribution is an invariant distribution, not a fixed state
  • long-run behavior depends on structural properties like irreducibility and aperiodicity
  • Markov chains are the entry point into stochastic control, hidden-state models, diffusion thinking, and MCMC
