Markov Chains and Stationary Distributions
Keywords: Markov chain, stationary distribution, transition matrix, irreducible, mixing
1 Role
This is the first page of the Stochastic Processes module.
Its job is to introduce the cleanest first stochastic process:
- a system whose future evolution depends on the present state
- but not on the full earlier history once that present state is known
This is the point where ordinary probability becomes probability over time.
2 First-Pass Promise
Read this page first in the module.
If you stop here, you should still understand:
- what the Markov property is
- how a transition matrix describes one-step evolution
- what a stationary distribution means
- why some chains forget their starting point and others do not
3 Why It Matters
Many random systems are too complicated to model by remembering the whole past.
The Markov idea is the first major simplification:
- keep exactly the information the future still needs
- compress the rest of the past into the current state
That is why Markov chains keep reappearing in:
- queues and inventory systems
- hidden-state models
- random walks on graphs
- stochastic control and RL
- MCMC and sampling algorithms
Once a reader sees that, the natural next question is not only what happens in one step, but:
what happens after many steps?
That is where stationary distributions enter.
4 Prerequisite Recall
- from Probability, conditional probability tells us how to update uncertainty once current information is known
- from Linear Algebra, a matrix can act on a vector of probabilities and move a distribution forward in time
- from stochastic-control language nearby, a state is useful when it stores the memory the future still needs from the past
5 Intuition
5.1 The Present State Is A Sufficient Summary
The Markov property does not say the past is irrelevant.
It says the state has been chosen so that the relevant effect of the past is already encoded in the present.
So if the current state is known, earlier history gives no extra predictive power for the next step.
5.2 The Transition Matrix Moves Distributions
For a finite-state chain, the transition matrix P tells us one-step transition probabilities.
If the current distribution of the chain is a row vector \mu, then after one step the distribution becomes
\[ \mu P. \]
So Markov-chain evolution can be read both:
- pathwise, as random state-to-state motion
- distributionally, as repeated application of an operator
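The distributional view can be sketched in a few lines of NumPy. This is a minimal illustration with a hypothetical 3-state transition matrix (not one from this page): each row sums to 1, and one step of evolution is the row-vector product \mu P.

```python
import numpy as np

# A hypothetical 3-state transition matrix; each row sums to 1.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
])
assert np.allclose(P.sum(axis=1), 1.0)  # each row is a probability distribution

# Start with all mass on state 0, then push the distribution forward in time.
mu = np.array([1.0, 0.0, 0.0])
for _ in range(3):
    mu = mu @ P  # one step of distributional evolution: mu -> mu P
    print(mu)    # total probability stays 1 at every step
```

Note that the pathwise view would instead sample a random next state from the row `P[current_state]`; the code above tracks only the marginal law.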
5.3 A Stationary Distribution Is Not A Frozen State
A stationary distribution \pi satisfies
\[ \pi P = \pi. \]
This does not mean the chain stops moving.
It means:
- if the chain starts distributed as \pi
- then every later time has the same marginal distribution \pi
So stationarity is balance in distribution, not immobility of individual sample paths.
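That distinction can be checked numerically. The sketch below uses a hypothetical two-state chain whose stationary distribution is (2/3, 1/3): the marginal law is invariant under P, yet a simulated sample path keeps jumping between states.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical two-state chain with stationary distribution (2/3, 1/3).
P = np.array([[0.7, 0.3],
              [0.6, 0.4]])
pi = np.array([2 / 3, 1 / 3])
assert np.allclose(pi @ P, pi)  # pi is invariant under one-step evolution

# Yet a sample path started from pi keeps moving between states:
state = rng.choice(2, p=pi)
path = [state]
for _ in range(20):
    state = rng.choice(2, p=P[state])  # pathwise view: sample the next state
    path.append(state)
print(path)  # the trajectory jumps; only the marginal law is frozen
```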
5.4 Long-Run Convergence Needs Structural Conditions
Some chains have many closed classes. Some oscillate periodically. Some have a unique long-run distribution and forget where they started.
That is why words like:
- irreducible
- aperiodic
- recurrent
matter.
They separate chains with clean long-run behavior from chains where stationarity is more subtle.
6 Formal Core
Definition 1 (Discrete-Time Markov Chain) A discrete-time stochastic process (X_0, X_1, X_2, \dots) on a finite or countable state space S is a Markov chain if
\[ \mathbb{P}(X_{t+1}=j \mid X_t=i, X_{t-1}, \dots, X_0) = \mathbb{P}(X_{t+1}=j \mid X_t=i) \]
for all times t and states i,j.
In words:
- once the present state is known, the next state no longer depends on the earlier path
Definition 2 (Transition Matrix) For a finite-state Markov chain, the transition matrix P has entries
\[ P_{ij} = \mathbb{P}(X_{t+1}=j \mid X_t=i). \]
Each row sums to 1, because from the current state the next state must land somewhere.
Definition 3 (Stationary Distribution) A probability vector \pi is stationary if
\[ \pi P = \pi. \]
This means the distribution is invariant under one-step evolution.
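For a finite chain, a stationary distribution can be computed directly: \pi P = \pi is the linear system \pi (P - I) = 0, solved together with the normalization \sum_i \pi_i = 1. A minimal sketch, using a hypothetical 3-state matrix:

```python
import numpy as np

# Hypothetical 3-state chain; solve pi P = pi together with sum(pi) = 1.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
])
n = P.shape[0]

# pi P = pi  <=>  pi (P - I) = 0; stack the normalization row on top.
A = np.vstack([(P - np.eye(n)).T, np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])

# Least squares handles the overdetermined (n+1) x n system.
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)
assert np.allclose(pi @ P, pi)  # invariant under one-step evolution
```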
Theorem 1 (Theorem Idea: Irreducible Aperiodic Finite Chains Forget The Start) For a finite-state Markov chain, if the chain is irreducible and aperiodic, then there is a unique stationary distribution \pi, and the distribution of X_t converges to \pi as t \to \infty, regardless of the initial state.
At first pass, the main point is:
- good connectivity plus no forced oscillation often yields clean long-run behavior
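The theorem's "forgetting" can be demonstrated numerically: push two very different initial distributions through the same irreducible, aperiodic chain (a hypothetical 3-state example with positive self-loops) and watch them merge.

```python
import numpy as np

# Hypothetical irreducible, aperiodic 3-state chain (positive self-loops).
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
])

# Two extreme starting distributions: all mass on state 0 vs. state 2.
mu1 = np.array([1.0, 0.0, 0.0])
mu2 = np.array([0.0, 0.0, 1.0])
for _ in range(200):
    mu1, mu2 = mu1 @ P, mu2 @ P

print(mu1)
print(mu2)  # after many steps both match the unique stationary distribution
```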
7 Worked Example
Consider a two-state chain with states A and B and transition matrix
\[ P = \begin{pmatrix} 0.9 & 0.1 \\ 0.4 & 0.6 \end{pmatrix}. \]
Interpretation:
- from A, the chain stays in A with probability 0.9
- from B, the chain jumps to A with probability 0.4
Let the stationary distribution be
\[ \pi = (\pi_A, \pi_B). \]
The stationarity equation \pi P = \pi gives
\[ \pi_A = 0.9\pi_A + 0.4\pi_B. \]
So
\[ 0.1\pi_A = 0.4\pi_B \quad\Rightarrow\quad \pi_A = 4\pi_B. \]
Using \pi_A + \pi_B = 1, we get
\[ \pi = (0.8, 0.2). \]
So in the long run, the chain spends about 80% of its time in A and 20% in B.
This does not mean that exactly four out of every five steps land in A.
It means the long-run distribution settles around that balance.
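The hand computation above can be confirmed by power iteration: repeatedly applying P to any starting distribution drives it toward \pi = (0.8, 0.2).

```python
import numpy as np

# The two-state transition matrix from the worked example (states A, B).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Power iteration: start entirely in B and apply P repeatedly.
mu = np.array([0.0, 1.0])
for _ in range(100):
    mu = mu @ P

print(mu)  # approaches the stationary distribution (0.8, 0.2)
```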
8 Computation Lens
When you meet a Markov-chain model, ask:
- what exactly is the state?
- is the chain finite-state, countable-state, or already drifting toward continuous-time modeling?
- do we care about one-step prediction, hitting probabilities, or long-run behavior?
- is there a unique stationary distribution, or can the chain split into different closed classes?
- is the chain mixing quickly, slowly, or not at all from some starting states?
Those questions usually reveal whether the real object is:
- short-term prediction
- equilibrium behavior
- rare-event timing
- or sampling from a target distribution
9 Application Lens
9.1 Queueing And Networks
A queue length, congestion state, or node occupancy can often be modeled as a Markov chain.
9.2 Stochastic Control And RL
Markov chains become controlled once actions begin changing the transition law.
9.3 MCMC And Sampling
Many sampling algorithms deliberately build a Markov chain whose stationary distribution is the distribution we want to sample from.
10 Stop Here For First Pass
If you stop here, retain these five ideas:
- a Markov chain is a random process whose next step depends on the present state, not the full earlier history
- the transition matrix moves probability distributions forward in time
- a stationary distribution is an invariant distribution, not a fixed state
- long-run behavior depends on structural properties like irreducibility and aperiodicity
- Markov chains are the entry point into stochastic control, hidden-state models, diffusion thinking, and MCMC
11 Go Deeper
The strongest adjacent live pages right now are:
- Probability
- Controlled Markov Models, Policies, and Cost Functionals
- Infinite-Horizon Value Functions, Bellman Equations, and Contractions
- State Estimation, Smoothing, and Hidden-State Inference
- Information-Theoretic Lower Bounds in Statistics, Learning, and Communication
12 Optional Deeper Reading After First Pass
The strongest current references connected to this page are:
- MIT 18.445 lecture notes page - official MIT route through finite Markov chains, stationary distributions, and mixing. Checked 2026-04-25.
- MIT 18.445 lecture 2 - official MIT note focused directly on stationary distributions. Checked 2026-04-25.
- MIT 6.262 Discrete Stochastic Processes - official MIT course hub for discrete stochastic-process intuition and countable-state behavior. Checked 2026-04-25.
- Stanford Stats366 Markov chains notes - concise Stanford note for first-pass Markov-chain definitions and stationary behavior. Checked 2026-04-25.
13 Sources and Further Reading
- MIT 18.445 lecture notes page - First pass - official MIT lecture hub for Markov chains, stationary distributions, mixing, martingales, and Poisson processes. Checked 2026-04-25.
- MIT 18.445 lecture 2 - First pass - official MIT note on stationary distributions for finite Markov chains. Checked 2026-04-25.
- MIT 6.262 Discrete Stochastic Processes - Second pass - official MIT course hub for discrete-time stochastic-process language. Checked 2026-04-25.
- MIT 6.262 chapter 5 resource - Second pass - official MIT resource for countable-state Markov-chain ideas. Checked 2026-04-25.
- Stanford Stats366 Markov chains notes - Bridge outward - useful Stanford note for discrete-time chains and stationarity. Checked 2026-04-25.