Mixing, Ergodicity, and MCMC Bridges
mixing time, ergodicity, MCMC, stationary distribution, Markov chains
1 Role
This is a later live page in the Stochastic Processes module.
Its job is to explain what happens after a Markov chain has been defined:
- does it forget where it started?
- do long-run empirical averages settle down?
- can we exploit that long-run behavior to sample from hard distributions?
This is where stochastic processes become a bridge to MCMC and inference.
2 First-Pass Promise
You can read this page on its own inside the full module spine.
If you stop here, you should still understand:
- what mixing means at first pass
- what ergodicity means for long-run averages
- why stationary distributions alone are not enough
- why MCMC is really a controlled use of Markov-chain long-run behavior
3 Why It Matters
A stationary distribution is only the first half of the story.
It tells us what distribution is preserved by the dynamics.
But practical questions are harder:
- if we start far away, how long until the chain looks approximately stationary?
- if we average along one random trajectory, do we recover stationary expectations?
- if we design a chain to target a posterior distribution, when does that actually help us sample?
That is why mixing and ergodicity matter.
They turn stationary-distribution algebra into usable long-run behavior.
4 Prerequisite Recall
- from Probability, expectation is the quantity we ultimately want to estimate
- from Markov Chains and Stationary Distributions, a stationary distribution
\pi satisfies \pi P = \pi
- a chain can have the correct stationary distribution and still be a poor computational tool if it mixes too slowly
5 Intuition
5.1 Mixing Means Forgetting The Start
At first pass, mixing means:
after enough steps, the law of X_t depends only weakly on the initial state
So if two chains start from different places but run long enough, their distributions become close.
This is the operational meaning of “approaching stationarity.”
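This forgetting can be checked numerically. The sketch below uses an assumed two-state transition matrix (an illustrative choice, not taken from the text) and iterates the forward equation from two different starting states; the two laws become indistinguishable.

```python
# Mixing as forgetting the start: iterate dist_{t+1} = dist_t P from two
# different initial states and watch the two laws converge.
# The matrix P is an illustrative example with stationary law (0.8, 0.2).

P = [[0.9, 0.1],
     [0.4, 0.6]]

def step(dist):
    """One step of the forward equation: dist_{t+1} = dist_t P."""
    return [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]

d0 = [1.0, 0.0]   # chain started in state 0
d1 = [0.0, 1.0]   # chain started in state 1
for _ in range(50):
    d0, d1 = step(d0), step(d1)

gap = abs(d0[0] - d1[0])   # distance between the two laws after 50 steps
```

After 50 steps both distributions sit essentially on the stationary law, so `gap` is numerically zero: the chain has forgotten where it started.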
5.2 Ergodicity Turns One Long Run Into An Expectation
Suppose f(X_t) is an observable along the chain.
Ergodicity says that under the right conditions,
\[ \frac{1}{T}\sum_{t=1}^T f(X_t) \]
approaches the stationary expectation
\[ \mathbb{E}_{\pi}[f(X)]. \]
This is the core reason a single long run can be useful.
5.3 Correct Stationary Distribution Is Not The Whole Story
Two chains may have the same stationary distribution but very different usefulness.
One may:
- mix quickly
- explore the state space efficiently
while the other:
- gets stuck for long periods
- keeps strong dependence between nearby samples
That is why MCMC quality is not only about correctness in the limit.
It is also about how fast that limit becomes relevant.
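The gap between "correct in the limit" and "useful in practice" is easy to exhibit. Both matrices below are assumed examples (not from the text) with the same stationary distribution (0.8, 0.2), but their second eigenvalues differ, so one forgets its start in a handful of steps while the other barely moves.

```python
# Two chains with the SAME stationary distribution pi = (0.8, 0.2)
# but very different mixing speeds (both matrices are illustrative).
P_fast = [[0.9,   0.1  ], [0.4,   0.6  ]]   # second eigenvalue 0.5
P_slow = [[0.998, 0.002], [0.008, 0.992]]   # second eigenvalue 0.99

def law_after(P, t):
    """Law of X_t when the chain starts in state 1."""
    dist = [0.0, 1.0]
    for _ in range(t):
        dist = [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]
    return dist

# Deviation of P(X_t = 1) from its stationary value 0.2 after 20 steps:
err_fast = abs(law_after(P_fast, 20)[1] - 0.2)
err_slow = abs(law_after(P_slow, 20)[1] - 0.2)
```

In a two-state chain the deviation decays exactly like the second eigenvalue raised to t, so after 20 steps the fast chain is within about 10^-6 of stationarity while the slow chain is still far away.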
5.4 MCMC Designs A Chain Whose Stationary Law Is The Target
In MCMC, we do not sample directly from a hard target distribution \pi.
Instead, we build a Markov chain whose stationary distribution is \pi.
Then the long-run behavior of the chain is used to approximate:
- expectations under \pi
- marginal probabilities
- posterior summaries
So MCMC is really a stochastic-process trick:
sampling by controlled long-run dynamics.
6 Formal Core
Definition 1 (Definition: Mixing Intuition) At a first pass, a Markov chain mixes if its distribution at time t becomes close to the stationary distribution as t grows, and does so uniformly enough that the initial state eventually matters little.
You do not need the exact metric at first pass, but total variation distance is the usual formal tool.
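To make the metric concrete, here is the total variation distance d_TV(mu, nu) = (1/2) sum_x |mu(x) - nu(x)| tracked along an assumed two-state chain (the matrix is an illustrative choice, not from the text):

```python
# Total variation distance, the usual formal tool for quantifying mixing,
# tracked between the law of X_t and the stationary distribution.
# P is an illustrative two-state matrix with stationary law pi = (0.8, 0.2).

P  = [[0.9, 0.1],
      [0.4, 0.6]]
pi = [0.8, 0.2]          # check: pi P = pi

def tv(mu, nu):
    """d_TV(mu, nu) = (1/2) * sum_x |mu(x) - nu(x)|."""
    return 0.5 * sum(abs(m - n) for m, n in zip(mu, nu))

dist = [0.0, 1.0]        # start in state 1, far from pi
tvs = []
for _ in range(10):
    tvs.append(tv(dist, pi))
    dist = [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]
# For this chain d_TV halves every step: 0.8, 0.4, 0.2, ...
```

The geometric decay of `tvs` is exactly what "mixing" means formally: the law of X_t approaches \pi in total variation, at a rate governed here by the second eigenvalue 0.5.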
Definition 2 (Definition: Ergodicity Intuition) At a first pass, ergodicity means that long-run empirical averages along one trajectory converge to stationary expectations.
This is the bridge from dynamics to Monte Carlo estimation.
Theorem 1 (Theorem Idea: Ergodic Averages) For a well-behaved irreducible aperiodic finite-state Markov chain with stationary distribution \pi,
\[ \frac{1}{T}\sum_{t=1}^T f(X_t) \to \mathbb{E}_{\pi}[f(X)] \]
for suitable observables f.
At first pass, the message is simple:
- one long dependent run can still estimate stationary expectations
- but the quality of that estimate depends on how the chain mixes
Definition 3 (Definition: MCMC) Markov chain Monte Carlo is the strategy of building a Markov chain whose stationary distribution is a target distribution \pi, then using the chain’s long-run samples or averages to approximate quantities under \pi.
At first pass, Metropolis-Hastings and Gibbs sampling are just standard ways to enforce the right stationary law.
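A minimal Metropolis-Hastings sketch shows how the right stationary law is enforced even when the target is only known up to normalization. The weights and the ±1 random-walk proposal below are illustrative assumptions, not from the text.

```python
# Minimal Metropolis-Hastings on a 5-point state space.
# The target pi(x) is proportional to w[x]; we never need sum(w).
import random

w = [1.0, 2.0, 4.0, 2.0, 1.0]   # unnormalized target weights (assumed)

def mh_step(x, rng):
    y = x + rng.choice([-1, 1])           # symmetric random-walk proposal
    if not 0 <= y < len(w):
        return x                          # off-grid proposal: reject, stay
    if rng.random() < min(1.0, w[y] / w[x]):
        return y                          # accept
    return x                              # reject, keep current state

rng = random.Random(0)
x, counts, T = 2, [0] * len(w), 200_000
for _ in range(T):
    x = mh_step(x, rng)
    counts[x] += 1

freq = [c / T for c in counts]   # approaches w / sum(w) = (.1,.2,.4,.2,.1)
```

Because the proposal is symmetric, the acceptance ratio w[y]/w[x] is all that is needed for detailed balance with respect to the target, which is exactly why MCMC sidesteps the normalizing constant.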
7 Worked Example
Take the two-state chain from the Markov page, with stationary distribution
\[ \pi = (0.8, 0.2). \]
Let
\[ f(0)=0, \qquad f(1)=1. \]
Then the stationary expectation is
\[ \mathbb{E}_{\pi}[f(X)] = 0.2. \]
If the chain is ergodic, then a long-run average like
\[ \frac{1}{T}\sum_{t=1}^T f(X_t) \]
should settle near 0.2.
So even though successive states are dependent, the long run still estimates the stationary fraction of time spent in state 1.
This is the exact logic reused by MCMC:
- choose a chain with the right stationary law
- run it long enough to reduce dependence on the start
- use long-run averages to estimate expectations under the target
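The worked example can be simulated directly. Since the original two-state chain lives on the Markov page, the transition matrix below is an assumed example consistent with \pi = (0.8, 0.2); the long-run fraction of time in state 1 still lands near 0.2.

```python
# One long dependent run of a two-state chain with pi = (0.8, 0.2).
# The matrix P is an assumed example consistent with that stationary law.
import random

P = [[0.9, 0.1],    # pi P = pi for pi = (0.8, 0.2)
     [0.4, 0.6]]

rng = random.Random(0)
x, total, T = 0, 0, 100_000
for _ in range(T):
    x = 0 if rng.random() < P[x][0] else 1
    total += x                 # f(x) = x, so this counts visits to state 1

avg = total / T                # long-run average of f(X_t), near 0.2
```

Successive states are dependent, yet the ergodic average converges to the stationary expectation 0.2; only the error bar, not the limit, is affected by the dependence.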
8 Computation Lens
When you meet an MCMC or long-run Markov-chain argument, ask:
- what is the target stationary distribution?
- why does the chain preserve that distribution?
- how quickly does the chain forget its initial state?
- are adjacent samples still strongly dependent?
- is the goal exact sampling, approximate sampling, or expectation estimation?
Those questions usually reveal whether the real issue is:
- correctness in the limit
- practical mixing speed
- or estimation error from dependent samples
9 Application Lens
9.1 Bayesian Computation
MCMC is a standard response when posterior distributions are known up to a normalizing constant but direct sampling is hard.
9.2 Statistical Physics And Energy-Based Models
Long-run chain behavior is the natural route when the target law is defined by local energies or acceptance rules rather than direct draws.
9.3 High-Dimensional Statistics And ML
Sampling-based inference, posterior approximation, and some generative procedures all depend on whether a chain actually explores its target efficiently.
10 Stop Here For First Pass
If you stop here, retain these five ideas:
- mixing means the chain gradually forgets its initial state
- ergodicity means long-run averages converge to stationary expectations
- having the right stationary distribution is not enough if mixing is too slow
- MCMC works by designing a chain whose stationary law is the target
- dependence between samples is the central practical cost of MCMC
11 Go Deeper
The strongest adjacent live pages right now are:
- Martingales and Optional Stopping Intuition
- Markov Chains and Stationary Distributions
- High-Dimensional Statistics
- Information-Theoretic Lower Bounds in Statistics, Learning, and Communication
This page now sits at the long-run end of the full first-pass spine.
12 Optional Deeper Reading After First Pass
The strongest current references connected to this page are:
- MIT 18.445 lecture 4 - official MIT note introducing Markov-chain mixing. Checked 2026-04-25.
- MIT 18.445 lecture 7 - official MIT summary note on mixing times. Checked 2026-04-25.
- Stanford Stats366 - official Stanford course page that explicitly places Markov chains next to Bayesian estimation and MCMC. Checked 2026-04-25.
- Stanford Stats366 Markov chains notes - public Stanford note for a concise stationary-distribution and finite-state Markov-chain recap. Checked 2026-04-25.
13 Sources and Further Reading
- MIT 18.445 lecture notes page - First pass - official MIT lecture hub covering mixing as part of the stochastic-processes route. Checked 2026-04-25.
- MIT 18.445 lecture 4 - First pass - official MIT introduction to Markov-chain mixing. Checked 2026-04-25.
- MIT 18.445 lecture 7 - Second pass - official MIT summary note on mixing times. Checked 2026-04-25.
- MIT 6.262 Discrete Stochastic Processes - Second pass - official MIT course hub for discrete-time chains and long-run state-space behavior. Checked 2026-04-25.
- Stanford Stats366 - Bridge outward - official Stanford course page that explicitly places MCMC in a Markov-chain-and-inference curriculum. Checked 2026-04-25.
- Stanford Stats366 Markov chains notes - Bridge outward - useful Stanford note for first-pass stationary and transition-matrix language. Checked 2026-04-25.