Sampling, Mixing, and MCMC for Inference

A bridge page showing how posterior inference can be approximated by building a Markov chain whose long-run behavior matches the target distribution.
Modified: April 26, 2026

Keywords

sampling, MCMC, mixing, posterior, inference

1 Application Snapshot

Sometimes a point estimate is not enough.

You may want:

  • posterior means
  • uncertainty intervals
  • marginal probabilities
  • samples from plausible latent explanations

If the posterior is hard to compute exactly, one strategy is:

  • build a Markov chain whose stationary distribution is the target posterior
  • run it long enough
  • use the resulting trajectory to approximate expectations and uncertainty summaries

That is the MCMC viewpoint.

2 Problem Setting

Suppose your target is a posterior distribution

\[ p(x\mid y), \]

but direct integration, normalization, or exact sampling is too expensive.

Instead of trying to compute every quantity analytically, you design a Markov chain

\[ X_0, X_1, X_2, \dots \]

whose stationary distribution is the target:

\[ \pi(x)=p(x\mid y). \]

Then posterior expectations such as

\[ \mathbb{E}_{\pi}[f(X)] \]

are approximated by long-run averages:

\[ \frac{1}{T}\sum_{t=1}^T f(X_t). \]

So the core inference task changes from:

compute the posterior exactly

to:

simulate a chain whose long-run behavior reveals posterior quantities
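
In code, that shift is small. Here is a minimal Python sketch, assuming a trajectory has already been produced by some correct sampler (the chain and the observables below are invented placeholders):

```python
import numpy as np

def ergodic_average(trajectory, f, burn_in=0):
    """Approximate E_pi[f(X)] by the long-run average (1/T) sum_t f(X_t)."""
    xs = np.asarray(trajectory)[burn_in:]
    return np.mean(f(xs))

rng = np.random.default_rng(0)
fake_chain = rng.normal(size=5_000)  # stand-in for real MCMC output

print(ergodic_average(fake_chain, lambda x: x))        # ~ posterior mean
print(ergodic_average(fake_chain, lambda x: x > 0.0))  # ~ P(X > 0 | y)
```

The same average handles means and marginal probabilities, and interval summaries come from empirical quantiles of the same trajectory. The rest of this page is about where the trajectory comes from and how trustworthy it is.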

3 Why This Math Appears

This page ties together several site modules:

  • Statistics: posterior distributions and uncertainty summaries
  • Stochastic Processes: Markov chains, mixing, and ergodic averages
  • Optimization and Inference: inference goals that need more than a point estimate
  • Information Theory: approximation quality and uncertainty tradeoffs
  • High-Dimensional Statistics: posterior computation in large structured models

So sampling-based inference is not only a computational trick. It is one of the main ways probability becomes usable when exact closed forms fail.

4 Math Objects In Use

  • target posterior distribution \(\pi(x)=p(x\mid y)\)
  • Markov chain transition rule
  • stationary distribution
  • mixing behavior
  • ergodic averages
  • posterior observable \(f(x)\) whose expectation or marginal you want

At the application level, the critical questions are:

  • does the chain target the right distribution?
  • does it mix fast enough to be useful?
  • are the resulting samples informative enough for the quantity you care about?
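
For the first of those questions, stationarity can at least be verified directly in small discrete examples: \(\pi\) is stationary for a transition matrix \(P\) exactly when \(\pi P = \pi\). A toy sketch with an invented 3-state chain (which also happens to satisfy detailed balance):

```python
import numpy as np

# An invented 3-state transition matrix (rows sum to 1).
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])

# The stationary distribution is the left eigenvector of P for
# eigenvalue 1, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()

assert np.allclose(pi @ P, pi)  # pi is stationary: pi P = pi
# This chain is also reversible: pi_i P_ij == pi_j P_ji for all i, j.
flows = pi[:, None] * P
assert np.allclose(flows, flows.T)
print(pi)  # [0.4 0.4 0.2]
```

Metropolis-Hastings constructions enforce that reversibility condition by design, which is what guarantees the first question has a good answer even when \(\pi\) is known only up to normalization.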

5 A Small Worked Walkthrough

Suppose a posterior over a latent variable \(x\) is too complicated to summarize analytically.

You still want:

  • a posterior mean
  • a credible interval
  • a sense of multimodality or uncertainty spread

MAP estimation gives only one mode. Variational inference returns the best member of a chosen approximating family. MCMC instead tries to explore the target distribution itself through a chain.

The workflow becomes:

  1. Choose a chain construction. For example, a Metropolis-Hastings or Gibbs-style update rule.

  2. Run the chain. Let it move through plausible hidden-variable configurations.

  3. Estimate posterior quantities. Use long-run averages, histograms, or empirical quantiles from the trajectory, as in the sketch after this list.
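
A minimal end-to-end sketch of this workflow in Python, using a random-walk Metropolis-Hastings chain. The bimodal target, the step size, and the chain length are all invented placeholder choices, not recommendations:

```python
import numpy as np

def log_target(x):
    # Unnormalized log-density of an invented bimodal target:
    # an equal mixture of Gaussians centered at -2 and +2.
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def metropolis_hastings(log_p, x0, n_steps, step_size, rng):
    """Random-walk Metropolis: symmetric Gaussian proposals."""
    xs = np.empty(n_steps)
    x, lp = x0, log_p(x0)
    for t in range(n_steps):
        prop = x + step_size * rng.normal()
        lp_prop = log_p(prop)
        # Accept with probability min(1, pi(prop) / pi(x)).
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        xs[t] = x
    return xs

rng = np.random.default_rng(1)
chain = metropolis_hastings(log_target, x0=0.0, n_steps=50_000,
                            step_size=2.5, rng=rng)
samples = chain[5_000:]  # discard an initial stretch as burn-in

print("posterior mean:", samples.mean())
print("95% credible interval:", np.quantile(samples, [0.025, 0.975]))
```

The step size is the main knob here: too small and the chain crawls within one mode, too large and almost every proposal is rejected. Both failure directions reappear in the failure-modes list below.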

This is why MCMC is often the answer when the real question is not:

what is the single best hidden explanation?

but instead:

what does the whole posterior landscape look like?

6 Implementation or Computation Note

The main computational choices here are:

  1. Target design. What posterior or unnormalized density are you really trying to sample from?

  2. Chain design. How do you propose moves so the stationary distribution is correct?

  3. Mixing diagnosis. Are samples still too dependent, or has the chain explored enough of the target? A rough diagnostic sketch follows this list.
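
For the third choice, one rough diagnostic is to estimate an autocorrelation time and the resulting effective sample size. A crude sketch (the truncation-at-first-negative-lag rule is a deliberate simplification, and the AR(1) chain is an invented stand-in for sampler output):

```python
import numpy as np

def effective_sample_size(chain, max_lag=500):
    """Crude ESS: n / (1 + 2 * sum of positive lag-k autocorrelations)."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = x @ x / n
    tau = 1.0  # integrated autocorrelation time estimate
    for k in range(1, min(max_lag, n - 1)):
        rho = (x[:-k] @ x[k:]) / (n * var)  # lag-k autocorrelation
        if rho < 0:  # truncate at the first negative lag (crude)
            break
        tau += 2.0 * rho
    return n / tau

rng = np.random.default_rng(2)
x = np.empty(5_000)
x[0] = 0.0
for t in range(1, len(x)):  # strongly correlated AR(1) chain
    x[t] = 0.95 * x[t - 1] + rng.normal()

print("nominal samples:  ", len(x))
print("effective samples:", round(effective_sample_size(x)))
```

A chain with 5,000 stored states here carries only on the order of a hundred effectively independent draws, which is exactly the gap the second failure mode below warns about.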


7 Failure Modes

  • checking only that the stationary distribution is correct while ignoring whether the chain mixes fast enough to be useful
  • treating early highly correlated samples as if they were independent posterior draws
  • using MCMC when the real downstream goal only needs a simple point estimate
  • forgetting that poor proposal geometry can make the chain computationally useless in high dimension
  • reporting posterior summaries without asking whether the chain actually explored the target distribution well
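
Several of these failure modes can be probed by running a few chains from dispersed starting points and comparing between-chain to within-chain variance, in the spirit of the Gelman-Rubin \(\hat{R}\) diagnostic. A minimal sketch on synthetic chains (no split-chain refinement, all data invented):

```python
import numpy as np

def r_hat(chains):
    """Basic Gelman-Rubin statistic for m chains of length n (shape (m, n))."""
    chains = np.asarray(chains)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(3)
mixed = rng.normal(size=(4, 2_000))                      # four chains that agree
stuck = mixed + np.array([[0.0], [0.0], [5.0], [0.0]])   # one chain stuck elsewhere

print("well-mixed R-hat: ", r_hat(mixed))  # close to 1
print("stuck-chain R-hat:", r_hat(stuck))  # far above 1
```

Values close to 1 are necessary but not sufficient: chains that all miss the same mode will agree with each other and still report misleading posterior summaries.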
