Sampling, Mixing, and MCMC for Inference

A bridge page showing how posterior inference can be approximated by building a Markov chain whose long-run behavior matches the target distribution.
Modified: April 26, 2026

Keywords

sampling, MCMC, mixing, posterior, inference

1 Application Snapshot

Sometimes a point estimate is not enough.

You may want:

  • posterior means
  • uncertainty intervals
  • marginal probabilities
  • samples from plausible latent explanations

If the posterior is hard to compute exactly, one strategy is:

  • build a Markov chain whose stationary distribution is the target posterior
  • run it long enough
  • use the resulting trajectory to approximate expectations and uncertainty summaries

That is the MCMC viewpoint.

2 Problem Setting

Suppose your target is a posterior distribution

\[ p(x\mid y), \]

but direct integration, normalization, or exact sampling is too expensive.

Instead of trying to compute every quantity analytically, you design a Markov chain

\[ X_0, X_1, X_2, \dots \]

whose stationary distribution is the target:

\[ \pi(x)=p(x\mid y). \]

Then posterior expectations such as

\[ \mathbb{E}_{\pi}[f(X)] \]

are approximated by long-run averages:

\[ \frac{1}{T}\sum_{t=1}^T f(X_t). \]

So the core inference task changes from:

compute the posterior exactly

to:

simulate a chain whose long-run behavior reveals posterior quantities
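
In code, that shift is small. Here is a minimal Python sketch, assuming a trajectory has already been produced by some correct sampler (the chain and the observables below are invented placeholders):

```python
import numpy as np

def ergodic_average(trajectory, f, burn_in=0):
    """Approximate E_pi[f(X)] by the long-run average (1/T) sum_t f(X_t)."""
    xs = np.asarray(trajectory)[burn_in:]
    return np.mean(f(xs))

rng = np.random.default_rng(0)
fake_chain = rng.normal(size=5_000)  # stand-in for real MCMC output

print(ergodic_average(fake_chain, lambda x: x))        # ~ posterior mean
print(ergodic_average(fake_chain, lambda x: x > 0.0))  # ~ P(X > 0 | y)
```

The same average handles means and marginal probabilities, and interval summaries come from empirical quantiles of the same trajectory. The rest of this page is about where the trajectory comes from and how trustworthy it is.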

3 Why This Math Appears

This page ties together several site modules:

  • Statistics: posterior distributions and uncertainty summaries
  • Stochastic Processes: Markov chains, mixing, and ergodic averages
  • Optimization and Inference: inference goals that need more than a point estimate
  • Information Theory: approximation quality and uncertainty tradeoffs
  • High-Dimensional Statistics: posterior computation in large structured models

So sampling-based inference is not only a computational trick. It is one of the main ways probability becomes usable when exact closed forms fail.

4 Math Objects In Use

  • target posterior distribution \(\pi(x)=p(x\mid y)\)
  • Markov chain transition rule
  • stationary distribution
  • mixing behavior
  • ergodic averages
  • posterior observable \(f(x)\) whose expectation or marginal you want

At the application level, the critical questions are:

  • does the chain target the right distribution?
  • does it mix fast enough to be useful?
  • are the resulting samples informative enough for the quantity you care about?
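
For the first of those questions, stationarity can at least be verified directly in small discrete examples: \(\pi\) is stationary for a transition matrix \(P\) exactly when \(\pi P = \pi\). A toy sketch with an invented 3-state chain (which also happens to satisfy detailed balance):

```python
import numpy as np

# An invented 3-state transition matrix (rows sum to 1).
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])

# The stationary distribution is the left eigenvector of P for
# eigenvalue 1, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()

assert np.allclose(pi @ P, pi)  # pi is stationary: pi P = pi
# This chain is also reversible: pi_i P_ij == pi_j P_ji for all i, j.
flows = pi[:, None] * P
assert np.allclose(flows, flows.T)
print(pi)  # [0.4 0.4 0.2]
```

Metropolis-Hastings constructions enforce that reversibility condition by design, which is what guarantees the first question has a good answer even when \(\pi\) is known only up to normalization.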

5 A Small Worked Walkthrough

Suppose a posterior over a latent variable \(x\) is too complicated to summarize analytically.

You still want:

  • a posterior mean
  • a credible interval
  • a sense of multimodality or uncertainty spread

MAP estimation gives only one mode. Variational inference returns the best member of a chosen approximating family. MCMC instead tries to explore the target distribution itself through a chain.

The workflow becomes:

  1. Choose a chain construction. For example, a Metropolis-Hastings or Gibbs-style update rule.

  2. Run the chain. Let it move through plausible hidden-variable configurations.

  3. Estimate posterior quantities. Use long-run averages, histograms, or empirical quantiles from the trajectory, as in the sketch after this list.
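
A minimal end-to-end sketch of this workflow in Python, using a random-walk Metropolis-Hastings chain. The bimodal target, the step size, and the chain length are all invented placeholder choices, not recommendations:

```python
import numpy as np

def log_target(x):
    # Unnormalized log-density of an invented bimodal target:
    # an equal mixture of Gaussians centered at -2 and +2.
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def metropolis_hastings(log_p, x0, n_steps, step_size, rng):
    """Random-walk Metropolis: symmetric Gaussian proposals."""
    xs = np.empty(n_steps)
    x, lp = x0, log_p(x0)
    for t in range(n_steps):
        prop = x + step_size * rng.normal()
        lp_prop = log_p(prop)
        # Accept with probability min(1, pi(prop) / pi(x)).
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        xs[t] = x
    return xs

rng = np.random.default_rng(1)
chain = metropolis_hastings(log_target, x0=0.0, n_steps=50_000,
                            step_size=2.5, rng=rng)
samples = chain[5_000:]  # discard an initial stretch as burn-in

print("posterior mean:", samples.mean())
print("95% credible interval:", np.quantile(samples, [0.025, 0.975]))
```

The step size is the main knob here: too small and the chain crawls within one mode, too large and almost every proposal is rejected. Both failure directions reappear in the failure-modes list below.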

This is why MCMC is often the answer when the real question is not:

what is the single best hidden explanation?

but instead:

what does the whole posterior landscape look like?

6 Implementation or Computation Note

The main computational choices here are:

  1. Target design. What posterior or unnormalized density are you really trying to sample from?

  2. Chain design. How do you propose moves so the stationary distribution is correct?

  3. Mixing diagnosis. Are samples still too dependent, or has the chain explored enough of the target? A rough diagnostic sketch follows this list.
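
For the third choice, one rough diagnostic is to estimate an autocorrelation time and the resulting effective sample size. A crude sketch (the truncation-at-first-negative-lag rule is a deliberate simplification, and the AR(1) chain is an invented stand-in for sampler output):

```python
import numpy as np

def effective_sample_size(chain, max_lag=500):
    """Crude ESS: n / (1 + 2 * sum of positive lag-k autocorrelations)."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = x @ x / n
    tau = 1.0  # integrated autocorrelation time estimate
    for k in range(1, min(max_lag, n - 1)):
        rho = (x[:-k] @ x[k:]) / (n * var)  # lag-k autocorrelation
        if rho < 0:  # truncate at the first negative lag (crude)
            break
        tau += 2.0 * rho
    return n / tau

rng = np.random.default_rng(2)
x = np.empty(5_000)
x[0] = 0.0
for t in range(1, len(x)):  # strongly correlated AR(1) chain
    x[t] = 0.95 * x[t - 1] + rng.normal()

print("nominal samples:  ", len(x))
print("effective samples:", round(effective_sample_size(x)))
```

A chain with 5,000 stored states here carries only on the order of a hundred effectively independent draws, which is exactly the gap the second failure mode below warns about.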


7 Failure Modes

  • checking only that the stationary distribution is correct while ignoring whether the chain mixes fast enough to be useful
  • treating early highly correlated samples as if they were independent posterior draws
  • using MCMC when the real downstream goal only needs a simple point estimate
  • forgetting that poor proposal geometry can make the chain computationally useless in high dimension
  • reporting posterior summaries without asking whether the chain actually explored the target distribution well
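
Several of these failure modes can be probed by running a few chains from dispersed starting points and comparing between-chain to within-chain variance, in the spirit of the Gelman-Rubin \(\hat{R}\) diagnostic. A minimal sketch on synthetic chains (no split-chain refinement, all data invented):

```python
import numpy as np

def r_hat(chains):
    """Basic Gelman-Rubin statistic for m chains of length n (shape (m, n))."""
    chains = np.asarray(chains)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(3)
mixed = rng.normal(size=(4, 2_000))                      # four chains that agree
stuck = mixed + np.array([[0.0], [0.0], [5.0], [0.0]])   # one chain stuck elsewhere

print("well-mixed R-hat: ", r_hat(mixed))  # close to 1
print("stuck-chain R-hat:", r_hat(stuck))  # far above 1
```

Values close to 1 are necessary but not sufficient: chains that all miss the same mode will agree with each other and still report misleading posterior summaries.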
