Sampling, Mixing, and MCMC for Inference
sampling, MCMC, mixing, posterior, inference
1 Application Snapshot
Sometimes a point estimate is not enough.
You may want:
- posterior means
- uncertainty intervals
- marginal probabilities
- samples from plausible latent explanations
If the posterior is hard to compute exactly, one strategy is:
- build a Markov chain whose stationary distribution is the target posterior
- run it long enough
- use the resulting trajectory to approximate expectations and uncertainty summaries
That is the MCMC viewpoint.
2 Problem Setting
Suppose your target is a posterior distribution
\[ p(x\mid y), \]
but direct integration, normalization, or exact sampling is too expensive.
Instead of trying to compute every quantity analytically, you design a Markov chain
\[ X_0, X_1, X_2, \dots \]
whose stationary distribution is the target:
\[ \pi(x)=p(x\mid y). \]
Then posterior expectations such as
\[ \mathbb{E}_{\pi}[f(X)] \]
are approximated by long-run averages:
\[ \frac{1}{T}\sum_{t=1}^T f(X_t). \]
So the core inference task changes from:
compute the posterior exactly
to:
simulate a chain whose long-run behavior reveals posterior quantities
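The shift from exact computation to simulation can be made concrete with a minimal sketch. This is an assumed random-walk Metropolis construction (not prescribed by the text above): the target is a standard normal known only up to its normalizing constant, and the ergodic average of \(f(x)=x^2\) approximates \(\mathbb{E}_\pi[X^2]=1\).

```python
import math
import random

def metropolis_chain(log_target, x0, step, n_steps, seed=0):
    """Random-walk Metropolis: the chain's stationary distribution
    is proportional to exp(log_target), which only needs to be
    known up to an additive constant."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, pi(proposal) / pi(x)).
        if math.log(rng.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return samples

# Target: standard normal, specified only up to a constant.
log_target = lambda x: -0.5 * x * x
samples = metropolis_chain(log_target, x0=0.0, step=1.0, n_steps=50000)

# Long-run average approximates E[X^2] = 1 under the target.
est = sum(s * s for s in samples) / len(samples)
```

Note that the normalizing constant of the target never appears: the acceptance ratio only needs the unnormalized density, which is exactly why this construction is usable when normalization is too expensive.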
3 Why This Math Appears
This page ties together several site modules:
- Statistics: posterior distributions and uncertainty summaries
- Stochastic Processes: Markov chains, mixing, and ergodic averages
- Optimization and Inference: inference goals that need more than a point estimate
- Information Theory: approximation quality and uncertainty tradeoffs
- High-Dimensional Statistics: posterior computation in large structured models
So sampling-based inference is not only a computational trick. It is one of the main ways probability becomes usable when exact closed forms fail.
4 Math Objects In Use
- target posterior distribution \(\pi(x)=p(x\mid y)\)
- Markov chain transition rule
- stationary distribution
- mixing behavior
- ergodic averages
- posterior observable \(f(x)\) whose expectation or marginal you want
At the application level, the critical questions are:
- does the chain target the right distribution?
- does it mix fast enough to be useful?
- are the resulting samples informative enough for the quantity you care about?
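The mixing question in particular has a simple empirical handle: lag-\(k\) autocorrelation of the trajectory. Here is a sketch on a toy AR(1) chain (an assumption introduced for illustration, not something from the text above), where a persistence parameter near 1 produces a sticky, slow-mixing trajectory.

```python
import random

def ar1_chain(rho, n, seed=0):
    """Toy chain x_{t+1} = rho * x_t + noise: larger rho means
    stickier, slower-mixing trajectories."""
    rng = random.Random(seed)
    x, xs = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs

def autocorrelation(xs, lag):
    """Empirical lag-k autocorrelation of a trajectory; values near 1
    mean successive samples carry almost no new information."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[t] - mean) * (xs[t + lag] - mean)
              for t in range(n - lag)) / n
    return cov / var

sticky = autocorrelation(ar1_chain(rho=0.95, n=20000), lag=1)
rapid = autocorrelation(ar1_chain(rho=0.10, n=20000), lag=1)
```

High autocorrelation does not make the samples wrong, but it shrinks the effective sample size: many correlated draws can be worth only a handful of independent ones.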
5 A Small Worked Walkthrough
Suppose a posterior over a latent variable \(x\) is too complicated to summarize analytically.
You still want:
- a posterior mean
- a credible interval
- a sense of multimodality or uncertainty spread
MAP estimation gives only a single point, the posterior mode. Variational inference optimizes over an approximating family. MCMC instead tries to explore the target distribution itself through a chain.
The workflow becomes:
1. Choose a chain construction: for example, a Metropolis-Hastings or Gibbs-style update rule.
2. Run the chain: let it move through plausible hidden-variable configurations.
3. Estimate posterior quantities: use long-run averages, histograms, or empirical quantiles from the trajectory.
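The workflow can be sketched end to end on a hypothetical coin-bias problem (the data and numbers below are illustrative assumptions): 7 heads and 3 tails with a uniform prior give the exact posterior Beta(8, 4), so the chain's answers can be checked against the known posterior mean \(8/12\).

```python
import math
import random

# Hypothetical data: 7 heads, 3 tails; uniform prior on the coin bias p.
# The exact posterior is Beta(8, 4), with mean 8/12.
heads, tails = 7, 3

def log_post(p):
    """Unnormalized log posterior of the coin bias."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return heads * math.log(p) + tails * math.log(1.0 - p)

# Step 1: chain construction (random-walk Metropolis on (0, 1)).
rng = random.Random(42)
p, draws = 0.5, []
# Step 2: run the chain.
for _ in range(40000):
    prop = p + rng.gauss(0.0, 0.2)
    if math.log(rng.random()) < log_post(prop) - log_post(p):
        p = prop
    draws.append(p)

# Step 3: estimate posterior quantities from the trajectory.
burned = draws[2000:]                      # discard the early transient
post_mean = sum(burned) / len(burned)      # posterior mean
burned.sort()
lo = burned[int(0.025 * len(burned))]      # empirical 95% credible
hi = burned[int(0.975 * len(burned))]      # interval endpoints
```

Here the exact answer is available for checking; in realistic problems it is not, which is why the diagnostics discussed below matter.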
This is why MCMC is often the answer when the real question is not:
what is the single best hidden explanation?
but instead:
what does the whole posterior landscape look like?
6 Implementation or Computation Note
The main computational choices here are:
- Target design: what posterior or unnormalized density are you really trying to sample from?
- Chain design: how do you propose moves so the stationary distribution is correct?
- Mixing diagnosis: are samples still too dependent, or has the chain explored enough of the target?
Strong next bridges already live on the site; see the Paper Bridge and Sources sections below.
7 Failure Modes
- verifying that the stationary distribution is correct while ignoring how slowly the chain mixes
- treating early highly correlated samples as if they were independent posterior draws
- using MCMC when the real downstream goal only needs a simple point estimate
- forgetting that poor proposal geometry can make the chain computationally useless in high dimension
- reporting posterior summaries without asking whether the chain actually explored the target distribution well
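The proposal-geometry failure is easy to demonstrate in one dimension. In this sketch (an assumed setup for illustration), two random-walk Metropolis chains target a standard normal but start far from the mode at \(x=10\); the chain with tiny proposals has a correct stationary distribution yet never reaches the bulk of the target in the available steps.

```python
import math
import random

def rw_metropolis(step, x0, n, seed=0):
    """Random-walk Metropolis targeting a standard normal."""
    rng = random.Random(seed)
    x, xs = x0, []
    for _ in range(n):
        prop = x + rng.gauss(0.0, step)
        # log pi(prop) - log pi(x) for the standard normal target.
        if math.log(rng.random()) < 0.5 * (x * x - prop * prop):
            x = prop
        xs.append(x)
    return xs

# Both chains start far from the mode at x = 10.
slow = rw_metropolis(step=0.01, x0=10.0, n=2000)   # tiny proposals
fast = rw_metropolis(step=1.0, x0=10.0, n=2000)    # well-scaled proposals

mean_slow = sum(slow) / len(slow)   # still stuck near the starting point
mean_fast = sum(fast) / len(fast)   # close to the true mean 0
```

Both chains are "correct" in the stationary-distribution sense; only one is useful at this budget. Posterior summaries from the slow chain would be badly biased, which is exactly the last failure mode above.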
8 Paper Bridge
- Stats366 Markov Chains Notes - first pass: useful once Markov-chain dynamics become part of the inference story. Checked 2026-04-26.
- 6.262 / Discrete Stochastic Processes - bridge to stochastic-processes tooling: useful once mixing and long-run averages become computational questions. Checked 2026-04-26.
9 Sources and Further Reading
- 18.445 / Introduction to Stochastic Processes - first pass: official MIT anchor for mixing and long-run Markov-chain behavior. Checked 2026-04-26.
- 6.262 / Discrete Stochastic Processes - first pass: official MIT anchor for Markov-chain modeling and probabilistic recursion. Checked 2026-04-26.
- Stats366 Course Page - second pass: Stanford anchor for hidden-state and Markov-model computation. Checked 2026-04-26.
- Stats366 Markov Chains Notes - second pass: Stanford notes useful for the Markov-chain side of MCMC intuition. Checked 2026-04-26.
- Stats366 Underlying Algorithms - algorithm bridge: Stanford notes helpful for recursive and chain-based inference viewpoints. Checked 2026-04-26.