Likelihoods, Priors, and MAP Estimation
likelihood, prior, MAP, regularization, inference
1 Application Snapshot
Once an inference problem has observations and a hidden target, the next question is:
how should evidence from the data be combined with assumptions about what hidden solutions are plausible?
The two main pieces are:
- a likelihood, which scores how well a hidden candidate explains the data
- a prior, which encodes which hidden candidates look more plausible before seeing the data
MAP estimation is the bridge that turns those two pieces into a concrete optimization problem.
2 Problem Setting
Suppose the hidden quantity is \(x\) and the observed data are \(y\).
The likelihood is the model
\[ p(y \mid x), \]
which says how probable the observations would be if \(x\) were the truth.
The prior is
\[ p(x), \]
which says which values of \(x\) are more plausible before observing \(y\).
Bayes’ rule combines them:
\[ p(x \mid y) \propto p(y \mid x)\,p(x). \]
If you want a single point estimate, the posterior mode, instead of the whole posterior distribution, you get the MAP estimator:
\[ \hat{x}_{\mathrm{MAP}} = \arg\max_x p(x \mid y) = \arg\max_x p(y \mid x)p(x). \]
Taking negative logs turns that into an optimization problem:
\[ \hat{x}_{\mathrm{MAP}} = \arg\min_x \bigl[-\log p(y \mid x) - \log p(x)\bigr]. \]
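As a minimal sketch of how that optimization reads in code, assume a scalar Gaussian observation model and a Gaussian prior (the same model used in the walkthrough below); the values of sigma, tau, and y are placeholders chosen purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative model: y = x + noise, noise ~ N(0, sigma^2), prior x ~ N(0, tau^2).
sigma, tau = 1.0, 2.0   # assumed noise std. dev. and prior std. dev.
y = 1.5                 # a single observed value, made up for the example

def neg_log_likelihood(x):
    # -log p(y | x), up to an additive constant
    return (y - x) ** 2 / (2 * sigma ** 2)

def neg_log_prior(x):
    # -log p(x), up to an additive constant
    return x ** 2 / (2 * tau ** 2)

def neg_log_posterior(x):
    # MAP estimation minimizes this sum
    return neg_log_likelihood(x) + neg_log_prior(x)

result = minimize_scalar(neg_log_posterior)
# For this Gaussian-Gaussian model, the minimizer matches the closed-form
# shrinkage estimate y * tau^2 / (sigma^2 + tau^2).
print("numerical MAP estimate:", result.x)
```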
3 Why This Math Appears
This page sits exactly at the intersection of several site modules:
- Statistics: likelihoods, posteriors, Bayesian estimation
- Optimization: objectives, constraints, convexity, regularization
- High-Dimensional Statistics: sparsity assumptions and structured recovery
- Signal Processing and Estimation: noisy measurements and inverse problems
- Information Theory: priors and penalties as ways of controlling uncertainty and description complexity
So MAP estimation is not a niche Bayesian trick. It is one of the cleanest ways to translate probabilistic modeling into a numerical objective you can actually solve.
4 Math Objects In Use
- hidden variable, parameter, or signal \(x\)
- observation \(y\)
- likelihood \(p(y \mid x)\)
- prior \(p(x)\)
- posterior \(p(x \mid y)\)
- negative log-likelihood as a data-fit term
- negative log-prior as a penalty or regularizer
This is why the same optimization template appears across many applications:
\[ \min_x \Bigl[\text{data fit}(x) + \lambda\,\text{regularizer}(x)\Bigr]. \]
In many cases, the regularizer is just the negative log-prior up to constants and scaling.
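As a concrete, hedged instance of that template, the sketch below uses a squared-error data fit with an l2 penalty (a ridge-style objective); the measurement matrix A, the noise level, and the value of lam are invented for illustration, and the closed-form solve works only because both terms are quadratic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear measurement model: y = A @ x_true + noise
A = rng.normal(size=(50, 10))
x_true = rng.normal(size=10)
y = A @ x_true + 0.1 * rng.normal(size=50)

lam = 0.5  # regularization weight: how strongly the penalty (prior) is trusted

# min_x ||y - A x||^2 + lam * ||x||^2 has the normal-equations solution below.
x_hat = np.linalg.solve(A.T @ A + lam * np.eye(10), A.T @ y)
print(x_hat)
```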
5 A Small Worked Walkthrough
Take the simple noisy observation model
\[ y = x + \eta, \qquad \eta \sim \mathcal{N}(0,\sigma^2). \]
Then the likelihood says:
\[ p(y \mid x) \propto \exp\!\left(-\frac{(y-x)^2}{2\sigma^2}\right). \]
Now suppose the prior is also Gaussian:
\[ x \sim \mathcal{N}(0,\tau^2). \]
Then
\[ p(x) \propto \exp\!\left(-\frac{x^2}{2\tau^2}\right). \]
The MAP problem becomes
\[ \hat{x}_{\mathrm{MAP}} = \arg\min_x \Bigl[ \frac{(y-x)^2}{2\sigma^2} + \frac{x^2}{2\tau^2} \Bigr]. \]
This is already the shape of a regularized objective:
- the first term fits the data
- the second term shrinks solutions toward values preferred by the prior
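Because both terms are quadratic in \(x\), setting the derivative to zero gives a closed form:
\[ \hat{x}_{\mathrm{MAP}} = \frac{\tau^2}{\sigma^2 + \tau^2}\, y, \]
so the MAP estimate is the observation shrunk toward the prior mean \(0\), and the shrinkage is stronger when the noise variance \(\sigma^2\) is large relative to the prior variance \(\tau^2\).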
If the prior were Laplace instead of Gaussian, the penalty would become proportional to \(|x|\), which is the same geometry that later reappears in sparse recovery and l1 regularization.
So one of the most important application translations is:
- Gaussian noise often leads to squared loss
- Gaussian priors often lead to l2 penalties
- Laplace priors often lead to l1 penalties
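To make the Laplace case concrete, here is a small sketch of the scalar MAP estimate under a Laplace prior, which reduces to soft thresholding; the noise variance and prior scale below are illustrative values, not recommendations.

```python
import numpy as np

def map_laplace(y, sigma2, b):
    """Scalar MAP estimate for y = x + N(0, sigma2) noise with a Laplace(0, b) prior.

    Minimizing (y - x)^2 / (2 * sigma2) + |x| / b gives soft thresholding.
    """
    threshold = sigma2 / b
    return np.sign(y) * max(abs(y) - threshold, 0.0)

# Small observations are snapped exactly to zero; larger ones are shrunk by the threshold.
print(map_laplace(0.3, sigma2=1.0, b=2.0))  # inside the threshold -> 0.0
print(map_laplace(2.0, sigma2=1.0, b=2.0))  # shrunk toward zero -> 1.5
```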
6 Implementation or Computation Note
In practice, three decisions matter immediately:
- Model choice: what noise model makes sense for the measurement process?
- Structure choice: what prior or regularizer expresses what you believe about the hidden quantity?
- Computation choice: is the resulting objective easy to optimize, or will you need approximation or sampling?
7 Failure Modes
- confusing MLE, MAP, and full Bayesian posterior inference
- treating the prior as decoration instead of a real structural assumption
- forgetting that a bad likelihood model can dominate everything downstream
- interpreting the MAP point estimate as if it contained the same information as the whole posterior
- tuning penalties numerically without asking what prior belief or recovery bias they actually encode
8 Paper Bridge
- STATS 305B / Applied Statistics II - First pass: useful once regularization and posterior-mode estimation start to merge. Checked 2026-04-26.
- EE278 / Introduction to Statistical Signal Processing - Bridge to estimation: useful once likelihood modeling and noisy observations become the main bottleneck. Checked 2026-04-26.
9 Sources and Further Reading
- STATS 202 / Data Mining and Analysis - First pass: official Stanford bridge for estimation viewpoints and modeling choices. Checked 2026-04-26.
- STATS 305B / Applied Statistics II - First pass: official Stanford anchor for regularization and modern estimation language. Checked 2026-04-26.
- STATS 305B LASSO Notes - Second pass: official Stanford notes that make the penalty-as-structure viewpoint explicit. Checked 2026-04-26.
- 6.011 / Signals, Systems and Inference - Bridge to noisy measurements: official MIT course anchor for inference from corrupted observations. Checked 2026-04-26.
- 16.322 / Stochastic Estimation and Control - Bridge to model-based estimation: official MIT source for hidden-state estimation and Bayesian filtering viewpoints. Checked 2026-04-26.