Likelihoods, Priors, and MAP Estimation

A bridge page showing how observation models, priors, and posterior modes turn inference questions into optimization problems.
Modified

April 26, 2026

Keywords

likelihood, prior, MAP, regularization, inference

1 Application Snapshot

Once an inference problem has observations and a hidden target, the next question is:

how should evidence from the data be combined with assumptions about what hidden solutions are plausible?

The two main pieces are:

  • a likelihood, which scores how well a hidden candidate explains the data
  • a prior, which encodes which hidden candidates look more plausible before seeing the data

MAP estimation is the bridge that turns those two pieces into a concrete optimization problem.

2 Problem Setting

Suppose the hidden quantity is \(x\) and the observed data are \(y\).

The likelihood is the model

\[ p(y \mid x), \]

which says how probable the observations would be if \(x\) were the truth.

The prior is

\[ p(x), \]

which says which values of \(x\) are more plausible before observing \(y\).

Bayes’ rule combines them:

\[ p(x \mid y) \propto p(y \mid x)\,p(x). \]

If you want a single point estimate rather than the whole posterior distribution, the natural choice is the posterior mode, which gives the MAP estimator:

\[ \hat{x}_{\mathrm{MAP}} = \arg\max_x p(x \mid y) = \arg\max_x p(y \mid x)p(x). \]

Because the logarithm is monotone, maximizing the posterior is the same as minimizing its negative logarithm, which turns MAP estimation into an optimization problem:

\[ \hat{x}_{\mathrm{MAP}} = \arg\min_x \bigl[-\log p(y \mid x) - \log p(x)\bigr]. \]

3 Why This Math Appears

This page sits exactly at the intersection of several site modules:

  • Statistics: likelihoods, posteriors, Bayesian estimation
  • Optimization: objectives, constraints, convexity, regularization
  • High-Dimensional Statistics: sparsity assumptions and structured recovery
  • Signal Processing and Estimation: noisy measurements and inverse problems
  • Information Theory: priors and penalties as ways of controlling uncertainty and description complexity

So MAP estimation is not a niche Bayesian trick. It is one of the cleanest ways to translate probabilistic modeling into a numerical objective you can actually solve.

4 Math Objects In Use

  • hidden variable, parameter, or signal \(x\)
  • observation \(y\)
  • likelihood \(p(y \mid x)\)
  • prior \(p(x)\)
  • posterior \(p(x \mid y)\)
  • negative log-likelihood as a data-fit term
  • negative log-prior as a penalty or regularizer

This is why the same optimization template appears across many applications:

\[ \min_x \Bigl[\text{data fit}(x) + \lambda\,\text{regularizer}(x)\Bigr]. \]

In many cases, the regularizer is just the negative log-prior up to constants and scaling.
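To make "up to constants and scaling" precise, suppose (purely for illustration) that the prior has the form \(p(x) \propto \exp(-\lambda\,R(x))\) for some penalty function \(R\). Then

\[ -\log p(x) = \lambda\,R(x) + \text{const}, \]

so minimizing the negative log posterior is exactly minimizing a data-fit term plus \(\lambda\) times the regularizer; the additive constant does not affect the minimizer.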

5 A Small Worked Walkthrough

Take the simple noisy observation model

\[ y = x + \eta, \qquad \eta \sim \mathcal{N}(0,\sigma^2). \]

Then the likelihood says:

\[ p(y \mid x) \propto \exp\!\left(-\frac{(y-x)^2}{2\sigma^2}\right). \]

Now suppose the prior is also Gaussian:

\[ x \sim \mathcal{N}(0,\tau^2). \]

Then

\[ p(x) \propto \exp\!\left(-\frac{x^2}{2\tau^2}\right). \]

The MAP problem becomes

\[ \hat{x}_{\mathrm{MAP}} = \arg\min_x \left[ \frac{(y-x)^2}{2\sigma^2} + \frac{x^2}{2\tau^2} \right]. \]
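Multiplying the objective by the constant \(2\sigma^2\) does not change the minimizer, so this is equivalent to

\[ \arg\min_x \left[ (y-x)^2 + \lambda\,x^2 \right], \qquad \lambda = \frac{\sigma^2}{\tau^2}, \]

which makes the role of \(\lambda\) as a noise-to-prior variance ratio explicit.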

This is already the shape of a regularized objective:

  • the first term fits the data
  • the second term shrinks solutions toward values preferred by the prior, as made exact just below
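The shrinkage can be worked out exactly here: setting the derivative of the objective to zero gives

\[ \hat{x}_{\mathrm{MAP}} = \frac{\tau^2}{\sigma^2 + \tau^2}\, y, \]

a weighted compromise between the observation \(y\) and the prior mean \(0\). The noisier the measurement (large \(\sigma^2\)) or the tighter the prior (small \(\tau^2\)), the more the estimate shrinks toward \(0\).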

If the prior were Laplace instead of Gaussian, the penalty would become proportional to \(|x|\), which is the same geometry that later reappears in sparse recovery and \(\ell_1\) regularization.
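Concretely, assuming a Laplace prior \(p(x) \propto \exp(-|x|/b)\) with scale \(b\) (introduced here only for the illustration), the scalar MAP problem becomes

\[ \hat{x}_{\mathrm{MAP}} = \arg\min_x \left[ \frac{(y-x)^2}{2\sigma^2} + \frac{|x|}{b} \right] = \operatorname{sign}(y)\,\max\!\left( |y| - \frac{\sigma^2}{b},\, 0 \right), \]

the soft-thresholding rule: observations smaller than the threshold are set exactly to zero, which is where the sparsity-promoting behavior of \(\ell_1\) penalties comes from.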

So one of the most important application translations, made concrete in the short sketch after this list, is:

  • Gaussian noise often leads to squared loss
  • Gaussian priors often lead to \(\ell_2\) penalties
  • Laplace priors often lead to \(\ell_1\) penalties
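Here is a minimal numerical sketch of this translation for the scalar model above; the function names, parameter values, and the use of NumPy are assumptions made for this illustration, not part of the page's development:

```python
import numpy as np

def map_gaussian_prior(y, sigma2, tau2):
    """Gaussian noise + Gaussian prior: MAP is linear shrinkage toward the prior mean 0."""
    return (tau2 / (sigma2 + tau2)) * y

def map_laplace_prior(y, sigma2, b):
    """Gaussian noise + Laplace prior: MAP is soft thresholding at sigma2 / b."""
    threshold = sigma2 / b
    return np.sign(y) * np.maximum(np.abs(y) - threshold, 0.0)

# Illustrative observations and hyperparameters (assumed values).
y = np.array([-2.0, -0.3, 0.1, 0.8, 3.0])
sigma2, tau2, b = 1.0, 1.0, 1.0

print(map_gaussian_prior(y, sigma2, tau2))  # every entry shrunk halfway toward 0
print(map_laplace_prior(y, sigma2, b))      # entries below the threshold set exactly to 0
```

The Gaussian prior shrinks every coordinate by the same factor, while the Laplace prior zeroes out small coordinates entirely, which is the sparsity effect that reappears in the high-dimensional recovery material.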

6 Implementation or Computation Note

In practice, three decisions matter immediately:

  1. Model choice: What noise model makes sense for the measurement process?

  2. Structure choice: What prior or regularizer expresses what you believe about the hidden quantity?

  3. Computation choice: Is the resulting objective easy to optimize, or will you need approximation or sampling? (A minimal sketch follows this list.)
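As a sketch of the computation choice, here is the scalar objective from the walkthrough handed to a generic numerical optimizer and checked against the closed-form shrinkage answer; the use of scipy.optimize.minimize_scalar and all numerical values are assumptions made for this illustration:

```python
from scipy.optimize import minimize_scalar

# Illustrative settings for the scalar Gaussian-Gaussian model (assumed values).
y, sigma2, tau2 = 2.0, 1.0, 4.0

def neg_log_posterior(x):
    """Negative log posterior up to additive constants: data fit plus penalty."""
    return (y - x) ** 2 / (2 * sigma2) + x ** 2 / (2 * tau2)

result = minimize_scalar(neg_log_posterior)       # generic 1-D minimizer
closed_form = tau2 / (sigma2 + tau2) * y          # shrinkage formula from the walkthrough

print(result.x, closed_form)  # both should be 1.6 up to numerical tolerance
```

For this conjugate Gaussian case the optimizer is unnecessary, but the same pattern of minimizing a hand-written negative log posterior carries over to models with no closed-form MAP estimate.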


7 Failure Modes

  • confusing MLE, MAP, and full Bayesian posterior inference
  • treating the prior as decoration instead of a real structural assumption
  • forgetting that a bad likelihood model can dominate everything downstream
  • interpreting the MAP point estimate as if it contained the same information as the whole posterior
  • tuning penalties numerically without asking what prior belief or recovery bias they actually encode

