Diffusion Models and Denoising
diffusion models, denoising, DDPM, score-based models, generative AI
1 Application Snapshot
Diffusion models generate data by learning how to undo noise.
The high-level picture is:
- gradually corrupt real data with random noise
- train a model to predict or remove that noise
- run the learned denoising process in reverse to generate new samples
So generation becomes a repeated denoising problem.
2 Problem Setting
Start with a clean sample \(x_0\).
The forward diffusion process gradually adds noise:
\[ x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0,I). \]
As \(t\) increases, the sample becomes more corrupted until it is close to pure noise.
The learning problem is then:
given a noisy sample \(x_t\) and the noise level \(t\), predict the added noise (or, equivalently, a denoising direction).
If the model can do that reliably, we can approximately reverse the noising process and turn noise back into data.
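The forward process above can be simulated directly. The sketch below is a minimal illustration of the step-by-step update \(x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\varepsilon_t\); the linear schedule and the function name are illustrative choices, not taken from any specific paper.

```python
import numpy as np

def forward_diffusion(x0, betas, rng=None):
    """Apply x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps_t step by step.

    Returns the whole trajectory [x_0, x_1, ..., x_T] so the gradual
    corruption toward pure noise is visible.
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    trajectory = [x]
    for beta in betas:
        eps = rng.standard_normal(x.shape)  # fresh Gaussian noise each step
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * eps
        trajectory.append(x)
    return trajectory

# A simple linear schedule (an illustrative choice): as t grows,
# x_t drifts away from the clean data toward noise.
betas = np.linspace(0.01, 0.2, 50)
traj = forward_diffusion(np.full(4, 2.0), betas)
```

Running this and plotting the trajectory makes the "gradually corrupt real data" story concrete: early steps barely change the sample, late steps are dominated by noise.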
3 Why This Math Appears
Diffusion models sit at an interesting intersection of earlier foundations:
- Probability: the forward and reverse processes are stochastic, and Gaussian noise is central
- Statistics: the model is trained from noisy examples and a denoising objective
- Optimization: training is still gradient-based minimization of a loss
- Backpropagation and Computation Graphs: the denoiser is a large differentiable model trained by backprop
So diffusion models are not just “fancy image generators.” They are probabilistic iterative denoisers learned with standard deep-learning optimization.
4 Math Objects In Use
- clean sample \(x_0\)
- noisy sample \(x_t\)
- noise schedule \(\beta_t\)
- Gaussian noise \(\varepsilon\)
- denoiser or noise-prediction network \(\varepsilon_\theta(x_t,t)\)
- reverse sampling process
5 A Small Worked Walkthrough
Take a simple scalar example with
\[ x_0 = 2, \qquad \beta_1 = 0.25. \]
If one noise draw is \(\varepsilon_1 = 1\), then
\[ x_1 = \sqrt{0.75}\cdot 2 + \sqrt{0.25}\cdot 1 \approx 1.732 + 0.5 = 2.232. \]
The sample is now noisy.
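The arithmetic above is easy to check directly:

```python
import math

# The scalar example from the walkthrough: x_0 = 2, beta_1 = 0.25, eps_1 = 1.
x0, beta1, eps1 = 2.0, 0.25, 1.0
x1 = math.sqrt(1 - beta1) * x0 + math.sqrt(beta1) * eps1
# sqrt(0.75) * 2 + sqrt(0.25) * 1 ≈ 1.732 + 0.5
print(round(x1, 3))  # 2.232
```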
Suppose at a later step the model sees \((x_t,t)\) and predicts the added noise \(\hat{\varepsilon}_\theta(x_t,t)\). Then it can use that prediction to estimate which part of \(x_t\) is signal and which part is noise.
The exact sampling update depends on the formulation, but the conceptual point is stable:
- if the model can predict the noise well
- then it can move the sample slightly back toward the data manifold
- and repeated denoising steps can turn random noise into a realistic sample
So the core learning problem is not “draw a whole image in one shot.” It is “take one noisy step and learn how to clean it up.”
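One common way to make "estimate which part of \(x_t\) is signal" concrete is the DDPM-style closed-form forward process \(x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon\), under which a noise prediction can be inverted into an estimate of \(x_0\). The sketch below assumes that parameterization; other formulations (score, velocity) rearrange the same idea differently.

```python
import numpy as np

def estimate_x0(x_t, eps_hat, alpha_bar_t):
    """Split x_t into estimated signal and noise, given a noise prediction.

    Assumes the closed-form forward process
        x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    so an exact noise prediction recovers x0 exactly.
    """
    return (x_t - np.sqrt(1 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)

# Sanity check with an "oracle" that knows the true noise.
rng = np.random.default_rng(0)
x0 = np.array([2.0, -1.0])
alpha_bar = 0.6  # illustrative noise level
eps = rng.standard_normal(2)
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps
x0_hat = estimate_x0(x_t, eps, alpha_bar)
```

A real sampler would not jump straight to this \(\hat{x}_0\); it uses the estimate to take one small step back toward the data, then repeats at the next noise level.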
6 Implementation or Computation Note
Diffusion models are powerful partly because the training target is local and stable:
- predict noise
- or predict a denoised direction
- at many noise levels
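The locality of the training target can be sketched as a single loss computation: sample a noise level, corrupt the clean data, and penalize the squared error of the noise prediction. This is a simplified version of the DDPM objective; `model` is a placeholder for any denoising network, and the schedule is an illustrative choice.

```python
import numpy as np

def noise_prediction_loss(model, x0_batch, alpha_bars, rng):
    """One step of a simplified noise-prediction training objective.

    Samples a timestep t and Gaussian noise, forms the noisy input
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    and returns the mean squared error of the model's noise prediction.
    """
    t = rng.integers(len(alpha_bars))
    eps = rng.standard_normal(x0_batch.shape)
    abar = alpha_bars[t]
    x_t = np.sqrt(abar) * x0_batch + np.sqrt(1 - abar) * eps
    eps_hat = model(x_t, t)  # placeholder denoising network
    return np.mean((eps_hat - eps) ** 2)

# Toy usage: even a trivial fixed predictor gives a finite, well-defined loss.
rng = np.random.default_rng(0)
alpha_bars = np.cumprod(1 - np.linspace(0.01, 0.2, 50))
loss = noise_prediction_loss(lambda x_t, t: np.zeros_like(x_t),
                             np.ones((8, 4)), alpha_bars, rng)
```

Because the target is just "the noise that was actually added," every training example at every noise level supplies a clean, local regression signal.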
But sampling is iterative, so generation can be slower than one-shot models. This leads to practical design questions:
- how many denoising steps to use
- what noise schedule to choose
- whether to parameterize the model by noise, score, or velocity
- how to trade off sample quality and speed
This is one reason current diffusion research often mixes probability, numerical-method ideas, and architectural engineering.
The next natural page after this one is Score Matching and the SDE View of Diffusion, which explains the score field the model is learning and why reverse-time dynamics can generate samples.
7 Failure Modes
- thinking the model memorizes clean images directly rather than learning denoising transitions
- treating every diffusion implementation as mathematically identical
- ignoring the role of the timestep or noise schedule
- assuming denoising quality at one step automatically means fast high-quality sampling overall
- forgetting that the model produces a distributional generation process, not only a deterministic decoder
8 Paper Bridge
- Denoising Diffusion Probabilistic Models - First pass - foundational DDPM paper for the discrete noising and denoising view. Checked 2026-04-24.
- Score-Based Generative Modeling through Stochastic Differential Equations - Paper bridge - continuous-time view connecting diffusion to score modeling and reverse SDEs. Checked 2026-04-24.
9 Sources and Further Reading
- What are Diffusion Models? - First pass - short official Stanford HAI explanation of the noising-and-denoising story. Checked 2026-04-24.
- CME296: Diffusion and Large Vision Models - First pass - current official Stanford course listing showing how diffusion now sits inside a modern generative-model curriculum. Checked 2026-04-24.
- Denoising Diffusion Probabilistic Models - First pass - foundational primary source for DDPMs. Checked 2026-04-24.
- Score-Based Generative Modeling through Stochastic Differential Equations - Second pass - primary source for the continuous-time score-based view. Checked 2026-04-24.
- Diffusion Models Beat GANs on Image Synthesis - Paper bridge - influential paper showing how diffusion methods became competitive at large-scale image generation. Checked 2026-04-24.