How Do Diffusion Models Work? Explained Simply

Learn to reverse the mess

Diffusion has a clever training trick. Take a real photo and add a little random static, then a little more, then more — after enough steps it's pure noise, like an untuned TV. At each step the model is shown "here's the noisier version; what noise did I just add?" By learning to predict the noise, it learns how to remove it — how to walk the process backwards.
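
Here is a minimal sketch of how one training example could be built, in plain Python. The function names, the five-number stand-in "photo", and the linear schedule values are assumptions for illustration, not something the text above specifies. The sketch also uses a common shortcut: instead of adding static one step at a time, it jumps straight to a chosen step with a single blend, which gives the same kind of noisy image when the static is Gaussian.

```python
import math
import random


def make_noise_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return alpha_bars, where alpha_bars[t] says how much of the original
    image survives at step t (just under 1 at the start, close to 0 at the end).
    The linear schedule and its default values are assumptions."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta  # each step keeps slightly less of the image
        alpha_bars.append(running)
    return alpha_bars


def make_training_example(image, alpha_bars, rng):
    """Pick a random step, noise the image to that step, and return the
    added noise as the answer the model should learn to predict."""
    t = rng.randrange(len(alpha_bars))
    noise = [rng.gauss(0.0, 1.0) for _ in image]
    keep = math.sqrt(alpha_bars[t])       # weight on the real image
    add = math.sqrt(1.0 - alpha_bars[t])  # weight on the static
    noisy = [keep * x + add * n for x, n in zip(image, noise)]
    return noisy, t, noise


rng = random.Random(0)
alpha_bars = make_noise_schedule()
image = [0.0, 0.25, 0.5, 0.75, 1.0]  # stand-in "photo": five pixel values
noisy, t, target_noise = make_training_example(image, alpha_bars, rng)
```

In training, a network (not shown here) would be given `noisy` and `t` and scored on how close its guess is to `target_noise`.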

The two directions

Forward (training only): start from a real image, add noise step by step until it's pure static.
Reverse (generation): start from pure static, remove a little noise each step, and a new image appears.
The goal is not to memorise photos: the model learns the general skill of turning noise into something realistic.
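
The reverse direction can be sketched as a loop, again in plain Python. Everything here is a toy under stated assumptions: `toy_predict_noise` is a hypothetical stand-in for a trained network, worked out by hand for a "training set" of just two 4-pixel images, and the update rule is one common sampler (DDPM-style) that removes part of the predicted noise and then puts a little fresh noise back on every step except the last. Because this stand-in knows only two images, it can only ever land on one of them; a real network trained on many photos is an approximation and produces new images.

```python
import math
import random

# Assumed toy "training set": two 4-pixel images with values -1 or +1.
TOY_IMAGES = [[1.0, 1.0, -1.0, -1.0], [-1.0, 1.0, -1.0, 1.0]]


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise amounts (an assumed linear schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * t for t in range(num_steps)]


def toy_predict_noise(x, alpha_bar):
    """Hypothetical stand-in for the trained network: guess the clean image
    as a weighted mix of the toy images, then report the implied noise."""
    keep = math.sqrt(alpha_bar)
    var = 1.0 - alpha_bar
    logits = [
        -sum((xi - keep * pi) ** 2 for xi, pi in zip(x, img)) / (2.0 * var)
        for img in TOY_IMAGES
    ]
    top = max(logits)  # subtract the max so exp() cannot underflow to all zeros
    weights = [math.exp(value - top) for value in logits]
    total = sum(weights)
    clean_guess = [
        sum(w * img[i] for w, img in zip(weights, TOY_IMAGES)) / total
        for i in range(len(x))
    ]
    return [(xi - keep * ci) / math.sqrt(var) for xi, ci in zip(x, clean_guess)]


def generate(rng, betas):
    """Start from pure static and denoise one step at a time."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    x = [rng.gauss(0.0, 1.0) for _ in TOY_IMAGES[0]]  # pure static
    for t in reversed(range(len(betas))):
        predicted = toy_predict_noise(x, alpha_bars[t])
        scale = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # remove a small part of the predicted noise
        x = [(xi - scale * ni) / math.sqrt(1.0 - betas[t]) for xi, ni in zip(x, predicted)]
        if t > 0:  # every step except the last adds a little fresh noise
            x = [xi + math.sqrt(betas[t]) * rng.gauss(0.0, 1.0) for xi in x]
    return x


betas = linear_betas()
samples = [generate(random.Random(seed), betas) for seed in range(4)]
```

Each entry in `samples` starts from different static (a different seed), and which of the two toy images it settles on depends on that starting static.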

A sculptor and a block of marble

A sculptor starts with a rough block and chips away, a little at a time, until a statue emerges. Diffusion starts with a block of noise and "chips away" the randomness step by step until an image emerges. It never carves the same statue twice — start from different noise and you get a different picture.

One key detail

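The model doesn't predict the finished image in one shot. It predicts the small bit of noise to remove right now, and repeats. Each step only has to make a small correction, and later steps can refine what earlier ones got roughly right, which is a large part of why diffusion images look so good. Next: watch that denoising happen, step by step.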

Forward adds noise (training); reverse removes it, step by step (generation).
