Learn to reverse the mess
Diffusion models rely on a clever training trick. Take a real photo and add a little random static, then a little more, then more; after enough steps it's pure noise, like an untuned TV. At each step the model is shown the noisier version and asked: "what noise was just added?" By learning to predict the noise, it learns how to remove it, which is exactly what it needs to walk the process backwards.
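The forward half of that trick can be sketched in a few lines of Python. This is a minimal toy, assuming an "image" is just a flat list of pixel values and using a DDPM-style update with a constant `beta` (the names `forward_noising`, `beta` and `num_steps` are illustrative, not from any library). It follows the step-by-step story above; practical training code usually jumps straight to a randomly chosen step with a closed-form formula instead of looping.

```python
import math
import random


def forward_noising(image, num_steps=1000, beta=0.02, seed=0):
    """Toy forward (training-only) process on a flat list of pixel values.

    At every step the current image is shrunk slightly and mixed with fresh
    Gaussian static. Each (noisier image, noise just added) pair is one
    training example: the model sees the first item and must predict the second.
    """
    rng = random.Random(seed)
    x = list(image)
    training_pairs = []
    keep = math.sqrt(1.0 - beta)  # how much of the current image survives a step
    mix = math.sqrt(beta)         # how much fresh static is mixed in
    for _ in range(num_steps):
        noise = [rng.gauss(0.0, 1.0) for _ in x]
        noisier = [keep * xi + mix * ni for xi, ni in zip(x, noise)]
        training_pairs.append((noisier, noise))
        x = noisier
    # After enough steps, almost nothing of the original image is left in x.
    return x, training_pairs


# Example: a tiny all-white "photo" of 16 pixels ends up as static.
static, pairs = forward_noising([1.0] * 16, num_steps=1000)
```

Because each step keeps only `sqrt(1 - beta)` of the previous image, the original signal fades geometrically, while the mixed-in static keeps the overall spread of values near that of standard Gaussian noise.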
The two directions
- Forward (training only): start from a real image, add noise step by step until it's pure static.
- Reverse (generation): start from pure static, remove a little noise each step, and a new image appears.
- The goal isn't to memorise photos but to learn the general skill of turning noise into something realistic.
A sculptor and a block of marble
A sculptor starts with a rough block and chips away, a little at a time, until a statue emerges. Diffusion starts with a block of noise and "chips away" the randomness step by step until an image emerges. Each new block yields a new statue: start from different noise and you get a different picture.
One key detail
The model doesn't predict the finished image in one shot; it predicts the small bit of noise to remove right now, and repeats. This patient, many-small-steps approach is a big part of why diffusion images look so good. Next: watch that denoising happen, step by step.
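The shape of that predict-a-little, remove-a-little loop can be sketched as follows. This is a minimal toy under loud assumptions: `toy_predict_noise` is a hypothetical stand-in for a trained network that pretends "realistic" pixels are exactly 0 or 1, and the update simply subtracts a fraction (`step_size`) of the predicted noise each step. Real samplers such as DDPM use a more careful update formula, so read this as an illustration of the loop, not of any particular method.

```python
import random


def toy_predict_noise(x):
    """Hypothetical stand-in for a trained network.

    It treats pixels that are exactly 0 or 1 as "realistic", so the noise it
    reports for each pixel is that pixel's distance from the nearer of the two.
    """
    return [xi - (1.0 if xi >= 0.5 else 0.0) for xi in x]


def generate(num_pixels, num_steps=50, step_size=0.2, seed=0,
             predict_noise=toy_predict_noise):
    """Start from pure static and remove a little predicted noise per step."""
    rng = random.Random(seed)
    # Pure static: one independent Gaussian sample per pixel.
    x = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]
    for _ in range(num_steps):
        noise_estimate = predict_noise(x)
        # Remove only a small fraction of the predicted noise, then repeat.
        x = [xi - step_size * ni for xi, ni in zip(x, noise_estimate)]
    return x


picture = [round(v) for v in generate(8, seed=1)]
other_picture = [round(v) for v in generate(8, seed=2)]
```

Each step shrinks every pixel's remaining "noise" by 20%, so after 50 steps the values sit essentially on 0 or 1. Changing `seed` changes the starting static, and the loop then typically settles on a different pattern, echoing the sculptor analogy above.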