Two networks, one contest
Picture a forger trying to paint fake money and a detective trying to spot it. Every round, the forger makes a batch of fakes; the detective judges each one real or fake and gets told the truth. The forger uses that feedback to fool the detective next time; the detective uses it to catch sharper fakes. Round after round, both improve — and the forger ends up making fakes good enough to pass.
One training round
- The generator turns random noise into a fake image.
- The discriminator sees a mix of real and fake images and guesses which is which.
- Both learn from the score: the generator to fool better, the discriminator to catch better.
Fast, sharp — but temperamental
GANs were the state of the art for years and produce crisp images fast (one pass, no slow steps). But the two-network fight is delicate: if the generator finds one type of fake that always fools the detective, it makes only that — a failure called mode collapse (imagine the forger only ever painting the same one banknote). Getting the balance right is finicky, which is one reason diffusion later took over.
Try it
Below, step the forger-vs-detective game and watch both scores climb as they learn from each other.