It's autocomplete, scaled up enormously
You already know a tiny language model: the autocomplete on your phone. Type "see you" and it suggests "later." An LLM is that same idea trained on a very large slice of the internet, books, and code — so instead of finishing a text message, it can finish an essay, write code, or answer a question. Under the hood it is always doing the same move: given the words so far, what word most likely comes next?
The developer's version
Think of an LLM as a function: text in, text out. You don't call methods on it; you describe what you want in plain language (the "prompt") and it returns a best-guess continuation. There is no database query, no lookup — the answer is generated one token at a time from patterns it learned during training.
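To make "text in, text out, one token at a time" concrete, here is a minimal sketch of a toy model. It is not how a real LLM is built: it only counts which word followed which in a tiny made-up training text, and the names `CORPUS`, `train`, and `complete` are invented for this illustration. The shape of the loop is the part that carries over.

```python
from collections import Counter, defaultdict

# Tiny made-up training text (an assumption, for illustration only).
CORPUS = "see you later . see you later . see you soon . talk to you later ."


def train(text: str) -> dict[str, Counter]:
    """Count, for every word, which words followed it in the training text."""
    follows: dict[str, Counter] = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


MODEL = train(CORPUS)


def complete(prompt: str, max_tokens: int = 3) -> str:
    """Text in, text out: extend the prompt one token at a time."""
    tokens = prompt.split()
    if not tokens:
        return prompt
    for _ in range(max_tokens):
        candidates = MODEL.get(tokens[-1])
        if not candidates:
            # The last word never appeared in training, so there is
            # no learned pattern to continue from.
            break
        next_token = candidates.most_common(1)[0][0]  # most frequent follower
        if next_token == ".":
            break  # treat the period as "end of response"
        tokens.append(next_token)
    return " ".join(tokens)


print(complete("see you"))  # prints "see you later"
```

Nothing is looked up at answer time: `complete` only replays patterns counted during `train`. A real LLM replaces the counting table with a neural network and looks at the whole prompt rather than just the last word, but it is called the same way.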
Three things to remember
- An LLM predicts the next token (a token is roughly a word or word-piece), over and over, to build a response.
- Its "knowledge" is baked in during training — like a snapshot taken on a certain date. It does not automatically know today's news or your private documents.
- It sounds confident whether or not it is right. It always produces a fluent answer, even when that answer is wrong, which is exactly the problem the next lesson is about.
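One way to see why a wrong answer can look just as fluent as a right one: at each step the model turns its raw scores for candidate tokens into probabilities that sum to 1 (the softmax function), so some token always comes out on top. The sketch below uses hypothetical scores and a simplified "take the top token" rule. Real systems score tens of thousands of tokens and often sample instead of always taking the top one.

```python
import math


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def pick(scores: dict[str, float]) -> tuple[str, float]:
    """Return the most likely token and its probability."""
    probs = softmax(scores)
    token = max(probs, key=probs.get)
    return token, probs[token]


# Hypothetical scores, invented for this example.
well_known = {"Paris": 9.0, "Lyon": 2.0, "Rome": 1.0}  # one clear winner
guessing = {"Paris": 1.1, "Lyon": 1.0, "Rome": 1.0}    # almost a coin toss

for name, scores in [("well_known", well_known), ("guessing", guessing)]:
    token, prob = pick(scores)
    print(f"{name}: answers {token!r} with probability {prob:.2f}")
```

Both cases answer "Paris". In the first the probability is close to 1, in the second it is barely above one in three, but the text the user sees is identical. The doubt lives in the numbers, not in the wording.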
Why this matters for RAG
RAG (retrieval-augmented generation) exists to fix the last two points: knowledge that is frozen at training time and blind to your private data, and confident guessing. To understand RAG, you first need to feel why an LLM alone is not enough. That's next.