What is an LLM? A Plain-English Explanation

It's autocomplete, scaled up enormously

You already know a tiny language model: the autocomplete on your phone. Type "see you" and it suggests "later." An LLM (large language model) is that same idea trained on a very large slice of the internet, books, and code, so instead of finishing a text message it can finish an essay, write code, or answer a question. Under the hood it is always doing the same move: given the text so far, what most likely comes next?
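
To make the autocomplete idea concrete, here is a minimal sketch of a phone-style suggester that only counts which word tends to follow which. The corpus and the names train and suggest are made up for illustration, and a real LLM uses a neural network over far more text rather than a lookup of counts, but the question it answers is the same.

```python
from collections import Counter, defaultdict

# Tiny made-up "training data"; a real model sees vastly more text.
CORPUS = "see you later . see you later . see you soon . thank you very much ."


def train(text):
    """Count, for each word, which words follow it in the text."""
    followers = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def suggest(followers, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in followers:
        return None
    return followers[word].most_common(1)[0][0]


model = train(CORPUS)
print(suggest(model, "you"))    # later
print(suggest(model, "pizza"))  # None (never seen in the training text)
```

In this toy corpus "later" follows "you" more often than anything else, so it wins. A word the model never saw gets no suggestion at all, a small preview of the limits of knowledge learned only from training text.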

The developer's version

Think of an LLM as a function: text in, text out. You don't call methods on it; you describe what you want in plain language (the "prompt") and it returns a best-guess continuation. There is no database query, no lookup — the answer is generated one token at a time from patterns it learned during training.

Three things to remember

An LLM predicts the next token (a token is roughly a word or word-piece), over and over, to build a response.
Its "knowledge" is baked in during training — like a snapshot taken on a certain date. It does not automatically know today's news or your private documents.
It sounds confident whether or not it is right. It is trained to produce likely-sounding text, so it will give a fluent answer even when it is wrong, which is exactly the problem the next lesson is about.
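
The three points above can be sketched as a small generation loop. Everything here is an assumption made for illustration: the probability table is hand-written, the function name generate is hypothetical, and this toy looks only at the last token, whereas a real LLM conditions on the whole prompt using a neural network. What carries over is the shape of the loop: pick a likely next token, append it, repeat.

```python
# Hypothetical "learned patterns", frozen once training ends: for each token,
# the likely next tokens and their probabilities.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 1.0},
    "the": {"capital": 0.9, "city": 0.1},
    "capital": {"of": 1.0},
    "of": {"france": 0.6, "australia": 0.4},
    "france": {"is": 1.0},
    "australia": {"is": 1.0},
    "is": {"paris": 0.7, "sydney": 0.3},
    "paris": {"<end>": 1.0},
    "sydney": {"<end>": 1.0},
}


def generate(prompt_tokens, max_new_tokens=10):
    """Extend the prompt one token at a time, always taking the likeliest."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        last = tokens[-1] if tokens else "<start>"
        choices = NEXT_TOKEN_PROBS.get(last)
        if not choices:
            break  # this toy table has no pattern for the token
        next_token = max(choices, key=choices.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens


print(" ".join(generate(["the"])))
# the capital of france is paris
print(" ".join(generate(["the", "capital", "of", "australia", "is"])))
# the capital of australia is paris
```

The second output is fluent and wrong. Nothing in the loop checks facts; it only follows the most likely pattern in the table, and the table cannot change after "training" unless someone rebuilds it.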

Why this matters for RAG

RAG (retrieval-augmented generation) exists to address the last two points: knowledge that is frozen at training time and blind to your private documents, and confident guessing. To understand RAG, you first need to feel why an LLM alone is not enough. That's next.

An LLM completes text by repeatedly predicting the most likely next token.