Prompt Engineering for Builders

Prompt Engineering for Builders — explained simply for developers.

Learn this interactively →
Basicsconcept

What exactly is a "prompt" when you're calling an AI model from your backend?

A prompt is just the text input you send to the model: the instructions plus any context plus the actual question or task. Think of it like the request body you POST to a REST endpoint: the model reads your prompt and returns text. There's no hidden state carried between calls (the model is stateless, like a pure function), so everything the model needs to know has to be IN the prompt every time. "Prompt engineering" is the craft of writing that text so you reliably get output your code can use. It's your main lever: you shape behavior by what you write, not by retraining anything.
#prompt#basics#input#stateless
Basicsconcept

Why is prompt engineering even necessary? Can't I just ask the model in plain English like I'd ask a coworker?

You can, and it often works. But a model doesn't share your unspoken context the way a coworker does: it can't see your database, your app's goals, or what "done" means to you. Vague asks give vague, inconsistent answers, which is painful when your code has to parse the result. Prompt engineering is mostly about being explicit: state the task, give the context, show the format you want. It's the same discipline as writing a clear API contract instead of hoping the caller guesses. A little structure up front saves you from flaky output and surprise bugs downstream.
#prompt#basics#specificity#reliability
Basicsconcept

What's the difference between a "system prompt" and a "user prompt"?

Most chat APIs split your input into roles. The system prompt sets the model's standing behavior: who it is, its rules, the format it should follow, its tone. The user prompt (role user) is the actual question or task for this turn. Analogy: the system prompt is like environment config or middleware you set once for the whole app; the user prompt is the per-request payload. In code it looks like { system: "You are a support assistant. Answer only from the docs provided.", messages: [{ role: "user", content: "How do I reset my password?" }] }. The system prompt carries more authority, so put your durable rules there.
#system-prompt#user-prompt#roles#basics
Basicsconcept

What's the difference between "zero-shot" and "few-shot" prompting?

"Shot" just means an example you show the model. Zero-shot means you give instructions and no examples, and the model figures it out ("Classify this review as positive or negative"). Few-shot means you include a handful of worked examples first, then the real input. It's like the difference between describing a function's behavior versus also giving a couple of input-to-output test cases. Few-shot shines when the task is fuzzy, the format is picky, or you keep getting near-misses. Start zero-shot because it's cheaper and simpler; add examples when the output isn't reliable enough. Two or three good examples usually do a lot.
#zero-shot#few-shot#examples#basics
Basicsconcept

Why does giving the model a role or persona ("You are a senior tax accountant") improve answers?

Setting a role primes the model toward the right kind of knowledge, vocabulary, and tone. "You are a senior tax accountant; explain this deduction to a small-business owner" tends to produce more accurate, appropriately-pitched answers than a bare "explain this deduction," because the role nudges the model's whole framing. It also lets you control voice and audience in one line. Put the role in the system prompt since it's standing behavior. Don't over-rely on it though: a persona shapes style and framing, but it doesn't give the model facts it doesn't have or make it infallible. Pair the role with real context and clear instructions; the role sets the stage, the rest does the work.
#persona#role#system-prompt#basics
Basicsconcept

What does it mean to ask an AI model for "structured output" like JSON, and why would a builder want it?

By default a model replies with prose: "Sure! I'd recommend the 400 Package because…". That's lovely for a human but useless to your code — you can't if-branch on a paragraph. Structured output means you tell the model to answer as JSON instead, e.g. {"pick":"400","price":3899}. Now your backend can JSON.parse() it and treat fields like any API response. Think of it as turning the model from a chatty support agent into a REST endpoint that returns a predictable JSON body your program can actually consume.
#json#structured-output#parsing#basics
Basicshow-to

How do you actually ask a model to return JSON — what goes in the prompt?

You tell it plainly and show the exact shape. In your system prompt: "Respond with ONLY valid JSON, no prose, in this exact shape: {\"pick\": string, \"price\": number, \"reason\": string}." Giving a concrete example beats describing it — the model copies the pattern. Two extras that help a lot: name every field you want (it won't invent good keys reliably), and say "no markdown, no ```json fences" so you don't get backticks wrapped around it. Many 2026 APIs also have a built-in JSON/structured-output mode or a schema parameter — prefer that when available; it's more reliable than asking nicely.
#json#prompting#how-to#schema
Core ideahow-to

What should I put in the system prompt versus the user prompt?

Put durable, request-independent stuff in the system prompt: the model's role, hard rules ("never reveal internal IDs"), the output format, tone, and any always-true context about your app. Put the variable, per-request stuff in the user message: the customer's actual question, the specific record you're asking about, this turn's data. Rule of thumb: if it's the same on every call, it's system; if it changes per request, it's user. Bonus: keeping the system prompt stable also helps with caching later (an unchanging prefix can be reused). Don't jam the user's raw question into the system prompt; that mixes config with payload.
#system-prompt#user-prompt#structure#design
Core ideahow-to

How do I make a prompt "specific"? What does giving good context actually look like?

Specific means you remove the guesswork. Instead of "summarize this," say "summarize this support ticket in 2 sentences for a manager, focusing on the customer's blocker." Spell out the role/audience, the exact task, length, format, and what to include or ignore. Then paste in the actual data the model needs: it can't see your DB, so if the answer depends on a record, include that record in the prompt. Compare "Is this refund allowed?" (model has no policy) versus "Given this refund policy: [...] and this order: [...], is a refund allowed? Answer yes or no with one reason." The second one your code can trust.
#specificity#context#instructions#best-practices
Core ideacode

How do I write a few-shot prompt? What does it look like in practice?

Show 2 to 4 examples in exactly the input-to-output shape you want the real answer in, then add the real input with the output left blank. Keep the examples consistent: same format, same style, and ideally cover the tricky cases (an edge case, a negative). For a classifier: `` Review: "Loved it, fast shipping!" -> positive Review: "Broke in a day." -> negative Review: "It's fine, nothing special." -> neutral Review: "Worst purchase ever." -> `` The model continues the pattern. The examples ARE your spec: if they're sloppy or inconsistent, the output will be too. Match the format precisely (if you want JSON, make the example outputs JSON).
#few-shot#examples#code#classification
Core ideaconcept

What is "chain-of-thought" prompting, and why does telling the model to "think step by step" help?

Chain-of-thought means asking the model to work through its reasoning before giving the final answer, instead of blurting the answer immediately. A model generates text one piece at a time, so when it writes out the steps, each step becomes context that improves the next, much like solving a tricky problem on scratch paper instead of in your head. It noticeably improves accuracy on multi-step tasks: math, logic, planning, "which option fits these rules." The cost is that it produces more text (more tokens, so more latency and money). Use it where correctness matters; skip it for simple lookups or classification where the answer is obvious.
#chain-of-thought#reasoning#accuracy#basics
Core ideahow-to

If I want chain-of-thought reasoning but my code only needs the final answer, how do I keep the reasoning out of my parsed output?

Don't try to strip prose with regex after the fact; that's fragile. Instead, give the reasoning a dedicated, named home so it's separate from the answer. A clean trick: ask for JSON with a reasoning field first and an answer field after, e.g. {"reasoning": "...", "answer": "approved"}. Your code reads .answer and ignores .reasoning. Ordering matters: put reasoning BEFORE answer so the model actually thinks before committing. (Note: some newer models have a built-in "thinking" mode that keeps reasoning in a separate channel from the visible answer, so check your provider's docs, but the JSON-field approach works everywhere.)
#chain-of-thought#json#structured-output#parsing
Core ideaconcept

Why should I ask the model to return JSON, and how does that help my code?

Because your code needs structured data, not a paragraph. If you ask "what's the sentiment and a confidence score?" and get back a friendly sentence, you're stuck writing brittle string parsing. Ask for {"sentiment": "positive", "confidence": 0.9} and you JSON.parse() it and move on, same as consuming any JSON API. In the prompt, state the exact shape and field names, ideally with an example. Many providers go further with a "structured output" / JSON-schema mode that forces the response to match your schema, worth using when available. Either way, always wrap the parse in a try/catch and validate, because the model can occasionally drift.
#json#structured-output#parsing#integration
Core ideagotcha

I asked for JSON but the model wrapped it in code fences or added "Here's your JSON:". How do I make it return clean, parseable JSON?

This is the classic gotcha: the model adds friendly chatter and your JSON.parse() throws. Three fixes, in order of strength: (1) Use your provider's structured-output / JSON mode if it has one; it forces the response to valid JSON against a schema, so you never get fences or preamble. (2) If not, instruct firmly: "Respond with ONLY the JSON object, no markdown, no code fences, no explanation." (3) As a safety net, defensively strip code fences and grab the substring from the first { to the last } before parsing, inside a try/catch. Don't rely on prompting alone for anything load-bearing: validate the parsed object against the fields you expect.
#json#gotcha#parsing#validation
Core ideaconcept

What are "delimiters" in a prompt and why do experienced builders use them?

Delimiters are clear markers that fence off one section of the prompt from another: triple backticks, XML-style tags like <document>...</document>, or headers like ### INSTRUCTIONS. They do two jobs. First, clarity: the model can tell your instructions apart from the data you pasted in. Second, and more importantly, safety: if you wrap untrusted user input in tags, the model is far less likely to obey malicious text hidden inside it ("ignore your instructions and..."). That attack is called prompt injection, and delimiters are a basic defense. Example: Summarize the text between the tags. <text>{userInput}</text>. It's like parameterizing a SQL query instead of string-concatenating: you separate code from data.
#delimiters#formatting#prompt-injection#safety
Core ideaconcept

What is "temperature," and why do builders set it to 0 for repeatable output?

Temperature is a knob (roughly 0 to 1) that controls randomness in the model's word choices. Low or 0 means the model picks the most likely next word almost every time, so output is focused and as repeatable as it gets. Higher means more variety and creativity, but also more drift. For anything your code depends on (classification, data extraction, JSON output, routing decisions) set temperature to 0 so the same input tends to give the same output and your tests stay stable. Crank it up for creative tasks like brainstorming names or marketing copy. One honest caveat: even at 0 the output isn't 100% guaranteed identical, so still validate, but it's your best lever for consistency.
#temperature#determinism#parameters#reliability
Core ideahow-to

How should I treat prompts in my codebase? Do I really test and iterate on them like code?

Yes. Treat a prompt like a function you're tuning, not a one-and-done string. The loop: write a first version, run it against a set of real example inputs, eyeball the outputs, tweak the wording, re-run. Keep a small "eval set" (a handful of inputs with the answers you expect) so when you change the prompt you can check you didn't break the cases that used to work; basically regression tests for your prompt. Version your prompts in git, change one thing at a time, and note why. Prompts are surprisingly sensitive: a reworded sentence can shift results, so measure instead of guessing.
#testing#iteration#evals#workflow
Core ideagotcha

What are common prompt mistakes that waste tokens or cause bad output?

A few that bite beginners: (1) Vagueness; "make it better" gives random results, so say what "better" means. (2) Burying the actual task under a wall of context so the model loses the point; put the key instruction clearly, often near the end. (3) Asking for several unrelated things in one prompt; split them. (4) Negative-only instructions ("don't be formal") work worse than positive ones ("write casually, like a text to a friend"). (5) Stuffing huge irrelevant context; you pay for every token (input is billed too) and dilute focus. (6) Contradicting yourself across system and user messages. Fixes are mostly about clarity, focus, and saying what you DO want.
#mistakes#tokens#best-practices#gotcha
Core ideacode

What is a "prompt template" with variables, and how do I build one in my app?

A prompt template is a reusable prompt string with placeholders you fill in per request, exactly like an HTML template or a parameterized query. You write the fixed instructions once and inject the variable parts (the user's question, a fetched record) at runtime. Example: ``js const prompt = Summarize this ticket for a ${audience}.\n<ticket>${ticketText}</ticket>; `` This keeps prompts consistent across calls, easy to version, and easy to test. Two cautions: wrap injected user/data values in delimiters (so they can't pose as instructions, the prompt-injection risk), and never blindly trust what comes back. Don't hand-concatenate dozens of prompts in scattered places; centralize templates like you would any config.
#templates#variables#code#reuse
Core ideahow-to

How do I stop the model from confidently making things up when it doesn't actually know the answer?

By default a model would rather give a plausible-sounding answer than admit ignorance; that's the "hallucination" problem (confident but wrong output). Two levers. First, ground it: provide the real source material in the prompt and instruct "answer ONLY using the text below; if it's not there, say 'I don't know.'" Second, give it explicit permission to bail: "If you're not sure, say so rather than guessing." Spelling out the not-sure path matters: models are reluctant to say "I don't know" unless told it's acceptable. For anything important, still verify the answer in code (check claimed facts against your data) rather than trusting it blind.
#hallucination#uncertainty#grounding#reliability
Core ideaconcept

What is a "token," and why should I care about token count as a builder?

A token is a chunk of text the model reads and writes, roughly three-quarters of a word in English (so ~750 words is about 1,000 tokens). It matters for three concrete reasons. Cost: you're billed per token for BOTH what you send (input) and what you get back (output), and as of 2026 output is typically several times pricier than input. For example a mid-tier model (Claude Sonnet or a GPT-5-class model) runs around a few dollars per million input tokens and roughly $15 per million output. Limits: each model has a max "context window" (how many tokens fit in one call); overflowing it errors or drops content. Latency: more tokens means slower. So a bloated prompt costs money, time, and risks hitting the ceiling; keep prompts tight.
#tokens#cost#context-window#limits
Core ideahow-to

When I give the model both instructions and a big block of data, where should the instructions go?

Lead with the instructions, fence the data in delimiters, and it's often worth briefly restating the key ask AFTER the data too. With a large blob in the middle, a model can lose track of what you wanted, so bookending helps it stay on task. A reliable shape: "You will extract the order number. <data>{bigBlob}</data> Now return ONLY the order number as JSON." The instruction is clear at the top, the data is clearly separated, and the final line re-anchors the exact task and format right before the model answers. This also keeps untrusted data from being mistaken for instructions.
#formatting#delimiters#instruction-placement#best-practices
Core ideagotcha

If I asked nicely for JSON, can I just `JSON.parse()` the response and move on?

No — that's the trap. The model usually obeys, but "usually" isn't "always." It can wrap output in ``json fences, add a chatty "Here you go:" line, truncate if it hits the token limit, or emit a trailing comma — any of which makes JSON.parse() throw and crash your handler. Parse defensively: wrap it in try/catch`, strip code fences first, and on failure don't 500 — fall back (retry once, or return a safe default). Treat model output like untrusted user input from a form, never like a guaranteed-valid internal API response. Then validate the parsed object's fields before using them.
#json#parsing#defensive#gotcha#validation
Hands-ongotcha

My prompt works great in the playground but gives worse or inconsistent results in production. What's likely going wrong?

Usually one of these: (1) Real inputs are messier than your test ones: weird formatting, edge cases, empty fields, or longer text that crowds out your instructions. (2) You're at a non-zero temperature, so outputs vary run to run; drop it to 0 for data tasks. (3) The user-supplied data is overriding your instructions (prompt injection, or just confusing the model); fence it in delimiters. (4) You changed the model or version and behavior shifted. (5) You're parsing the response too strictly and occasional drift breaks it. Fix it the way you'd debug any flaky service: build an eval set from REAL inputs, reproduce the failure, then harden the prompt and add validation in code around the output.
#debugging#production#consistency#gotcha
Hands-ondecision

Is more detail in a prompt always better? Should I keep adding instructions until it's perfect?

No. Clarity beats volume. Past a point, piling on instructions causes problems: the model may fixate on minor rules, contradictory directions creep in, the core task gets buried, and you burn tokens (money plus latency) on every call. A bloated prompt is also harder to maintain and debug; when something breaks, you can't tell which of 30 rules caused it. Aim for the shortest prompt that reliably produces the output you need: clear task, just-enough context, the format, and one or two examples if required. Add a rule only when you've seen a failure it actually fixes, and remove rules that don't pull their weight. Treat prompt length like dependency bloat: lean is better.
#conciseness#tokens#maintainability#decision