Classic ML in Plain English (the bits you should still know)

Explained simply for developers.

Basicsconcept

Everyone's talking about LLMs and ChatGPT. So why does 'classic ML' still exist — isn't a big language model just better at everything?

Think of it like choosing a database. You wouldn't reach for a graph database to store a simple users table. 'Classic ML' (machine learning) means smaller models trained to do one narrow job (flag spam, predict a price, score fraud), usually from your own table of past examples. They're cheap (fractions of a cent per prediction), fast (milliseconds, no network call to a giant model), private (they run on your own server), and predictable. An LLM (large language model, the tech behind ChatGPT) is a giant general text engine: amazing for open-ended language, but overkill and pricey for 'is this row fraud, yes or no'. Different tools, different jobs.
#classic-ml#llm-vs-ml#fundamentals
Basicsconcept

I keep seeing 'classification', 'regression', and 'clustering'. In plain English, what's the difference?

They're the three everyday shapes of ML problems. Classification = sorting into buckets: 'spam or not spam', 'fraud or legit'. The output is a label. Regression = predicting a number: 'what price will this house sell for', 'how many orders tomorrow'. The output is a quantity. Clustering = 'find natural groups in my data without me telling you the groups', like auto-grouping customers into segments you didn't define. The first two are 'supervised' (you train on past examples that already have the right answer); clustering is 'unsupervised' (no answers given, it finds structure on its own). Most business ML you'll build is classification or regression.
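To see the three shapes side by side, here is a minimal sketch, assuming scikit-learn is installed. The tiny datasets and what the columns mean are made up for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: inputs -> a label (supervised, labels provided).
link_counts = [[0], [1], [8], [9]]            # one feature: links per email
email_labels = ["legit", "legit", "spam", "spam"]
classifier = LogisticRegression().fit(link_counts, email_labels)
predicted_label = classifier.predict([[7]])[0]

# Regression: inputs -> a number (supervised, numeric answers provided).
units_ordered = [[1], [2], [3]]
invoice_totals = [2.0, 4.0, 6.0]
regressor = LinearRegression().fit(units_ordered, invoice_totals)
predicted_total = regressor.predict([[5]])[0]

# Clustering: inputs only, no labels; the algorithm proposes the groups.
customers = [[1, 1], [1, 2], [2, 1], [10, 10], [10, 11], [11, 10]]
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(customers)

print(predicted_label)   # a label
print(predicted_total)   # a number
print(segments)          # a group id per row
```

Note that the cluster ids are arbitrary: the algorithm says which rows belong together, not what the groups mean.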
#classification#regression#clustering
Basicsconcept

What is 'training data', and what are 'labels'? I'm a web dev, give me something concrete.

Training data is just a big table of past examples — like a SQL export of rows you already have. Each row's input columns are what the model gets to see (sender, subject, link count for an email). The 'label' is the known correct answer for that row — the column you want to predict later ('spam' or 'not spam', tagged by humans or by history). Training means: show the model thousands of rows WITH their labels so it learns the pattern connecting inputs to the answer. Later you feed a new, unlabeled row and it predicts the label. No labels, no supervised learning — and gathering those labels is often the real work.
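As a concrete picture, here is a sketch of a labeled table in plain Python. The column names, addresses, and values are invented for illustration; in practice the rows would come from your own database export.

```python
# Each dict is one past example; "label" is the known correct answer.
rows = [
    {"sender": "promo@deals.example", "subject": "WIN NOW", "link_count": 9, "label": "spam"},
    {"sender": "alice@corp.example", "subject": "Meeting notes", "link_count": 1, "label": "not_spam"},
    {"sender": "bob@corp.example", "subject": "Lunch?", "link_count": 0, "label": "not_spam"},
]

# Inputs (what the model sees) and labels (what it should learn to predict).
inputs = [{k: v for k, v in row.items() if k != "label"} for row in rows]
labels = [row["label"] for row in rows]

# A new row at prediction time has inputs but no label yet.
new_row = {"sender": "x@unknown.example", "subject": "FREE GIFT", "link_count": 7}

print(inputs[0])
print(labels)
```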
#training-data#labels#supervised-learning
Basicsconcept

People say 'features' a lot. What is a feature?

A feature is just one input column the model uses to make its decision, nothing fancier than a field on a record. For predicting house price, features might be square footage, number of bedrooms, zip code, year built. For spam, features might be sender domain, number of links, ALL-CAPS ratio. The model learns how each feature nudges the answer. 'Feature engineering' is the craft of turning raw data into useful columns, e.g. deriving 'account_age_in_days' from a signup timestamp, because a raw timestamp is just a big number that says little until you turn it into something meaningful like an age. Good features usually matter more than a fancy model; garbage columns in, garbage predictions out.
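A small sketch of feature engineering using only the standard library. The function names and the example dates are assumptions for illustration, not a fixed recipe.

```python
from datetime import datetime, timezone

def account_age_in_days(signup: datetime, as_of: datetime) -> int:
    """Whole days between signup and the moment of prediction."""
    return (as_of - signup).days

def caps_ratio(text: str) -> float:
    """Share of letters that are uppercase; 0.0 when there are no letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch.isupper() for ch in letters) / len(letters)

signup = datetime(2024, 1, 1, tzinfo=timezone.utc)
as_of = datetime(2024, 3, 1, tzinfo=timezone.utc)

# Raw fields become columns a model can actually use.
features = {
    "account_age_in_days": account_age_in_days(signup, as_of),
    "caps_ratio": caps_ratio("FREE money NOW"),
}
print(features)
```

Passing the reference time in explicitly (rather than calling "now" inside the function) keeps the feature reproducible between training and serving.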
#features#feature-engineering#inputs
Basicsconcept

What does it actually mean to 'train a model'? Is it like writing code?

Not quite. You don't write the rules; the training process discovers them for you. You pick a model type and feed it the labeled training table. For many model types, training repeatedly guesses, checks how wrong it was against the known answers, and nudges its internal numbers to be less wrong next time, over and over until the guesses stop improving. (Other types, such as decision trees, build their rules in a different way, but the idea of fitting to known answers is the same.) The output is a saved file (the 'trained model', often just a blob you load like a config). Compare it to caching: you do the expensive work once (training), then serving a prediction is fast and cheap. Your code's job is mostly to prep the data, call train, and later call predict.
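The guess, check, nudge loop can be sketched in a few lines of plain Python. This is a toy under stated assumptions: one input, a straight-line model, and a hand-picked learning rate. Libraries do the same kind of thing for you with far more care.

```python
# Known examples: inputs with their correct answers (here y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

weight, bias = 0.0, 0.0      # the model's "internal numbers", starting blank
learning_rate = 0.05         # size of each nudge (assumed value)

for _ in range(2000):
    # 1. Guess with the current numbers.
    guesses = [weight * x + bias for x in xs]
    # 2. Check how wrong each guess was.
    errors = [g - y for g, y in zip(guesses, ys)]
    # 3. Nudge the numbers in the direction that shrinks the mean squared error.
    grad_weight = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_bias = 2 * sum(errors) / len(xs)
    weight -= learning_rate * grad_weight
    bias -= learning_rate * grad_bias

def predict(x: float) -> float:
    """Serving is just arithmetic with the learned numbers."""
    return weight * x + bias

print(round(weight, 3), round(bias, 3))
print(round(predict(10.0), 3))
```

Nobody typed the rule "multiply by 2 and add 1" into the code; the loop recovered it from the examples.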
#training#model#fundamentals
Core ideacode

Walk me through a tiny example: how would I detect spam with classic ML instead of an LLM?

Roughly: gather a table of past emails, each labeled spam or not_spam. Build feature columns from each: link count, sender reputation, caps ratio, suspicious words. Then you call train, handing it those features plus the labels. The trained model is a small file. At serve time it's basically a function call: you pass the features of a new email to predict, and it hands back a label like 'spam' with a confidence number (say 0.93). That runs in milliseconds on your own box, has no per-call fee (just your own compute), and needs no external API. An LLM could also classify spam, but you'd pay per call, add network latency, and send email content to a third party, which is wasteful for a yes/no on millions of messages.
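Here is roughly what that could look like with scikit-learn, as a sketch rather than a production recipe. The three feature columns and all the numbers are hypothetical, and logistic regression is just one reasonable model choice among several.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical feature columns: [link_count, caps_ratio, suspicious_word_count]
past_emails = [
    [8, 0.70, 4],
    [6, 0.55, 3],
    [9, 0.80, 5],
    [5, 0.60, 2],
    [0, 0.05, 0],
    [1, 0.10, 0],
    [2, 0.02, 1],
    [0, 0.08, 0],
]
past_labels = ["spam", "spam", "spam", "spam",
               "not_spam", "not_spam", "not_spam", "not_spam"]

# "Train": learn the pattern connecting features to labels.
model = LogisticRegression()
model.fit(past_emails, past_labels)

# "Predict": a plain in-process function call on a new email's features.
new_email = [[7, 0.65, 3]]
label = model.predict(new_email)[0]
probabilities = model.predict_proba(new_email)[0]
confidence = probabilities.max()   # probability of the predicted label

print(label, round(float(confidence), 2))
```

Eight rows is only enough to show the shape of the calls; a real spam model needs thousands of labeled emails.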
#spam#classification#code-sketch
Core ideaconcept

What's 'overfitting'? It comes up constantly and I don't get it.

Overfitting is when a model memorizes the training examples instead of learning the general pattern, like a student who memorizes the exact answer key but flunks the real exam because the questions are worded differently. Such a model scores great on data it has already seen and badly on new data, and new data is the only thing that matters in production. The web-dev tell: amazing numbers while you develop, disappointing results live. You catch it by always checking the model on a held-out set it never trained on (a 'test set' you split off, like keeping a separate staging dataset). If the score on training data is much higher than on the held-out set, the model is overfitting.
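To make the training-versus-held-out gap tangible, here is a deliberately extreme toy in plain Python: one "model" is a literal lookup table of the rows it has seen, the other is a simple general rule. Both are hand-written stand-ins with invented data, not trained models.

```python
# Hypothetical rows: (link_count, caps_percent) -> label
train_rows = [((5, 80), "spam"), ((4, 10), "spam"), ((6, 60), "spam"),
              ((0, 5), "not_spam"), ((1, 20), "not_spam"), ((0, 0), "not_spam")]
held_out_rows = [((7, 50), "spam"), ((3, 30), "spam"),
                 ((1, 10), "not_spam"), ((0, 15), "not_spam")]

# Model A memorizes: an exact lookup of rows it has seen, with a blind default.
memory = dict(train_rows)

def memorizer(features):
    return memory.get(features, "not_spam")

# Model B captures a general pattern: many links usually means spam.
def general_rule(features):
    link_count, _caps_percent = features
    return "spam" if link_count >= 3 else "not_spam"

def accuracy(model, rows):
    """Fraction of rows where the model's answer matches the known label."""
    return sum(model(f) == label for f, label in rows) / len(rows)

for name, model in [("memorizer", memorizer), ("general_rule", general_rule)]:
    train_acc = accuracy(model, train_rows)
    held_out_acc = accuracy(model, held_out_rows)
    print(name, train_acc, held_out_acc, "gap:", train_acc - held_out_acc)
```

Both score perfectly on the rows they were built from; only the held-out rows reveal which one learned something reusable.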
#overfitting#generalization#evaluation
Core ideagotcha

How do I even know if a model is any good? Why can't I just trust 'it's 99% accurate'?

Accuracy means 'fraction of predictions it got right' — and it lies hard when one outcome is rare. Imagine a disease that affects 1 in 1000 people. A model that just always says 'healthy' is 99.9% accurate while catching zero sick patients — useless. Same trap with fraud (most transactions are legit). So you also look at: precision ('of the ones it flagged as fraud, how many really were?') and recall ('of all the real fraud, how many did it actually catch?'). A confusion matrix — a little table tallying right and wrong predictions for each outcome — shows the full picture. Always ask 'accurate at what, and on which outcome?'
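The always-'healthy' trap is easy to reproduce in plain Python. This sketch tallies a confusion matrix by hand for the 1-in-1000 example; the helper name is made up, and reporting precision as 0.0 when nothing was flagged is a convention chosen here, not a universal rule.

```python
def confusion_counts(actual, predicted, positive):
    """Tally true/false positives and negatives for one outcome of interest."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    return tp, fp, fn, tn

# 1 sick patient in 1000, and a "model" that always answers "healthy".
actual = ["sick"] + ["healthy"] * 999
predicted = ["healthy"] * 1000

tp, fp, fn, tn = confusion_counts(actual, predicted, positive="sick")
accuracy = (tp + tn) / len(actual)
# Precision is undefined when nothing was flagged; report 0.0 by convention here.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print("confusion (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy, "precision:", precision, "recall:", recall)
```

Accuracy comes out at 0.999 while recall is 0.0: the one sick patient was missed.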
#accuracy#precision-recall#imbalanced-data#evaluation
Core ideahow-to

I split data into 'train' and 'test'. Why, and what's the rule I must never break?

You hold back a slice of your labeled rows (say 20%) as a 'test set' the model never sees during training, then check its predictions against the known answers — that estimates real-world performance, like running against staging before prod. The unbreakable rule: never let test data leak into training, directly or sneakily. 'Leakage' includes computing averages over the whole dataset before splitting it, or including a feature that secretly encodes the answer (e.g. a 'refund_issued' column when predicting fraud). Leakage gives you gorgeous fake scores that collapse in production. If results look too good to be true, hunt for leakage first.
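The 'averages before splitting' kind of leakage can be shown with five numbers. In this sketch the amounts are invented and the split is by position for simplicity; a real split would normally be random.

```python
from statistics import mean

# Hypothetical transaction amounts; the last row ends up in the test set.
amounts = [10.0, 20.0, 30.0, 40.0, 400.0]
train_amounts, test_amounts = amounts[:4], amounts[4:]

# Leaky: the average is computed BEFORE splitting, so it has seen the test row.
leaky_mean = mean(amounts)

# Clean: learn the average from training rows only, then reuse it on test rows.
clean_mean = mean(train_amounts)

def center(values, learned_mean):
    """Subtract a mean that was learned elsewhere (ideally on training data only)."""
    return [v - learned_mean for v in values]

train_features = center(train_amounts, clean_mean)
test_features = center(test_amounts, clean_mean)

print("leaky mean:", leaky_mean, "clean mean:", clean_mean)
print("test feature (clean):", test_features)
```

The two means differ by a factor of four because one test row quietly shaped the leaky version. Anything learned from data (averages, scalers, vocabularies) belongs after the split, on the training side only.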
#train-test-split#data-leakage#evaluation
Core ideadecision

When should I reach for a small classic model versus just calling an LLM API?

Pick a classic model when the task is a well-defined prediction over structured, table-shaped data you already have — fraud scoring, churn prediction, price estimates, spam — especially at high volume where per-call LLM cost and latency hurt, or where data can't leave your servers. Pick an LLM when the input is messy natural language or images and the task is open-ended: summarizing tickets, answering questions, pulling fields out of free text, chat. Rough rule: if you can phrase it as 'predict this column from those columns' and you have a labeled history, classic ML is cheaper and steadier. If it needs genuine language understanding or reasoning, reach for the LLM.
#decision#llm-vs-ml#cost#architecture
Core ideadecision

I don't want to learn the math or train models from scratch. Are there cloud ML APIs I can just call like any REST endpoint?

Yes — and that's often the right move. Cloud ML APIs wrap ready-made models behind a normal authenticated HTTP endpoint: you POST input, get JSON back, pay per call, no training needed. As of 2026, examples include AWS (Rekognition for images, Comprehend for text), Google Cloud (Vision, Natural Language, Document AI), and Azure AI services (Vision, Language, Document Intelligence). Great for common tasks — OCR, sentiment, image labels, translation. Note that some narrow services get retired (AWS's Amazon Fraud Detector stopped taking new customers and now points you at SageMaker), so check it's still offered. The trade-off versus your own model: ongoing per-call cost, vendor lock-in, latency, and sending data off-box — but you ship in an afternoon.
#cloud-ml#aws#gcp#azure#managed-api
Core ideacode

If I do want to train something myself in code, what does the workflow look like — roughly?

In Python the go-to library is scikit-learn (as of 2026, still the default for table-shaped classic ML). The shape is tiny and reads like plain steps: put your feature columns in one variable and the labels in another, then split them into a training chunk and a test chunk (say 80/20). Create a model (a RandomForestClassifier is a fine starting point), call fit on the training chunk to teach it, then call score on the test chunk to get an honest accuracy on rows it never saw. From then on, call predict on new rows. Finally you save the model to a file (joblib) and load it in your service. The hard part isn't these few lines — it's getting clean labeled data and trustworthy features.
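The steps above could look like the following sketch, assuming scikit-learn and joblib are installed. The synthetic data from make_classification stands in for your own labeled table, and the file name and temp directory are placeholders.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data: in a real project X and y come from your own labeled table.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# Split 80/20 so the test chunk stays unseen during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)                  # teach it on the training chunk
test_accuracy = model.score(X_test, y_test)  # accuracy on rows it never saw
print(f"held-out accuracy: {test_accuracy:.3f}")

new_rows = X_test[:3]                        # pretend these are fresh rows
predictions = model.predict(new_rows)

# Save to a file, then load it back the way a service would at startup.
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, model_path)
loaded_model = joblib.load(model_path)
```

One caution worth knowing: joblib files are pickle-based, so only load model files that you or your own pipeline produced.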
#scikit-learn#code-sketch#workflow#python
Core ideaconcept

What's a 'model' actually shaped like when I deploy it — do I run a server like with an LLM?

Usually no heavy server. A trained classic model is typically a small file (often kilobytes to a few megabytes, though large tree ensembles can be bigger) holding the learned numbers. You load it once at startup, like reading a config or warming a cache, and call predict in-process. So it can live right inside your existing Python backend, or a Node backend if you export the model to a portable format such as ONNX: no GPU, no separate model service, and responses typically in single-digit milliseconds. That's the big contrast with LLMs, which are many gigabytes and almost always called over the network as a hosted API. For classic ML, 'deploy' often just means ship the file with your app and load it. Retraining periodically (say nightly) is a separate batch job.
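To make 'a small file holding the learned numbers' concrete, here is a toy sketch using only the standard library. The JSON format, the weights, and the function names are all hypothetical; a real scikit-learn model would be loaded with joblib instead, but the load-once-then-call-in-process shape is the same.

```python
import json
import math
import os
import tempfile
from functools import lru_cache

# Stand-in for a file produced by a training job (hypothetical format and values).
MODEL_PATH = os.path.join(tempfile.mkdtemp(), "spam_model.json")
with open(MODEL_PATH, "w") as f:
    json.dump({"bias": -3.0, "weights": {"link_count": 0.8, "caps_ratio": 2.0}}, f)

@lru_cache(maxsize=1)
def load_model() -> dict:
    """Read the model file once; later calls reuse the cached object."""
    with open(MODEL_PATH) as f:
        return json.load(f)

def predict_spam_score(features: dict) -> float:
    """In-process prediction: weighted sum squashed to a 0..1 score."""
    model = load_model()
    z = model["bias"] + sum(
        weight * features.get(name, 0.0) for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))

score = predict_spam_score({"link_count": 5, "caps_ratio": 0.5})
print(round(score, 3))
```

The nightly retraining job would simply write a new file; the service picks it up on its next restart or reload.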
#deployment#inference#model-file#serving
Hands-ongotcha

What's a costly lesson people learn shipping classic ML that I should know upfront?

Models silently rot — it's called 'drift'. You train on last year's data, deploy, and it works; then the world shifts (new fraud tactics, a pricing change, a new product) and predictions quietly get worse with no error thrown — like a cache that serves stale data but never expires. Nothing shows up in your logs; accuracy just decays. So you have to monitor live predictions against what actually happened later, alert when quality drops, and retrain on fresh data regularly. The mistake is treating a model as 'ship once, done'. Treat it like a dependency that needs upkeep, with a dashboard watching its real-world hit rate.
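A bare-bones version of that dashboard logic, as a sketch in plain Python. The class name, window size, and alert threshold are assumptions for illustration; real monitoring would also have to cope with outcomes that arrive days or weeks after the prediction.

```python
from collections import deque

class HitRateMonitor:
    """Tracks how often recent predictions matched what actually happened."""

    def __init__(self, window: int, alert_below: float):
        self.outcomes = deque(maxlen=window)   # keeps only the latest results
        self.alert_below = alert_below

    def record(self, predicted, actual) -> None:
        self.outcomes.append(predicted == actual)

    def hit_rate(self) -> float:
        if not self.outcomes:
            return 1.0   # no evidence yet, so nothing to alert on
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self) -> bool:
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.hit_rate() < self.alert_below

monitor = HitRateMonitor(window=10, alert_below=0.8)

# Early on, predictions match the outcomes that arrive later.
for _ in range(10):
    monitor.record(predicted="legit", actual="legit")
healthy_rate = monitor.hit_rate()
alert_when_healthy = monitor.should_alert()

# The world shifts: the model keeps saying "legit" but fraud is now common.
for _ in range(4):
    monitor.record(predicted="legit", actual="fraud")
drifted_rate = monitor.hit_rate()
alert_after_drift = monitor.should_alert()

print(healthy_rate, alert_when_healthy, drifted_rate, alert_after_drift)
```

No exception is ever raised by the model itself; the alert exists only because something compares predictions with reality.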
#drift#monitoring#retraining#production