Posts

6. What Is the Model Actually Looking At?

In the last post, we talked about how probabilities turn into decisions. Thresholds. Trade-offs. Metrics. But there’s something deeper hiding underneath all of that. Before we evaluate decisions, before we argue about precision vs recall, before we tune thresholds, we need to ask: what is the model actually seeing? Because models don’t really see text. They see numbers. And the way we turn language into numbers determines what kinds of mistakes are even possible.

Multiclass Evaluation Gets Messy Fast

Binary classification is easy. One positive class. One negative class. A clean 2×2 confusion matrix. But multiclass changes the geometry. Instead of:

True positive
False positive
True negative
False negative

we now have a k × k confusion matrix. In a native multiclass confusion matrix, classes are not literally relabeled positive/negative; that’s an interpretation used to compute class-wise metrics. Now, apart from asking: how often were...
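The excerpt above can be made concrete with a minimal sketch (plain Python, made-up labels): a k × k confusion matrix indexed by true and predicted class, plus a helper that recovers the binary positive/negative interpretation for one class at a time. The function names are hypothetical, not from the post.

```python
def confusion_matrix(y_true, y_pred, k):
    """Rows = true class, columns = predicted class."""
    m = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def one_vs_rest(m, c):
    """Reinterpret class c as 'positive' and every other class as 'negative'."""
    tp = m[c][c]                                     # correctly predicted c
    fn = sum(m[c]) - tp                              # true c, predicted other
    fp = sum(m[r][c] for r in range(len(m))) - tp    # other, predicted c
    tn = sum(sum(row) for row in m) - tp - fn - fp   # everything else
    return tp, fp, fn, tn

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
m = confusion_matrix(y_true, y_pred, 3)
print(one_vs_rest(m, 1))  # class 1 as "positive": (2, 1, 0, 3)
```

Class-wise metrics like per-class precision and recall are then computed from each of these one-vs-rest views, which is exactly why the positive/negative framing is an interpretation rather than a property of the matrix itself.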

5. Turning Probabilities into Decisions

In the last post, we talked about how NLP systems moved from generating language to making decisions. We saw how classifiers produce probabilities, how loss functions tell us how wrong we are, and how optimization nudges models toward better behavior. But there’s a step in this pipeline that often gets overlooked. At some point, a probability becomes a decision. And that step is where most failures happen.

Probabilities Are Not Decisions

A classifier rarely outputs a label directly. It outputs something like: 0.73. That number doesn’t mean “spam.” It means: “Given the model, the data, and the assumptions baked into training, this input looks spam-like with probability 0.73.” To turn that into a decision, we introduce a threshold:

If probability ≥ threshold -> positive class
Otherwise -> negative class

The most common threshold is 0.5, because it’s convenient. And that convenience hides trade-offs.

Thresholds Encode Values

Changing the threshold changes behavior. Lower the...
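The threshold step described above is tiny in code, which is part of why it gets overlooked. A minimal sketch (the function name and numbers are illustrative, not from the post):

```python
def decide(p_spam, threshold=0.5):
    """Turn a spam probability into a label.

    threshold=0.5 is a convention, not a law: raising it trades missed
    spam for fewer false alarms, lowering it does the opposite.
    """
    return "spam" if p_spam >= threshold else "not spam"

p = 0.73
print(decide(p))         # spam      (0.73 >= 0.5)
print(decide(p, 0.9))    # not spam  (stricter cutoff: fewer false alarms)
print(decide(p, 0.3))    # spam      (laxer cutoff: catches more spam)
```

Note that the model's output never changed; only the one-line comparison did, which is exactly where the trade-offs live.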

4. From Predicting Words to Making Decisions

So far, we’ve focused on language models as generators. They read text. They predict what comes next. They assign probabilities to words. But for a long time, most NLP systems didn’t generate language at all. They made decisions. Is this email spam or not? Is this review positive or negative? Does this document belong to topic A or topic B? This shift from predicting words to predicting labels is where early NLP systems spent most of their time. And it’s where ideas like logistic regression, cross-entropy loss, and gradient descent became foundational.

From Sequences to Labels

In language modeling, the output is a distribution over possible next tokens. In classification, the output is simpler:

A class label
Or a probability over a small number of classes

Instead of asking “What word comes next?” we ask “Which category does this input belong to?” At first glance, this sounds easier. Fewer outputs. Clear answers. In practice, it introduces a differen...
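The three ideas named above fit in a few lines. A toy sketch (plain Python, one feature, made-up numbers): logistic regression turns a score into a probability, cross-entropy measures how wrong that probability is, and gradient descent nudges the weight to reduce the loss.

```python
import math

def sigmoid(z):
    """Squash a raw score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(p, y):
    """Loss for predicted probability p when the true label is y (0 or 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# One weight, one training example with feature x = 2.0 and label y = 1.
w, x, y, lr = 0.0, 2.0, 1, 0.1
for _ in range(100):
    p = sigmoid(w * x)
    grad = (p - y) * x   # d(cross_entropy)/dw for logistic regression
    w -= lr * grad       # one gradient descent step

print(cross_entropy(sigmoid(w * x), y))  # loss has shrunk toward 0
```

The gradient `(p - y) * x` is the standard closed form for logistic regression with cross-entropy loss; the loop simply repeats the nudge until the predicted probability approaches the label.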

3. How Do We Know a Language Model Is Any Good?

So far in this series, we’ve talked about how text becomes tokens and why early language models struggled with data sparsity. Now comes the obvious next question: how do we actually know whether a language model is good or bad? This turns out to be much harder than it sounds. Unlike image classification, where a model either labels an image correctly or it doesn’t, language models don’t usually have one “right” answer. For most inputs, there are many perfectly reasonable continuations. So evaluation in NLP is less about absolute correctness and more about how surprised the model is by real language, and how useful it is in practice.

Intrinsic vs Extrinsic Evaluation

Broadly, there are two ways to evaluate language models. Intrinsic evaluation measures the quality of the language model directly. You ask: how well does the model predict real text? Extrinsic evaluation measures performance on a downstream task. You ask: does this model help me do something useful, like classifica...
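"How surprised the model is" has a standard intrinsic measure: perplexity, the exponentiated average negative log-probability the model assigns to the actual tokens. A minimal sketch (the per-token probabilities below are invented for illustration, not from any model):

```python
import math

def perplexity(token_probs):
    """token_probs: the probability the model assigned to each real token.

    Lower perplexity = the model was less surprised by the text.
    """
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

confident = [0.9, 0.8, 0.95, 0.85]   # model rarely surprised
surprised = [0.1, 0.05, 0.2, 0.08]   # model frequently surprised
print(perplexity(confident) < perplexity(surprised))  # True
```

A useful intuition: a model that assigns probability 1/2 to every token has perplexity exactly 2, as if it were choosing uniformly between two options at each step.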