4. From Predicting Words to Making Decisions
So far, we’ve focused on language models as generators. They read text. They predict what comes next. They assign probabilities to words. But for a long time, most NLP systems didn’t generate language at all. They made decisions. Is this email spam or not? Is this review positive or negative? Does this document belong to topic A or topic B?

This shift from predicting words to predicting labels is where early NLP systems spent most of their time. And it’s where ideas like logistic regression, cross-entropy loss, and gradient descent became foundational.

From Sequences to Labels

In language modeling, the output is a distribution over possible next tokens. In classification, the output is simpler:

- A class label
- Or a probability over a small number of classes

Instead of asking: “What word comes next?” we ask: “Which category does this input belong to?”

At first glance, this sounds easier. Fewer outputs. Clear answers. In practice, it introduces a different...
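To make the three foundational ideas concrete, here is a minimal sketch of a binary classifier: logistic regression trained by gradient descent on a cross-entropy loss. The spam-vs-not-spam data and the bag-of-words features are invented for illustration; this is a toy, not a production implementation.

```python
import numpy as np

# Hypothetical toy data: is an email spam (1) or not (0)?
# Columns are made-up word counts for ["free", "meeting", "winner"].
X = np.array([
    [3.0, 0.0, 2.0],  # spammy: mentions "free" and "winner"
    [0.0, 2.0, 0.0],  # normal: mentions "meeting"
    [2.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])
y = np.array([1.0, 0.0, 1.0, 0.0])

def sigmoid(z):
    # Squashes a score into a probability between 0 and 1.
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression: a linear score passed through a sigmoid,
# trained with batch gradient descent on the mean cross-entropy loss.
w = np.zeros(X.shape[1])
b = 0.0
lr = 0.5
for _ in range(500):
    p = sigmoid(X @ w + b)           # predicted probability of class 1
    grad_w = X.T @ (p - y) / len(y)  # gradient of cross-entropy w.r.t. w
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

probs = sigmoid(X @ w + b)
print(np.round(probs, 3))  # spam rows drift toward 1, non-spam toward 0
```

Note that the gradient of the cross-entropy loss with respect to the weights reduces to the simple form `X.T @ (p - y) / n`; that clean expression is one reason this loss pairs so naturally with the sigmoid.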