4. From Predicting Words to Making Decisions
So far, we’ve focused on language models as generators. They read text. They predict what comes next. They assign probabilities to words. But for a long time, most NLP systems didn’t generate language at all. They made decisions. Is this email spam or not? Is this review positive or negative? Does this document belong to topic A or topic B?

This shift from predicting words to predicting labels is where early NLP systems spent most of their time. And it’s where ideas like logistic regression, cross-entropy loss, and gradient descent became foundational.

From Sequences to Labels

In language modeling, the output is a distribution over possible next tokens. In classification, the output is simpler:

- A class label
- Or a probability over a small number of classes

Instead of asking: “What word comes next?” we ask: “Which category does this input belong to?”

At first glance, this sounds easier. Fewer outputs. Clear answers. In practice, it introduces a different...
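To make the three foundational ideas concrete, here is a minimal sketch of a binary classifier: logistic regression trained by gradient descent on a cross-entropy loss. The spam-vs-not-spam data and the bag-of-words features are invented for illustration; this is a toy, not a production implementation.

```python
import numpy as np

# Hypothetical toy data: is an email spam (1) or not (0)?
# Columns are made-up word counts for ["free", "meeting", "winner"].
X = np.array([
    [3.0, 0.0, 2.0],  # spammy: mentions "free" and "winner"
    [0.0, 2.0, 0.0],  # normal: mentions "meeting"
    [2.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])
y = np.array([1.0, 0.0, 1.0, 0.0])

def sigmoid(z):
    # Squashes a score into a probability between 0 and 1.
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression: a linear score passed through a sigmoid,
# trained with batch gradient descent on the mean cross-entropy loss.
w = np.zeros(X.shape[1])
b = 0.0
lr = 0.5
for _ in range(500):
    p = sigmoid(X @ w + b)           # predicted probability of class 1
    grad_w = X.T @ (p - y) / len(y)  # gradient of cross-entropy w.r.t. w
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

probs = sigmoid(X @ w + b)
print(np.round(probs, 3))  # spam rows drift toward 1, non-spam toward 0
```

Note that the gradient of the cross-entropy loss with respect to the weights reduces to the simple form `X.T @ (p - y) / n`; that clean expression is one reason this loss pairs so naturally with the sigmoid.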