Posts

Showing posts from February, 2026

6. What Is the Model Actually Looking At?

In the last post, we talked about how probabilities turn into decisions. Thresholds. Trade-offs. Metrics. But there’s something deeper hiding underneath all of that. Before we evaluate decisions, before we argue about precision vs recall, before we tune thresholds, we need to ask:

What is the model actually seeing?

Because models don’t really see text. They see numbers. And the way we turn language into numbers determines what kinds of mistakes are even possible.

Multiclass Evaluation Gets Messy Fast

Binary classification is easy. One positive class. One negative class. A clean 2×2 confusion matrix. But multiclass changes the geometry. Instead of four cells (true positive, false positive, true negative, false negative), we now have a k × k confusion matrix. In a native multiclass confusion matrix, classes are not literally re-labeled positive/negative; that’s an interpretation used to compute class-wise metrics. Now, apart from asking: How often were...
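As a rough illustration of the class-wise interpretation mentioned in this excerpt, here is a minimal sketch that builds a k × k confusion matrix and then reads one class as "positive" and everything else as "negative". The function names, the three labels, and the toy predictions are all hypothetical and are not taken from the post.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Build a k x k matrix: rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def one_vs_rest(matrix, i):
    """Read class i as 'positive' and every other class as 'negative'."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[i][i]                          # true i, predicted i
    fn = sum(matrix[i]) - tp                   # true i, predicted something else
    fp = sum(row[i] for row in matrix) - tp    # true something else, predicted i
    tn = total - tp - fn - fp                  # everything not involving class i
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


# Hypothetical labels and predictions, for illustration only.
labels = ["sports", "politics", "tech"]
y_true = ["sports", "sports", "politics", "tech", "tech", "politics"]
y_pred = ["sports", "tech", "politics", "tech", "sports", "politics"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>8}: {row}")
print(one_vs_rest(matrix, labels.index("sports")))
```

The matrix itself never relabels anything; the positive/negative split only appears when a single class is picked out to compute its metrics.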

5. Turning Probabilities into Decisions

In the last post, we talked about how NLP systems moved from generating language to making decisions. We saw how classifiers produce probabilities. How loss functions tell us how wrong we are. How optimization nudges models toward better behavior. But there’s a step in this pipeline that often gets overlooked. At some point, a probability becomes a decision. And that step is where most failures happen.

Probabilities Are Not Decisions

A classifier rarely outputs a label directly. It outputs something like:

0.73

That number doesn’t mean “spam.” It means:

“Given the model, the data, and the assumptions baked into training, this input looks spam-like with probability 0.73.”

To turn that into a decision, we introduce a threshold.

If probability ≥ threshold -> positive class
Otherwise -> negative class

The most common threshold is 0.5. Because it’s convenient. And that convenience hides trade-offs.

Thresholds Encode Values

Changing the threshold changes behavior. Lower the...
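The threshold rule quoted in this excerpt can be sketched in a few lines. This is only an illustration: the function name `decide` and the alternative thresholds 0.73 and 0.9 are assumptions made here, while the 0.73 score and the default threshold of 0.5 come from the excerpt.

```python
def decide(probability: float, threshold: float = 0.5) -> str:
    """Apply the rule: probability >= threshold -> positive, otherwise negative."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return "positive" if probability >= threshold else "negative"


score = 0.73  # the example model output from the excerpt

# The same score leads to different decisions under different thresholds.
for threshold in (0.5, 0.73, 0.9):
    print(f"threshold={threshold:.2f} -> {decide(score, threshold)}")
```

The model output stays fixed at 0.73 throughout; only the decision changes, which is why the choice of threshold belongs to the system design and not to the model.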