Navigation: <-- Start Simple | Part Index | Main Index | Choosing and Aligning Metrics -->
Baselines and the Good-Enough Bar
Requires:
Motivation: In ๐ Start Simple you saw that complexity must be earned: step up only when simpler models fall short. But fall short relative to what? Model development relies on two reference points that serve different purposes. First, the baseline floor: what a model that learned nothing at all would achieve. Second, there is a good-enough bar: the minimum performance the application actually requires. The first provides a development signal. The second is a deployment criterion.
In this nugget, you'll learn what a baseline is, how to construct one for both classification and regression, and how to use it to check whether a model has genuinely learned something. You'll also see that domain goals determine when a model is good enough.
Table of Contents
Always Beat the Dummy Baseline First
A dummy baseline is the performance achievable by a model that uses no learned relationships between features and target. For classification and regression the standard choices are majority class and mean, respectively:

- The dummy classifier predicts the majority class for every input. Its accuracy equals the proportion of the most frequent class in the training set.
- The dummy regressor predicts the mean of the training targets for every input. Its mean squared error (=MSE) equals the variance of the training target.
There are variations of these baselines, like using a median or adjusting to a particular domain (like using seasonal means). But in general, that's it. Simple.
Implementation-wise sklearn provides DummyClassifier and DummyRegressor. Fit them on the training data, score them on the held-out data, report them.
Tip: Always include the baseline score when reporting model results.
A relative performance difference as in "Our model achieves 88% accuracy, compared to 85% from the majority-class baseline" tells you what the model actually learned.
What a Baseline Reveals
A model that barely beats the dummy baseline is usually not a candidate for hyperparameter tuning. It rather calls for investigation. Potential causes for a weak model relative to the baseline are:
- The features do not carry signal about the target. More features or better data may be needed.
- Data quality problems. The target or features may be measured inconsistently or contain systematic errors. Noisy labels give a hard limit to what patterns a model can learn, regardless of complexity.
- Issues with problem definition. The prediction task may need to be adjusted. Perhaps the target variable is not well-defined for what the system actually needs to do.
Checking the baseline is therefore a cheap diagnostic step at the beginning of the modeling process. It also aligns with ๐ Start Simple.
Setting the Good-Enough Bar
In addition to the "floor" baseline, a model often also needs to achieve a minimum performance to become a candidate for deployment. It needs to be good enough. Here, good enough is a domain and problem-dependent judgment, not a universal threshold:
- A fraud detection model that catches 45% of a new type of fraudulent transactions may represent enormous business value, even though it misses the majority.
- A medical risk-scoring model that explains only 25% of the variance of the target variable may be insufficient to justify treatment decisions.
- A recommendation engine that improves click-through rate by 3% above the popularity baseline may generate significant revenue at scale.
At best, domain requirements appear as explicit thresholds: "catch at least 80% of fraud cases" or "stay within ยฑ5% of demand." If you are given such requirements, you should translate them into a minimum acceptable value on a specific metric. For example, a minimum recall value of 0.8 for the fraud domain requirement.
Practical note: In many cases, domain requirements might not yet be explicit. In these cases, they must be negotiated with stakeholders as part of business understanding. When would a model be good enough to do real work?
Domain requirements inform which metrics to use. Metrics must be chosen before training, not after: If the domain requirement is recall-based, optimizing for accuracy may yield a model that clears the baseline but fails the requirement.
Taken together, the baseline floor and the good-enough bar anchor modeling decisions: One establishes that the model has learned something. The other defines what the application actually needs, which we'll explore further in the next nugget ๐ Choosing and Aligning Metrics.
TODO: link business nugget (Part IX).
Summary
- The dummy baseline predicts the majority class (classification) or the training mean (regression). It is a development gate: any learned model must beat it before investing further effort.
- A model that barely beats the baseline signals a problem with the data, the features, or the problem definition. It does not signal a need for a more complex model.
- "Good enough" is defined by domain requirements like a minimum recall or a maximum error rate. These requirements must be translated into data-science metrics and corresponding threshold before training begins.
As always: Happy learning, happy life! ๐ซถ
Navigation: <-- Start Simple | Part Index | Main Index | Choosing and Aligning Metrics -->
Script v1.4 (2026-06-10) ยท FGN