When Not to Use Machine Learning

Updated May 2026

Do not use machine learning when the rules are known and stable (use traditional programming instead), when you have too little data for a model to learn meaningful patterns, when the cost of errors is unacceptable and the model cannot be made reliable enough, when a simpler statistical method solves the problem adequately, when the problem requires causal understanding rather than prediction, or when every decision must be fully explainable.

When the Rules Are Known

If you can write down the rules that govern the decision, traditional programming is faster, cheaper, more reliable, and easier to maintain than machine learning. A tax calculator follows explicit formulas defined by law. An inventory reorder system triggers when stock drops below a threshold. A password validator checks length, character types, and common patterns. None of these need ML because the rules are complete, stable, and unambiguous.
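As a minimal sketch of what "writing down the rules" looks like, the password validator and reorder trigger mentioned above fit in a few lines of ordinary code. The minimum length, the list of common passwords, and the function names are assumptions made for illustration, not a recommended policy.

```python
import re

# Hypothetical policy values, chosen only for illustration.
MIN_LENGTH = 12
COMMON_PASSWORDS = {"password", "123456789012", "qwertyuiopas"}


def validate_password(password: str) -> list[str]:
    """Return the list of violated rules; an empty list means the password passes."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append(f"shorter than {MIN_LENGTH} characters")
    if not re.search(r"[a-z]", password):
        problems.append("no lowercase letter")
    if not re.search(r"[A-Z]", password):
        problems.append("no uppercase letter")
    if not re.search(r"\d", password):
        problems.append("no digit")
    if password.lower() in COMMON_PASSWORDS:
        problems.append("matches a common password")
    return problems


def needs_reorder(stock_on_hand: int, reorder_threshold: int) -> bool:
    """Trigger a reorder when stock drops below the threshold."""
    return stock_on_hand < reorder_threshold
```

Every outcome here can be traced to a single line, tested exhaustively, and changed by editing one constant, which is the reliability and maintainability advantage the paragraph describes.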

ML adds value when the rules are too complex to specify (recognizing faces), too numerous to enumerate (all the ways spam emails differ from legitimate ones), or too unstable to hard-code (fraud tactics evolving monthly). If none of these conditions apply, traditional code is the right choice. Adding ML to a problem that does not need it increases complexity, reduces reliability, and introduces failure modes that simple code avoids.

What if the rules are mostly known but have edge cases?

Start with rules and handle the edge cases explicitly. If the edge cases are too numerous or varied, consider a hybrid approach: use rules for the core logic and ML only for the ambiguous cases. This gives you the reliability of rules where they work and the flexibility of ML where they do not.
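One way to picture the hybrid approach is a function that asks the rules first and falls back to a model only when no rule is decisive. The transaction-screening scenario, the thresholds, and the `stub_model` placeholder below are all hypothetical; in practice a trained model would be passed in instead.

```python
from typing import Callable, Optional


def rule_decision(amount: float, account_age_days: int) -> Optional[str]:
    """Return a decision when a rule clearly applies, or None when the case is ambiguous."""
    if amount <= 50:
        return "approve"   # hypothetical rule: small amounts are always approved
    if amount >= 10_000 and account_age_days < 7:
        return "reject"    # hypothetical rule: large amount on a brand-new account
    return None            # no rule is decisive


def decide(
    amount: float,
    account_age_days: int,
    model_score: Callable[[float, int], float],
) -> tuple[str, str]:
    """Return (decision, source), where source records whether a rule or the model decided."""
    by_rule = rule_decision(amount, account_age_days)
    if by_rule is not None:
        return by_rule, "rule"
    score = model_score(amount, account_age_days)
    return ("reject" if score >= 0.5 else "approve"), "model"


def stub_model(amount: float, account_age_days: int) -> float:
    """Placeholder standing in for a trained model's risk score; not a real model."""
    return 0.9 if amount > 5_000 else 0.1
```

Recording the source of each decision makes it easy to measure how much traffic the rules already cover and whether the ML component is earning its keep.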

When You Have Too Little Data

ML algorithms learn from data. If you do not have enough data, they cannot learn meaningful patterns and will either underfit (model too simple to capture what little signal exists) or overfit (memorize the few examples without generalizing). The minimum depends on the problem complexity, but rough guidelines exist.

As rough orders of magnitude: tabular classification with a few features needs about 100-500 examples per class. Regression with 10-20 features needs about 1,000-5,000 samples. Image classification with transfer learning needs about 100-1,000 images per class. Training a language model from scratch needs millions to billions of text tokens. Fine-tuning an existing language model needs hundreds to thousands of examples.

If you have fewer than 50 data points, ML is almost never appropriate. Simple statistics (means, medians, correlations, basic hypothesis tests) will tell you more than any ML model. At 50-500 data points, use simple models (logistic regression, small decision trees) with careful cross-validation. At 500+ data points, the full ML toolkit becomes viable.
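The thresholds above can be written as a small lookup, shown here as a sketch only. The function name is an assumption, and the choice to place exactly 500 data points in the top tier is arbitrary, since the guideline itself is approximate.

```python
def recommend_approach(n_data_points: int) -> str:
    """Map a dataset size to the rough guidance given in the text."""
    if n_data_points < 0:
        raise ValueError("n_data_points must be non-negative")
    if n_data_points < 50:
        # Means, medians, correlations, basic hypothesis tests.
        return "simple statistics"
    if n_data_points < 500:
        # Logistic regression or small decision trees.
        return "simple models with careful cross-validation"
    return "full ML toolkit"
```

These cutoffs are rules of thumb, so treat the output as a starting point and adjust for the complexity of the problem.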

Can you use ML with small data if you augment it?

Data augmentation helps for images (rotations, flips, color changes) and text (paraphrasing, back-translation) but does not create genuinely new information. It reduces overfitting by teaching invariance to transformations. Transfer learning, which starts from a model pre-trained on a large dataset, is the most effective approach for small-data ML.
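A small sketch makes the "no genuinely new information" point concrete. Here an image is represented as a plain nested list of pixel values (an assumption for illustration; real pipelines would use an image library), and each augmented variant contains exactly the same pixels in a different arrangement.

```python
Image = list[list[int]]


def flip_horizontal(image: Image) -> Image:
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image: Image) -> Image:
    """Reverse the order of the rows."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image: Image) -> Image:
    """Rotate the image a quarter turn clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image: Image) -> list[Image]:
    """Return the original image plus three transformed copies."""
    return [
        image,
        flip_horizontal(image),
        flip_vertical(image),
        rotate_90_clockwise(image),
    ]


def pixel_values(image: Image) -> list[int]:
    """Sorted list of all pixel values, ignoring their positions."""
    return sorted(value for row in image for value in row)
```

Calling `augment` on one image yields four training examples, but `pixel_values` is identical for all of them: the model sees the same content from new angles, which teaches invariance without adding new observations.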

When the Cost of Errors Is Unacceptable

Every ML model has an error rate. For many applications, that error rate is acceptable: a spam filter that occasionally misclassifies a marketing email is not dangerous. But for safety-critical applications, even a 1% error rate may be unacceptable.
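A quick calculation shows why a small error rate can still be a problem at scale. The sketch below assumes errors are independent across decisions, which is a simplification, and the function names are illustrative.

```python
def expected_errors(error_rate: float, decisions: int) -> float:
    """Expected number of wrong decisions at a given error rate."""
    return error_rate * decisions


def prob_at_least_one_error(error_rate: float, decisions: int) -> float:
    """Probability of at least one error, assuming errors are independent."""
    return 1 - (1 - error_rate) ** decisions
```

Under that independence assumption, a 1% error rate means about 100 expected errors per 10,000 decisions, and roughly a 63% chance of at least one error within the first 100 decisions. Whether that is tolerable depends entirely on what a single error costs.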

Air traffic control cannot tolerate false negatives (missing a collision risk). Surgical robots cannot tolerate prediction errors that cause physical harm. Nuclear plant safety systems are held to reliability standards far beyond what a model validated on a finite test set can demonstrate. These domains rely primarily on formal verification, physics-based models, and redundant safety systems rather than on statistical learning from data.

This does not mean ML has no role in safety-critical domains. It means ML should assist human experts rather than replace them. A diagnostic model that flags potential cancers for radiologist review adds value even if it is imperfect. A diagnostic model that autonomously decides treatment without human oversight is dangerous because its errors, however rare, reach real patients with no one positioned to catch them.

When a Simpler Method Works

If a linear regression achieves an R-squared of 0.92 and a neural network achieves 0.93, the linear regression is almost certainly the better production choice. It is faster, cheaper, more interpretable, easier to maintain, less likely to fail in unexpected ways, and does not require GPU infrastructure. A 0.01 gain in R-squared rarely justifies the added operational complexity.

Similarly, if a simple business rule achieves 85% accuracy and an ML model achieves 88%, consider whether the 3-point improvement justifies the cost of building, deploying, and maintaining an ML system. ML has real ongoing costs: monitoring, retraining, infrastructure, and the engineering time to debug model failures. Simple rules have almost zero maintenance cost.
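The trade-off in the last two paragraphs can be sketched as a selection rule: among candidates whose score is within some tolerance of the best, pick the one that is simplest to operate. The `Candidate` class, the `complexity` ranking, and the tolerance value are assumptions for illustration; the right tolerance is a business decision.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    score: float      # higher is better, e.g. R-squared or accuracy
    complexity: int   # hypothetical rank: 1 = simplest to build and operate


def pick_simplest_adequate(candidates: list[Candidate], tolerance: float) -> Candidate:
    """Return the least complex candidate scoring within `tolerance` of the best."""
    best_score = max(c.score for c in candidates)
    adequate = [c for c in candidates if best_score - c.score <= tolerance]
    return min(adequate, key=lambda c: c.complexity)


regression_options = [
    Candidate("linear regression", 0.92, complexity=1),
    Candidate("neural network", 0.93, complexity=3),
]
classification_options = [
    Candidate("business rule", 0.85, complexity=1),
    Candidate("ML model", 0.88, complexity=3),
]
```

With a tolerance of 0.02, the first comparison selects the linear regression, while the second selects the ML model because a 3-point gap exceeds the tolerance. The code only makes the decision explicit; it does not price the ongoing cost of monitoring and retraining, which still has to be weighed separately.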

The tendency to use ML when simpler approaches work is called "ML solutionism." It is driven by the perception that ML is sophisticated and modern while simple approaches are unsophisticated. But the most effective engineers choose the simplest tool that solves the problem. A SQL query that answers the business question is superior to an ML model that answers the same question with similar accuracy and far more complexity.

When You Need Causal Understanding

ML models find correlations, not causes. A model might discover that patients who receive a certain drug have higher mortality rates. But this could mean the drug is harmful, or it could mean the drug is given only to the sickest patients. The model cannot distinguish these explanations because it sees correlation, not the causal mechanism.
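The second explanation can be illustrated with a toy cohort. The counts below are invented for illustration: the drug lowers mortality within each severity group, yet because it is given mostly to severe cases, treated patients show higher mortality overall.

```python
from typing import Optional

# (severity, treated) -> (patients, deaths). Invented counts, not real data.
COHORT = {
    ("severe", True): (80, 24),
    ("severe", False): (20, 8),
    ("mild", True): (20, 1),
    ("mild", False): (80, 8),
}


def mortality(treated: bool, severity: Optional[str] = None) -> float:
    """Death rate for treated or untreated patients, optionally within one severity group."""
    patients = deaths = 0
    for (group, was_treated), (n, d) in COHORT.items():
        if was_treated == treated and (severity is None or group == severity):
            patients += n
            deaths += d
    return deaths / patients
```

Ignoring severity, `mortality(True)` is 0.25 against `mortality(False)` at 0.16, so the drug looks harmful. Within the severe group the rates are 0.30 treated against 0.40 untreated, and within the mild group 0.05 against 0.10, so the drug looks beneficial. A predictive model trained without the severity variable would only see the first comparison.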

If your question is "what will happen if we change X?" rather than "what will happen next?", you need causal inference methods, not standard ML. Randomized controlled trials, instrumental variables, difference-in-differences, and structural equation models are designed to estimate causal effects. Using ML for causal questions without these frameworks can produce answers that are confidently wrong.
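As a sketch of one of these methods, difference-in-differences compares the change in a treated group with the change in a control group over the same period. The estimate is only meaningful under the parallel-trends assumption, meaning both groups would have moved alike without the intervention. The function name and the numbers in the usage note are illustrative.

```python
def diff_in_diff(
    treated_before: float,
    treated_after: float,
    control_before: float,
    control_after: float,
) -> float:
    """Estimated effect: change in the treated group minus change in the control group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change
```

For example, if the treated group's average outcome moves from 100 to 130 while the control group's moves from 100 to 110, a naive before-and-after comparison credits the intervention with 30, while `diff_in_diff(100, 130, 100, 110)` attributes 20 to it and 10 to the background trend.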

There is a growing field of causal ML that combines ML with causal inference frameworks, but it requires careful study design, not just feeding data into an algorithm. Standard supervised learning optimizes prediction accuracy, not causal identification.

When Interpretability Is Non-Negotiable

Some domains require that every decision be fully explainable. A bank must explain why a loan was denied. A medical diagnosis must be supported by specific clinical evidence. A legal decision must reference applicable rules and precedents.

While interpretability tools (SHAP, LIME) can approximate explanations for complex models, they are post-hoc approximations of the model's behavior, not a faithful account of its reasoning. They estimate which features influenced a prediction, but they do not tell you why the model learned to rely on them. For domains where explanation is legally or ethically required, consider whether a simple, inherently interpretable model (logistic regression, small decision tree, scoring rules) provides adequate performance. If it does, use it. The guaranteed interpretability is worth more than the marginal accuracy improvement from a black-box model.
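To show what "inherently interpretable" can mean in practice, here is a minimal sketch of a scoring-rules model for a loan decision. The rules, point values, threshold, and field names are all hypothetical; the point is that the explanation is the model itself rather than an approximation of it.

```python
# Hypothetical scoring rules: (description, check, points awarded if the check passes).
RULES = [
    ("income at least 3x monthly payment",
     lambda a: a["income"] >= 3 * a["payment"], 2),
    ("no missed payments in the last 12 months",
     lambda a: a["missed_payments"] == 0, 2),
    ("employed for at least 2 years",
     lambda a: a["years_employed"] >= 2, 1),
]
APPROVAL_THRESHOLD = 4  # hypothetical cutoff


def score_application(applicant: dict) -> tuple[bool, int, list[str]]:
    """Return (approved, total points, descriptions of the rules that were not met)."""
    total = 0
    unmet = []
    for description, check, points in RULES:
        if check(applicant):
            total += points
        else:
            unmet.append(description)
    return total >= APPROVAL_THRESHOLD, total, unmet


applicant = {"income": 4000, "payment": 1500, "missed_payments": 0, "years_employed": 5}
approved, total, unmet = score_application(applicant)
```

For this applicant the result is a denial with 3 points, and the single unmet rule ("income at least 3x monthly payment") is the complete and exact reason, which is the kind of explanation a bank can hand to a customer or a regulator.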

Key Takeaway

Do not use ML when the rules are known, data is insufficient, errors are intolerable, simpler methods work adequately, causal understanding is needed, or full interpretability is required. The best ML practitioners know when not to use ML, because every problem solved with unnecessary complexity is a maintenance burden, a potential failure point, and a misuse of engineering resources.

