Classification vs Regression in Machine Learning
What Classification Does
Classification assigns input data to one of several predefined categories. The output is a discrete label, often with an associated probability. When a spam filter classifies an email, it outputs "spam" or "not spam," along with a confidence score like 0.97. When a medical imaging model classifies a tumor, it outputs "malignant" or "benign" with a probability for each.
Binary classification has exactly two categories. Spam detection, fraud detection, disease screening, and sentiment analysis (positive/negative) are binary problems. Most binary classifiers output a probability between 0 and 1, and a threshold (typically 0.5) converts that probability into a class label.
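The thresholding step can be sketched in a few lines. This is an illustration only: the function name to_label and the example scores are hypothetical, and a real system would choose its threshold from validation data rather than assume 0.5.

```python
def to_label(probability: float, threshold: float = 0.5) -> str:
    """Convert a predicted spam probability into a class label."""
    return "spam" if probability >= threshold else "not spam"

# Hypothetical model outputs for four emails.
scores = [0.97, 0.42, 0.50, 0.08]

# Default threshold of 0.5.
labels = [to_label(p) for p in scores]

# A stricter threshold flags fewer emails as spam.
strict_labels = [to_label(p, threshold=0.9) for p in scores]

print(labels)
print(strict_labels)
```

Raising the threshold trades missed spam for fewer legitimate emails flagged, which is why 0.5 is a starting point rather than a rule.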
Multi-class classification has three or more categories. Handwritten digit recognition (10 classes: 0-9), plant species identification, and document topic classification are multi-class problems. The model outputs a probability distribution across all classes, and the class with the highest probability becomes the prediction.
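One common way to obtain such a distribution is to pass the model's raw scores through a softmax function and then take the most probable class. The sketch below assumes softmax (the text above does not name a specific method), and the class names and scores are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical classes and raw model scores for one input.
classes = ["cat", "dog", "bird"]
raw_scores = [2.0, 1.0, 0.1]

probs = softmax(raw_scores)

# The prediction is the class with the highest probability.
predicted = classes[probs.index(max(probs))]

print(probs, predicted)
```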
Multi-label classification allows each input to belong to multiple categories simultaneously. A news article might be tagged as both "politics" and "economics." A movie might be labeled as both "comedy" and "romance." In the simplest setup, each label gets its own independent yes/no prediction, so the label probabilities do not need to sum to 1.
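A minimal sketch of the multi-label case, assuming the model emits one probability per label and each label is thresholded on its own. The function predict_tags and the probabilities are hypothetical.

```python
def predict_tags(label_probs, threshold=0.5):
    """Keep every label whose own probability clears the threshold."""
    return sorted(label for label, p in label_probs.items() if p >= threshold)

# Hypothetical per-label probabilities for one news article.
# Note that they do not sum to 1: each label is a separate yes/no question.
article = {"politics": 0.91, "economics": 0.78, "sports": 0.03}

tags = predict_tags(article)
print(tags)
```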
The key algorithms for classification include logistic regression (despite its name), decision trees, random forests, support vector machines, k-nearest neighbors, naive Bayes, and neural networks. For tabular data, gradient-boosted trees (XGBoost, LightGBM) dominate competitions and production systems. For images and text, neural networks are standard.
What Regression Does
Regression predicts a continuous numerical value. The output is a number on a continuous scale, not a category. Predicting that a house will sell for $427,500 is regression. Predicting that tomorrow's temperature will be 23.4 degrees Celsius is regression. Estimating that a patient will recover in 12.7 days is regression.
The simplest regression algorithm, linear regression, fits a straight line through the data. For a single feature, the model learns the equation y = mx + b where m is the slope and b is the intercept. With multiple features, it becomes y = w1*x1 + w2*x2 + ... + wn*xn + b, fitting a hyperplane through multi-dimensional space. Linear regression is interpretable, fast, and often the first algorithm to try on any regression problem.
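For the single-feature case, the slope and intercept have a closed-form least-squares solution. The sketch below assumes ordinary least squares and uses toy data generated from y = 2x + 1; the function name fit_line is illustrative.

```python
def fit_line(xs, ys):
    """Fit y = m*x + b by ordinary least squares for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Toy data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, b = fit_line(xs, ys)
prediction = m * 5.0 + b  # predict for an unseen x

print(m, b, prediction)
```

With several features the same idea applies, but the weights are usually found with a linear algebra routine or gradient descent rather than by hand.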
When the relationship between features and target is nonlinear, algorithms like polynomial regression, decision tree regression, random forest regression, gradient boosting regression, and neural networks capture curves and interactions that linear models cannot. A random forest can model the fact that location matters more than square footage for luxury homes but matters less for rural properties, a nonlinear interaction that a plain linear regression misses unless interaction terms are added by hand.
Regression outputs can be transformed into classification through thresholding. If you predict a patient's fasting blood sugar as 185 mg/dL and the diagnostic threshold for diabetes is 126 mg/dL, the continuous prediction becomes a binary classification. This flexibility means regression is sometimes used even when the ultimate decision is categorical.
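As a rough sketch, the conversion is a single comparison applied to each regression output. The 126 mg/dL cutoff comes from the example above; the function name and the predicted values are hypothetical.

```python
DIABETES_THRESHOLD_MG_DL = 126.0  # cutoff taken from the example in the text

def flag_diabetes(predicted_glucose: float) -> bool:
    """Turn a continuous blood sugar prediction into a yes/no flag."""
    return predicted_glucose >= DIABETES_THRESHOLD_MG_DL

# Hypothetical regression outputs for three patients, in mg/dL.
predictions = [185.0, 110.0, 126.0]
flags = [flag_diabetes(p) for p in predictions]

print(flags)
```

Keeping the continuous prediction around has a practical benefit: the threshold can be changed later without retraining the model.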
Key Differences in Practice
Loss functions differ. Classification typically uses cross-entropy loss, which measures how well the predicted probability distribution matches the actual class labels. Regression typically uses mean squared error (MSE), which measures the average squared difference between predictions and actual values. The choice of loss function drives the entire optimization process, so this difference is fundamental.
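To make the two losses concrete, here is a small sketch of binary cross-entropy and MSE written from their standard definitions. The eps clipping is an added safeguard against log(0), and the example values are invented.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of the true labels (0 or 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep p away from 0 and 1 so log() is defined
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((y - yhat) ** 2 for y, yhat in zip(y_true, y_pred)) / len(y_true)

# Classification: confident and right is cheap, confident and wrong is expensive.
confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])

# Regression: errors of 1 and 2 give (1 + 4) / 2.
mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])

print(confident_right, confident_wrong, mse)
```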
Evaluation metrics differ. Classification is evaluated with accuracy, precision, recall, F1 score, and AUC-ROC. Regression is evaluated with RMSE, MAE, R-squared, and MAPE. Using the wrong metrics for the task type produces meaningless results. Accuracy is not meaningful for continuous predictions, where an exact match almost never occurs, and RMSE computed on arbitrary class labels says nothing useful about a classifier.
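The sketch below computes a few of these metrics by hand from their standard definitions, assuming binary labels encoded as 0/1 and regression targets that are not all identical (otherwise R-squared is undefined). The data is invented for illustration.

```python
import math

def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def rmse_mae_r2(y_true, y_pred):
    """RMSE, MAE, and R-squared; assumes y_true is not constant."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2

precision, recall, f1 = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
rmse, mae, r2 = rmse_mae_r2([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])

print(precision, recall, f1)
print(rmse, mae, r2)
```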
Output interpretation differs. A classification model's output of 0.85 for class "positive" means the model assigns 85% probability to the positive class. A regression model's output of 85 means the predicted numerical value is 85 in whatever units the target uses. Both involve numbers, but they mean completely different things.
Error analysis differs. In classification, errors are categorical: false positives (predicted positive, actually negative) and false negatives (predicted negative, actually positive). The cost of each error type is often asymmetric: a false negative in cancer screening is far worse than a false positive. In regression, errors are numerical: the prediction was off by some amount. The analysis focuses on whether errors are random and symmetric, or whether there are systematic biases.
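Both kinds of analysis can be sketched briefly. The cost weights below (a false negative costing ten times a false positive) are purely hypothetical, and the mean residual is only one simple check for systematic bias.

```python
def error_cost(y_true, y_pred, fp_cost=1.0, fn_cost=10.0):
    """Count false positives and false negatives and weight them by assumed costs."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp, fn, fp * fp_cost + fn * fn_cost

def mean_residual(y_true, y_pred):
    """Average of (actual - predicted); far from zero suggests systematic bias."""
    return sum(t - p for t, p in zip(y_true, y_pred)) / len(y_true)

# Classification: one false positive, two false negatives.
fp, fn, cost = error_cost([1, 0, 1, 0, 1], [1, 1, 0, 0, 0])

# Regression: every prediction is too low, so the mean residual is positive.
bias = mean_residual([100.0, 200.0, 300.0], [90.0, 185.0, 280.0])

print(fp, fn, cost, bias)
```

A positive mean residual here indicates the model tends to underpredict; plotting residuals against the predicted value is a common next step.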
Algorithms That Do Both
Many algorithms have both classification and regression variants. Decision trees can split on features to predict either a category (classification tree) or a number (regression tree). Random forests, gradient boosting, k-nearest neighbors, support vector machines, and neural networks all come in both classification and regression versions. The underlying structure is similar, but the loss function, output layer, and training objective change.
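K-nearest neighbors makes the shared structure easy to see. In the sketch below, both variants use the same neighbor search on a single numeric feature, and only the final aggregation differs: a majority vote for classification, a mean for regression. The data and function names are invented for illustration.

```python
from collections import Counter

def nearest_targets(train, query, k):
    """Return the targets of the k training points closest to query (1-D feature)."""
    ranked = sorted(train, key=lambda pair: abs(pair[0] - query))
    return [target for _, target in ranked[:k]]

def knn_classify(train, query, k=3):
    """Classification variant: majority vote among the neighbors' labels."""
    return Counter(nearest_targets(train, query, k)).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Regression variant: average of the neighbors' numeric targets."""
    neighbors = nearest_targets(train, query, k)
    return sum(neighbors) / len(neighbors)

# Same feature values, two different kinds of target.
labeled = [(1.0, "small"), (1.5, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]
priced = [(1.0, 100.0), (1.5, 140.0), (2.0, 180.0), (8.0, 700.0), (9.0, 800.0)]

label = knn_classify(labeled, 1.8)
value = knn_regress(priced, 1.8)

print(label, value)
```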
Scikit-learn, the most popular ML library in Python, makes this explicit with parallel naming: RandomForestClassifier and RandomForestRegressor, GradientBoostingClassifier and GradientBoostingRegressor. Both variants share the same fit and predict interface, so you just instantiate the right one for your task.
How to Decide Which One You Need
Look at your target variable. If it is a category (even if encoded as numbers like 0/1), use classification. If it is a continuous measurement, use regression. The decision is about the target, not the features. You can have continuous features in a classification problem and categorical features in a regression problem.
Some situations are ambiguous. Customer satisfaction ratings on a 1-5 scale could be treated as regression (predicting the numerical score) or ordinal classification (predicting the category while respecting the ordering). Student grades (A/B/C/D/F) are ordinal categories that could go either way. In these cases, try both approaches and compare results on your evaluation metric of choice.
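As a rough illustration of the two framings for a 1-5 rating, the sketch below maps a regression output to the nearest valid rating and, separately, picks the most probable class from a classifier's distribution. Both functions and all numbers are hypothetical, and the classification variant shown is plain multi-class: it ignores the ordering that a true ordinal method would use.

```python
def rating_from_regression(predicted_score: float) -> int:
    """Round a continuous prediction to the nearest rating, clipped to the 1-5 scale."""
    return min(5, max(1, round(predicted_score)))

def rating_from_classification(class_probs) -> int:
    """Pick the most probable rating; class_probs[i] is the probability of rating i + 1."""
    return class_probs.index(max(class_probs)) + 1

# Regression framing: the model predicts a score on a continuous scale.
reg_rating = rating_from_regression(3.7)

# Out-of-range predictions are clipped back onto the scale.
clipped_rating = rating_from_regression(5.8)

# Classification framing: the model predicts a distribution over the five ratings.
cls_rating = rating_from_classification([0.05, 0.10, 0.20, 0.40, 0.25])

print(reg_rating, clipped_rating, cls_rating)
```

Once both framings produce ratings on the same scale, they can be compared on a single metric such as mean absolute error.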
Classification predicts discrete categories; regression predicts continuous numbers. They use different loss functions, different metrics, and different output interpretations. The choice is determined by whether your target variable is categorical or numerical. Many algorithms have variants for both tasks.