How to Tune Hyperparameters

Updated May 2026
Hyperparameter tuning is the process of finding the best settings for a machine learning algorithm, meaning the settings that control how the algorithm learns rather than what it learns. The number of trees in a random forest, the learning rate in gradient boosting, and the regularization strength in logistic regression are all hyperparameters, and each can significantly affect model performance. Systematic tuning with cross-validation typically improves a model by 2-10% over default settings.

Parameters and hyperparameters are different things. Parameters are learned from the data during training: the weights in a linear model, the split points in a decision tree. Hyperparameters are set before training begins and control the learning process itself: how many trees to build, how deep each tree can grow, how aggressively to regularize. You cannot learn hyperparameters from training data because they define the training process.

Step 1: Identify Which Hyperparameters Matter

Most algorithms have 5-20 hyperparameters, but only 2-4 have a significant impact on performance. Focus your tuning budget on these high-impact parameters.

Random Forest: n_estimators (number of trees, 100-500 is usually sufficient), max_features (features considered per split; common choices are the square root of the feature count for classification and one third of the feature count for regression), and min_samples_leaf (minimum samples at leaf nodes, 1-10).

Gradient Boosting (XGBoost/LightGBM): learning_rate (0.01-0.3, lower is better with more trees), n_estimators (100-3000, inversely related to learning rate), max_depth (3-10, controls tree complexity), and subsample (0.5-1.0, fraction of data per tree).

Logistic Regression: C (inverse of regularization strength in scikit-learn, so smaller values regularize more; 0.001-1000, logarithmic scale) and penalty (L1, L2, or elastic net).

SVM: C (inverse of regularization strength, 0.01-100; smaller values regularize more), kernel (rbf, linear, poly), and gamma (RBF kernel coefficient, 0.001-10; larger values give a narrower kernel).

Neural Networks: learning_rate (0.0001-0.01), batch_size (16-256), number of layers and units, dropout rate (0.1-0.5), and optimizer choice (Adam is a common default).
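The ranges listed above can be written down as plain data before any search code exists. The sketch below is one possible encoding, not a library format: the tuple layout, the "log" tag, and the SEARCH_SPACES name are assumptions made for illustration, and neural networks are left out for brevity.

```python
# Each entry is (kind, low, high) for numeric ranges or ("choice", options).
# The "log" tag marks ranges that span orders of magnitude; which ranges get
# that tag is an assumption of this sketch, not something a library enforces.
SEARCH_SPACES = {
    "random_forest": {
        "n_estimators": ("int", 100, 500),
        # "sqrt" or roughly one third of the features, as a fraction.
        "max_features": ("choice", ["sqrt", 0.33]),
        "min_samples_leaf": ("int", 1, 10),
    },
    "gradient_boosting": {
        "learning_rate": ("log", 0.01, 0.3),
        "n_estimators": ("int", 100, 3000),
        "max_depth": ("int", 3, 10),
        "subsample": ("float", 0.5, 1.0),
    },
    "logistic_regression": {
        "C": ("log", 0.001, 1000),
        "penalty": ("choice", ["l1", "l2", "elasticnet"]),
    },
    "svm": {
        "C": ("log", 0.01, 100),
        "kernel": ("choice", ["rbf", "linear", "poly"]),
        "gamma": ("log", 0.001, 10),
    },
}

# Print the short list of hyperparameters to tune for each algorithm.
for model_name, space in SEARCH_SPACES.items():
    print(model_name, sorted(space))
```

Keeping the space as data makes it easy to review the ranges before spending any compute on them.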

Step 2: Define the Search Space

For each hyperparameter, define a reasonable range. Use the algorithm defaults as a starting point and expand outward. Parameters that vary over orders of magnitude (C in SVM, learning rate) should be searched on a logarithmic scale: [0.001, 0.01, 0.1, 1, 10, 100] rather than [0, 20, 40, 60, 80, 100].
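As a minimal sketch of what a logarithmic scale means in practice, the helpers below build a log-spaced grid and draw log-uniform random samples using only the standard library. The function names log_grid and sample_log_uniform are hypothetical, chosen for this example.

```python
import math
import random

def log_grid(low, high, num):
    """Return `num` values spaced evenly on a log10 scale from low to high."""
    lo, hi = math.log10(low), math.log10(high)
    step = (hi - lo) / (num - 1)
    return [10 ** (lo + i * step) for i in range(num)]

def sample_log_uniform(low, high, rng):
    """Draw one value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

# Six grid points covering 0.001 to 100, one per order of magnitude.
c_grid = log_grid(0.001, 100, 6)

# Five random draws from the same range, seeded for repeatability.
rng = random.Random(0)
c_samples = [sample_log_uniform(0.001, 100, rng) for _ in range(5)]

print(c_grid)
print(c_samples)
```

On a linear scale, most samples from this range would land between 10 and 100; on a log scale each order of magnitude gets roughly equal attention.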

Avoid searching too broadly. A learning rate of 100 will not work for gradient boosting, and an SVM with C=0.00001 will almost always underfit. Tight, informed ranges produce better results with less compute than wide, uninformed ranges.

Consider interdependencies. In gradient boosting, learning_rate and n_estimators are inversely related: a lower learning rate needs more trees. Searching them independently wastes budget on combinations like (learning_rate=0.3, n_estimators=3000) that are clearly suboptimal.
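One way to respect this dependency is to sample learning_rate first and then choose n_estimators conditionally. The sketch below is a hypothetical heuristic, not a library feature: the function name sample_boosting_config and the tree_budget constant of 30 are assumptions that would need adjusting for a real problem.

```python
import math
import random

def sample_boosting_config(rng, tree_budget=30.0):
    """Sample learning_rate on a log scale, then pick n_estimators near
    tree_budget / learning_rate so the two move in opposite directions.
    tree_budget is a hypothetical constant, not a library default."""
    learning_rate = math.exp(rng.uniform(math.log(0.01), math.log(0.3)))
    center = tree_budget / learning_rate
    # Allow 50% variation around the center, then clamp to 100-3000 trees.
    n_estimators = int(rng.uniform(0.5 * center, 1.5 * center))
    n_estimators = max(100, min(3000, n_estimators))
    return {"learning_rate": learning_rate, "n_estimators": n_estimators}

rng = random.Random(0)
configs = [sample_boosting_config(rng) for _ in range(1000)]
print(configs[:3])
```

With this coupling, a high learning rate is never paired with thousands of trees, so no trials are spent on that corner of the space.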

Step 3: Choose a Search Strategy

Grid Search tries every combination of hyperparameter values from predefined lists. With 3 values for each of 4 hyperparameters, grid search evaluates 81 combinations. It is exhaustive within the grid but scales exponentially with the number of hyperparameters and misses values between grid points. Best for fine-tuning 1-2 hyperparameters around a known good region.
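The combination count can be checked by enumerating a grid directly. This sketch uses itertools.product on four gradient boosting hyperparameters; the specific values in each list are illustrative assumptions.

```python
from itertools import product

# Three candidate values for each of four hyperparameters.
grid = {
    "learning_rate": [0.01, 0.1, 0.3],
    "n_estimators": [100, 500, 1000],
    "max_depth": [3, 6, 10],
    "subsample": [0.5, 0.75, 1.0],
}

# Build one dict per combination, in the order product() yields them.
names = list(grid)
combinations = [dict(zip(names, values)) for values in product(*grid.values())]

print(len(combinations))  # 3 ** 4 = 81
print(combinations[0])
```

Adding a fifth hyperparameter with three values would triple the count to 243, which is the exponential growth described above.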

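Random Search samples combinations randomly from the defined ranges. Bergstra and Bengio (2012) showed that random search is more efficient than grid search when only a few hyperparameters matter: a grid keeps repeating the same few values of each important hyperparameter, while every random trial tries a new value of it. With 60 random trials, there is about a 95% chance that at least one trial lands in the best-performing 5% of the search space (1 - 0.95^60 ≈ 0.954). This is a statement about the space you defined, not a guarantee of closeness to the true optimum. Best for initial broad exploration.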
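The 95% figure for 60 trials can be checked with a short calculation, assuming each trial is an independent uniform draw from the search space and "top 5%" means the best-scoring 5% of that space. The function names below are hypothetical.

```python
import math

def prob_hit_top_fraction(n_trials, top_fraction=0.05):
    """Probability that at least one of n_trials independent draws
    lands in the best `top_fraction` of the search space."""
    return 1.0 - (1.0 - top_fraction) ** n_trials

def trials_needed(confidence=0.95, top_fraction=0.05):
    """Smallest number of trials that reaches the requested confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - top_fraction))

p_60 = prob_hit_top_fraction(60)
n_min = trials_needed()
print(round(p_60, 3))  # 0.954
print(n_min)           # 59
```

The result does not depend on the number of hyperparameters, only on the number of trials, which is why random search holds up well in larger spaces.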

Bayesian Optimization uses past evaluation results to build a model of the relationship between hyperparameters and performance, then uses that model to choose the next combination to try. Libraries like Optuna, Hyperopt, and scikit-optimize implement this approach. Bayesian optimization is usually more sample-efficient than random search, often finding better configurations in fewer trials. Best for expensive models where each evaluation takes minutes to hours.

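Successive Halving and Hyperband allocate more resources to promising configurations. Successive halving starts by training all configurations for a small number of iterations, discards the bottom half, doubles the iterations for the survivors, and repeats until one configuration remains. Hyperband runs successive halving several times with different trade-offs between the number of configurations and the starting budget. Both find good configurations much faster than fully training every candidate.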
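The halving loop can be sketched in a few lines. This is a toy version under stated assumptions: toy_score is a made-up stand-in for "train this configuration for `budget` iterations and return its validation score", and it only implements plain successive halving, not Hyperband.

```python
import math

def successive_halving(configs, evaluate, min_budget=1):
    """Keep the better half of the configurations each round
    while doubling the budget given to the survivors."""
    survivors = list(configs)
    budget = min_budget
    history = []  # (budget, number of configurations evaluated) per round
    while len(survivors) > 1:
        history.append((budget, len(survivors)))
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, budget), reverse=True)
        survivors = ranked[: len(ranked) // 2]
        budget *= 2
    return survivors[0], history

def toy_score(cfg, budget):
    """Hypothetical objective: best at learning_rate=0.1, improving with budget."""
    quality = 1.0 - abs(math.log10(cfg["learning_rate"]) + 1.0)
    return quality * (1.0 - 0.5 ** budget)

configs = [{"learning_rate": lr} for lr in (0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0)]
best, history = successive_halving(configs, toy_score)
print(best)
print(history)
```

In real use, low-budget scores are noisy, so a configuration that learns slowly can be discarded too early; Hyperband's multiple starting budgets are meant to reduce that risk.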

Step 4: Evaluate with Cross-Validation

Every hyperparameter configuration must be evaluated using cross-validation, not a single train-test split. A single split introduces randomness that can make a mediocre configuration look good. 5-fold cross-validation provides stable estimates.

In scikit-learn, GridSearchCV and RandomizedSearchCV handle this automatically: they cross-validate every candidate configuration and report the best one along with all scores. The refit parameter (True by default) automatically retrains the best configuration on the full training set.
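A minimal sketch of this workflow with RandomizedSearchCV follows. The synthetic dataset, the choice of logistic regression, and the 20-trial budget are assumptions made to keep the example small and fast.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data stands in for a real training set.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

search = RandomizedSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    # C is sampled on a log scale between 0.001 and 1000.
    param_distributions={"C": loguniform(1e-3, 1e3)},
    n_iter=20,          # number of random configurations to try
    cv=5,               # 5-fold cross-validation for each configuration
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)  # mean cross-validated accuracy of the best configuration
# Because refit=True by default, search.best_estimator_ has been retrained on all of X, y.
```

The full table of scores for every candidate is available in search.cv_results_, which is useful for seeing how sensitive the model is to each hyperparameter.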

To get an honest estimate of the tuned model's performance, use nested cross-validation: an inner loop tunes hyperparameters within each outer training fold, and an outer loop scores the tuned model on data the inner loop never saw. This prevents the performance estimate from being inflated by the tuning process.
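In scikit-learn, nested cross-validation can be sketched by passing a search object to cross_val_score, so that tuning is repeated inside every outer fold. The dataset, the grid of C values, and the fold counts below are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Inner loop: chooses C using only the outer training fold.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
# Outer loop: scores the tuned model on data the inner loop never saw.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

tuner = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10, 100]},
    cv=inner_cv,
)

# One score per outer fold; each comes from a freshly tuned model.
outer_scores = cross_val_score(tuner, X, y, cv=outer_cv)
print(outer_scores.mean(), outer_scores.std())
```

The outer scores estimate how well the whole procedure (tuning plus training) generalizes; they do not pick a single final set of hyperparameters.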

Step 5: Validate the Final Model

After tuning, retrain the model with the best hyperparameters on the entire training set (training + validation). Evaluate once on the held-out test set that was never touched during tuning. This final score is your honest estimate of production performance.
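The sketch below shows one way to keep a test set untouched during tuning and score it exactly once at the end. The synthetic data, the 25% test split, and the small grid are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Set the test set aside before any tuning happens.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Tune with cross-validation on the training portion only.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=5,
)
search.fit(X_train, y_train)  # refit=True retrains the best model on all of X_train

cv_score = search.best_score_              # cross-validated score seen during tuning
test_score = search.score(X_test, y_test)  # single evaluation on held-out data
gap = cv_score - test_score

print(cv_score, test_score, gap)
```

A large positive gap between the cross-validated score and the test score is the warning sign to look for before trusting the tuned model.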

If the test score is substantially worse than the cross-validated score from tuning, the model may be overfitting to the validation folds. Consider simplifying the model, reducing the search space, or collecting more data.

Record the best hyperparameters and the search history. When the model needs retraining on new data, the previously optimal region is a good starting point, though the optimal values may shift as the data changes.

Key Takeaway

Focus tuning on the 2-4 hyperparameters that matter most for your algorithm. Use random search for broad exploration and Bayesian optimization for expensive models. Always evaluate with cross-validation, and validate the final model on a held-out test set. Systematic tuning typically improves model performance by 2-10% over defaults.