Narrowing the Search: Which Hyperparameters Really Matter?

Tech Blog Aimee Coelho
Hyperparameter optimization feels like figuring out a map on a budget (Credit: Pexels)

How Can You Identify Which Hyperparameters Are Important?

Figure 1: From the original paper, the marginal contributions over all datasets for SVM (RBF kernel) (left), where gamma was the most important hyperparameter, and AdaBoost (right), where max depth was the most important hyperparameter.

Hyperparameter Importance Density Distributions

Figure 2: Examples of density distributions of the top ten evaluations from all datasets for some of the hyperparameters of the algorithms studied in the paper: random forest, AdaBoost, SVM (RBF kernel) and SVM (sigmoid).

More Random Forest Hyperparameters

Figure 3: The violin plot of importances for this set of hyperparameters for random forest, showing min samples leaf to be by far the most important hyperparameter generally.
Figure 4: The marginal performance for min samples leaf for random forest on the mice protein dataset and on the analcatdata DMFT dataset where higher values of min samples leaf are better.
Figure 5: One-dimensional density plots of the 10 best values of min samples leaf across all datasets, from the original paper (left) and for our new experiments on OpenML-CC18 data with different hyperparameters (right).

XGBoost Experiments

Figure 6: A violin plot showing the most important hyperparameters and interactions for XGBoost over all datasets, with learning rate generally the most important.

Important Hyperparameters for XGBoost

Figure 7: The density distributions of the ten best values over all datasets for the hyperparameters learning rate, subsample and min child weight for XGBoost.

What Are Our Takeaways From This Experiment?

Why Is Min Samples Leaf for Random Forest Almost Always Better Set So Low?

How to Set the Number of Estimators?

Figure 8: An example marginal for the number of estimators parameter for random forest on the madelon dataset, and the density distribution of the best values of n estimators over all datasets.

The Importance of a Good Benchmarking Set
