Tuning hyperparameters of a MetaLearner with `MetaLearnerGridSearch`

Motivation

We know that model selection and/or hyperparameter optimization (HPO) can have a massive impact on prediction quality in regular machine learning. Model selection and hyperparameter optimization appear to be of substantial importance for CATE estimation with MetaLearners, too; see e.g. Machlanski et al.

However, model selection and HPO for MetaLearners look quite different from what we're used to from e.g. simple supervised learning problems. Concretely,

- In terms of a MetaLearner's option space, there are several levels to optimize for:
  1. The MetaLearner architecture, e.g. R-Learner vs DR-Learner
  2. The model to choose per base estimator of said MetaLearner architecture, e.g. LogisticRegression vs LGBMClassifier
  3. The model hyperparameters per base model
- On a conceptual level, it's not clear how to measure model quality for MetaLearners. As a proxy for the underlying quantity of interest, one might look into base model performance, the R-Loss of the CATE estimates or some more elaborate approaches alluded to by Machlanski et al.
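To make the R-Loss proxy mentioned above more concrete, here is a minimal sketch of its commonly used form for a binary treatment: the mean squared difference between the outcome residual and the treatment residual scaled by the CATE estimate. The function name `r_loss` and its argument names are assumptions made for this illustration and are not taken from the library; in practice the outcome and propensity estimates would ideally be out-of-fold predictions.

```python
from typing import Sequence


def r_loss(
    y: Sequence[float],
    w: Sequence[int],
    outcome_estimates: Sequence[float],
    propensity_estimates: Sequence[float],
    cate_estimates: Sequence[float],
) -> float:
    """R-Loss for a binary treatment (hypothetical helper, for illustration only).

    outcome_estimates:    estimates of E[Y | X]
    propensity_estimates: estimates of P[W = 1 | X]
    cate_estimates:       CATE estimates to be evaluated
    """
    columns = (y, w, outcome_estimates, propensity_estimates, cate_estimates)
    if len({len(column) for column in columns}) != 1:
        raise ValueError("All inputs must have the same length.")
    if len(y) == 0:
        raise ValueError("Inputs must not be empty.")

    squared_errors = [
        # (outcome residual) - (treatment residual) * (CATE estimate)
        ((y_i - m_i) - (w_i - e_i) * tau_i) ** 2
        for y_i, w_i, m_i, e_i, tau_i in zip(*columns)
    ]
    return sum(squared_errors) / len(squared_errors)


example = r_loss(
    y=[1.0, 2.0],
    w=[1, 0],
    outcome_estimates=[0.5, 1.5],
    propensity_estimates=[0.5, 0.5],
    cate_estimates=[1.0, 1.0],
)
print(example)
```

A lower value indicates CATE estimates that better explain the residual variation in the outcome, given the nuisance estimates used.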

We think that HPO can be divided into two camps:

- Exploration of (hyperparameter, metric evaluation) pairs where the pairs do not influence each other (e.g. grid search, random search)
- Exploration of (hyperparameter, metric evaluation) pairs where the pairs do influence each other (e.g. Bayesian optimization, evolutionary algorithms); in other words, there is a feedback loop between the results of earlier samples and the choice of the next sample
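The defining property of the first camp is that all candidates can be written down before a single one has been evaluated. The following sketch illustrates this with the standard library only; the search space and the helper names `grid_candidates` and `random_candidates` are made up for illustration.

```python
import itertools
import random

# Hypothetical search space, chosen only for illustration.
search_space = {
    "n_estimators": [10, 50, 100],
    "learning_rate": [0.01, 0.1],
}


def grid_candidates(space):
    """Yield every combination of values (grid search)."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_candidates(space, n_samples, seed=0):
    """Yield independently drawn combinations (random search)."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        yield {name: rng.choice(values) for name, values in space.items()}


# Neither list depends on any metric evaluation: no feedback loop.
grid = list(grid_candidates(search_space))
sampled = list(random_candidates(search_space, n_samples=4))
print(len(grid), len(sampled))
```

In the second camp, by contrast, the next candidate could only be generated after the previous candidates had been scored.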

In this example, we will illustrate the former and how one can make use of MetaLearnerGridSearch for it. For the latter please refer to the example on model selection with optuna.

Loading the data

Just like in our example on estimating CATEs with a MetaLearner, we will first load some experiment data:

import pandas as pd
from pathlib import Path
from git_root import git_root

df = pd.read_csv(git_root("data/learning_mindset.zip"))
outcome_column = "achievement_score"
treatment_column = "intervention"
feature_columns = [
    column for column in df.columns if column not in [outcome_column, treatment_column]
]
categorical_feature_columns = [
    "ethnicity",
    "gender",
    "frst_in_family",
    "school_urbanicity",
    "schoolid",
]
# Note that explicitly setting the dtype of these features to category
# allows both lightgbm as well as shap plots to
# 1. Operate on features which are not of type int, bool or float
# 2. Correctly interpret categoricals with int values to be
#    interpreted as categoricals, as compared to ordinals/numericals.
for categorical_feature_column in categorical_feature_columns:
    df[categorical_feature_column] = df[categorical_feature_column].astype("category")

Now that we've loaded the experiment data, we can split it up into train and validation data:

from sklearn.model_selection import train_test_split

X_train, X_validation, y_train, y_validation, w_train, w_validation = train_test_split(
    df[feature_columns], df[outcome_column], df[treatment_column], test_size=0.25
)

Performing the grid search

We can run a grid search by using the MetaLearnerGridSearch class. However, it's important to note that this class only supports a single MetaLearner architecture at a time. If you're interested in conducting a grid search across multiple architectures, it will require several grid searches.

Let's say we want to work with a DRLearner. We can check the names of the base models for this architecture with the following code:

from metalearners import DRLearner

print(DRLearner.nuisance_model_specifications().keys())
print(DRLearner.treatment_model_specifications().keys())

dict_keys(['propensity_model', 'variant_outcome_model'])
dict_keys(['treatment_model'])

We see that this MetaLearner contains three base models: "variant_outcome_model", "propensity_model" and "treatment_model".

Since our problem has a regression outcome, the "variant_outcome_model" should be a regressor. The "propensity_model" and "treatment_model" are always a classifier and a regressor respectively.

To instantiate the MetaLearnerGridSearch object we need to specify the different base models to be used. Moreover, if we'd like to use non-default hyperparameters for a given base model, we need to specify those, too.

In this tutorial we test a LinearRegression and an LGBMRegressor for the outcome model, an LGBMClassifier and a QuadraticDiscriminantAnalysis for the propensity model and an LGBMRegressor for the treatment model.

Finally we can define the hyperparameters to test for the base models using the param_grid parameter.

from metalearners.grid_search import MetaLearnerGridSearch
from lightgbm import LGBMClassifier, LGBMRegressor
from sklearn.linear_model import LinearRegression
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

gs = MetaLearnerGridSearch(
    metalearner_factory=DRLearner,
    metalearner_params={"is_classification": False, "n_variants": 2},
    base_learner_grid={
        "variant_outcome_model": [LinearRegression, LGBMRegressor],
        "propensity_model": [LGBMClassifier, QuadraticDiscriminantAnalysis],
        "treatment_model": [LGBMRegressor],
    },
    param_grid={
        "variant_outcome_model": {
            "LGBMRegressor": {"n_estimators": [3, 5], "verbose": [-1]}
        },
        "treatment_model": {"LGBMRegressor": {"n_estimators": [1, 2], "verbose": [-1]}},
        "propensity_model": {
            "LGBMClassifier": {"n_estimators": [1, 2, 3], "verbose": [-1]}
        },
    },
)

Now we can call fit with the train and validation data and can inspect the results DataFrame in results_.

gs.fit(X_train, y_train, w_train, X_validation, y_validation, w_validation)
gs.results_

Out[6]:

										fit_time	score_time	train_variant_outcome_model_0_neg_root_mean_squared_error	train_variant_outcome_model_1_neg_root_mean_squared_error	train_propensity_model_neg_log_loss	train_treatment_model_1_vs_0_neg_root_mean_squared_error	test_variant_outcome_model_0_neg_root_mean_squared_error	test_variant_outcome_model_1_neg_root_mean_squared_error	test_propensity_model_neg_log_loss	test_treatment_model_1_vs_0_neg_root_mean_squared_error
metalearner	propensity_model	propensity_model_n_estimators	propensity_model_verbose	variant_outcome_model	variant_outcome_model_n_estimators	variant_outcome_model_verbose	treatment_model	treatment_model_n_estimators	treatment_model_verbose
DRLearner	LGBMClassifier	1.0	-1.0	LinearRegression	NaN	NaN	LGBMRegressor	1	-1	0.362632	0.068002	-0.852074	-0.848146	-0.632090	-1.813472	-0.840509	-0.832314	-0.628208	-1.771340
		1.0	-1.0	LinearRegression	NaN	NaN	LGBMRegressor	2	-1	0.388284	0.067350	-0.852355	-0.848704	-0.631710	-1.811203	-0.840509	-0.832314	-0.628208	-1.769949
		2.0	-1.0	LinearRegression	NaN	NaN	LGBMRegressor	1	-1	0.392640	0.066799	-0.852027	-0.847422	-0.631746	-1.812832	-0.840509	-0.832314	-0.628263	-1.773597
		2.0	-1.0	LinearRegression	NaN	NaN	LGBMRegressor	2	-1	0.454654	0.067791	-0.852033	-0.847687	-0.632071	-1.816253	-0.840509	-0.832314	-0.628263	-1.773111
		3.0	-1.0	LinearRegression	NaN	NaN	LGBMRegressor	1	-1	0.451604	0.069173	-0.851851	-0.847961	-0.632294	-1.815351	-0.840509	-0.832314	-0.628397	-1.777481
		3.0	-1.0	LinearRegression	NaN	NaN	LGBMRegressor	2	-1	0.512599	0.068227	-0.852593	-0.848230	-0.632181	-1.817798	-0.840509	-0.832314	-0.628397	-1.777205
		1.0	-1.0	LGBMRegressor	3.0	-1.0	LGBMRegressor	1	-1	0.752140	0.090002	-0.897820	-0.914654	-0.631841	-1.937299	-0.904254	-0.883362	-0.628208	-1.893821
					5.0	-1.0	LGBMRegressor	1	-1	1.030002	0.092309	-0.868651	-0.881616	-0.632241	-1.867413	-0.874428	-0.851044	-0.628208	-1.824367
					3.0	-1.0	LGBMRegressor	2	-1	0.813204	0.088926	-0.897875	-0.916016	-0.632136	-1.938607	-0.904254	-0.883362	-0.628208	-1.893318
					5.0	-1.0	LGBMRegressor	2	-1	1.075202	0.097479	-0.868589	-0.883810	-0.632147	-1.869441	-0.874428	-0.851044	-0.628208	-1.823845
		2.0	-1.0	LGBMRegressor	3.0	-1.0	LGBMRegressor	1	-1	0.863883	0.093149	-0.898204	-0.916213	-0.631750	-1.939523	-0.904254	-0.883362	-0.628263	-1.897660
					5.0	-1.0	LGBMRegressor	1	-1	1.065925	0.091348	-0.869530	-0.881849	-0.631758	-1.869640	-0.874428	-0.851044	-0.628263	-1.828047
					3.0	-1.0	LGBMRegressor	2	-1	0.874774	0.089929	-0.896739	-0.915254	-0.632360	-1.940699	-0.904254	-0.883362	-0.628263	-1.896607
					5.0	-1.0	LGBMRegressor	2	-1	1.115629	0.090655	-0.869139	-0.880639	-0.631567	-1.865421	-0.874428	-0.851044	-0.628263	-1.827504
		3.0	-1.0	LGBMRegressor	3.0	-1.0	LGBMRegressor	1	-1	0.876834	0.088035	-0.897744	-0.914163	-0.631423	-1.937204	-0.904254	-0.883362	-0.628397	-1.901964
					5.0	-1.0	LGBMRegressor	1	-1	1.144860	0.089213	-0.868147	-0.882217	-0.631853	-1.871375	-0.874428	-0.851044	-0.628397	-1.831622
					3.0	-1.0	LGBMRegressor	2	-1	0.936002	0.092514	-0.896970	-0.916339	-0.632114	-1.946211	-0.904254	-0.883362	-0.628397	-1.901835
					5.0	-1.0	LGBMRegressor	2	-1	1.196836	0.098911	-0.868453	-0.880995	-0.632492	-1.872229	-0.874428	-0.851044	-0.628397	-1.831039
	QuadraticDiscriminantAnalysis	NaN	NaN	LinearRegression	NaN	NaN	LGBMRegressor	1	-1	0.252870	0.058438	-0.852213	-0.849771	-0.640512	-2.276244	-0.840509	-0.832314	-0.638479	-4.386575
				LinearRegression	NaN	NaN	LGBMRegressor	2	-1	0.325631	0.059346	-0.851522	-0.849607	-0.640054	-2.272417	-0.840509	-0.832314	-0.638479	-4.374702
				LGBMRegressor	3.0	-1.0	LGBMRegressor	1	-1	0.678114	0.084764	-0.897237	-0.916300	-0.640197	-2.219056	-0.904254	-0.883362	-0.638479	-2.311704
					5.0	-1.0	LGBMRegressor	1	-1	0.912964	0.083096	-0.869783	-0.883112	-0.641877	-2.757328	-0.874428	-0.851044	-0.638479	-2.196190
					3.0	-1.0	LGBMRegressor	2	-1	0.747791	0.082716	-0.897858	-0.914723	-0.640128	-2.231605	-0.904254	-0.883362	-0.638479	-2.313465
					5.0	-1.0	LGBMRegressor	2	-1	0.977537	0.084649	-0.868732	-0.879986	-0.639841	-2.204121	-0.874428	-0.851044	-0.638479	-2.202680
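The results above contain 24 rows, which is what one would expect if every combination of base model and hyperparameter values is evaluated once. The sketch below re-derives that count from the grid definition. Class names are written as strings so that it runs without the modelling libraries, and the helper `n_options` is made up for this illustration; it is an assumption that the library enumerates the grid in exactly this way, although the resulting count agrees with the table.

```python
from math import prod

base_learner_grid = {
    "variant_outcome_model": ["LinearRegression", "LGBMRegressor"],
    "propensity_model": ["LGBMClassifier", "QuadraticDiscriminantAnalysis"],
    "treatment_model": ["LGBMRegressor"],
}
param_grid = {
    "variant_outcome_model": {
        "LGBMRegressor": {"n_estimators": [3, 5], "verbose": [-1]}
    },
    "treatment_model": {"LGBMRegressor": {"n_estimators": [1, 2], "verbose": [-1]}},
    "propensity_model": {
        "LGBMClassifier": {"n_estimators": [1, 2, 3], "verbose": [-1]}
    },
}


def n_options(model_kind):
    """Count (model, hyperparameter combination) options for one base model."""
    total = 0
    for model_name in base_learner_grid[model_kind]:
        model_params = param_grid.get(model_kind, {}).get(model_name, {})
        # A model without a parameter grid counts once (default parameters),
        # since the product over an empty collection is 1.
        total += prod(len(values) for values in model_params.values())
    return total


options_per_kind = {kind: n_options(kind) for kind in base_learner_grid}
n_configurations = prod(options_per_kind.values())
print(options_per_kind)
print(n_configurations)
```

Because the count is a product over base models, adding one more value to any single hyperparameter list multiplies rather than adds to the runtime.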

Reusing base models

To decrease the grid search runtime, it may sometimes be desirable to reuse some nuisance models. We refer to our example on model reuse for a more in-depth explanation of how this can be achieved, but here we'll show how model reuse integrates with MetaLearnerGridSearch.

We will reuse the "variant_outcome_model" of a TLearner for a grid search over the XLearner.

from metalearners import TLearner, XLearner

tl = TLearner(
    False,
    2,
    LGBMRegressor,
    nuisance_model_params={"verbose": -1, "n_estimators": 20, "learning_rate": 0.05},
    n_folds=2,
)
tl.fit(X_train, y_train, w_train)

gs = MetaLearnerGridSearch(
    metalearner_factory=XLearner,
    metalearner_params={
        "is_classification": False,
        "n_variants": 2,
        "n_folds": 5, # The number of folds does not need to be the same as in the TLearner
        "fitted_nuisance_models": {
            "variant_outcome_model": tl._nuisance_models["variant_outcome_model"]
        },
    },
    base_learner_grid={
        "propensity_model": [LGBMClassifier],
        "control_effect_model": [LGBMRegressor, LinearRegression],
        "treatment_effect_model": [LGBMRegressor, LinearRegression],
    },
    param_grid={
        "propensity_model": {"LGBMClassifier": {"n_estimators": [5], "verbose": [-1]}},
        "treatment_effect_model": {
            "LGBMRegressor": {"n_estimators": [5, 10], "verbose": [-1]}
        },
        "control_effect_model": {
            "LGBMRegressor": {"n_estimators": [1, 3], "verbose": [-1]}
        },
    },
)

gs.fit(X_train, y_train, w_train, X_validation, y_validation, w_validation)
gs.results_

Out[7]:

										fit_time	score_time	train_variant_outcome_model_0_neg_root_mean_squared_error	train_variant_outcome_model_1_neg_root_mean_squared_error	train_propensity_model_neg_log_loss	train_treatment_effect_model_1_vs_0_neg_root_mean_squared_error	train_control_effect_model_1_vs_0_neg_root_mean_squared_error	test_variant_outcome_model_0_neg_root_mean_squared_error	test_variant_outcome_model_1_neg_root_mean_squared_error	test_propensity_model_neg_log_loss	test_treatment_effect_model_1_vs_0_neg_root_mean_squared_error	test_control_effect_model_1_vs_0_neg_root_mean_squared_error
metalearner	propensity_model	propensity_model_n_estimators	propensity_model_verbose	control_effect_model	control_effect_model_n_estimators	control_effect_model_verbose	treatment_effect_model	treatment_effect_model_n_estimators	treatment_effect_model_verbose
XLearner	LGBMClassifier	5	-1	LGBMRegressor	1.0	-1.0	LGBMRegressor	5.0	-1.0	0.468590	0.046710	-0.835861	-0.84813	-0.631601	-0.814415	-0.824946	-0.8372	-0.811075	-0.628424	-0.789648	-0.833076
					1.0	-1.0	LGBMRegressor	10.0	-1.0	0.626367	0.046410	-0.835861	-0.84813	-0.631980	-0.803432	-0.825162	-0.8372	-0.811075	-0.628424	-0.779851	-0.833076
					3.0	-1.0	LGBMRegressor	5.0	-1.0	0.532225	0.047818	-0.835861	-0.84813	-0.633323	-0.813032	-0.816535	-0.8372	-0.811075	-0.628424	-0.789648	-0.825040
					3.0	-1.0	LGBMRegressor	10.0	-1.0	0.691459	0.046533	-0.835861	-0.84813	-0.633088	-0.803686	-0.816648	-0.8372	-0.811075	-0.628424	-0.779851	-0.825040
					1.0	-1.0	LinearRegression	NaN	NaN	0.293462	0.043040	-0.835861	-0.84813	-0.633597	-0.808634	-0.824883	-0.8372	-0.811075	-0.628424	-0.788946	-0.833076
					3.0	-1.0	LinearRegression	NaN	NaN	0.366940	0.045290	-0.835861	-0.84813	-0.632851	-0.810018	-0.815930	-0.8372	-0.811075	-0.628424	-0.788946	-0.825040
				LinearRegression	NaN	NaN	LGBMRegressor	5.0	-1.0	0.427425	0.046298	-0.835861	-0.84813	-0.633093	-0.813929	-0.817702	-0.8372	-0.811075	-0.628424	-0.789648	-0.816616
							LGBMRegressor	10.0	-1.0	0.574789	0.045061	-0.835861	-0.84813	-0.633856	-0.805584	-0.817893	-0.8372	-0.811075	-0.628424	-0.779851	-0.816616
							LinearRegression	NaN	NaN	0.252676	0.040404	-0.835861	-0.84813	-0.633268	-0.809481	-0.818456	-0.8372	-0.811075	-0.628424	-0.788946	-0.816616

What if I run out of memory?

If you're running a grid search over a large grid with a substantial dataset, you may run into memory issues. In that case, you can reduce memory usage by adjusting the following settings.

Setting store_raw_results=False makes the grid search operate with a generator rather than a list, which significantly reduces memory usage.

If the results_ DataFrame is what you're after, you can simply set store_results=True. However, if you aim to iterate over the MetaLearner objects, you can set store_results=False. Consequently, raw_results_ will become a generator object yielding GSResult.
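As a generic Python illustration of why a generator helps with memory, and of what to keep in mind when consuming one, consider the following sketch. It is not the library's implementation: `evaluate` and `lazy_results` are hypothetical stand-ins for fitting and scoring one configuration at a time.

```python
from typing import Iterator


def evaluate(configuration: int) -> dict:
    # Stand-in for fitting and scoring one MetaLearner configuration.
    return {"configuration": configuration, "score": -1.0 / (configuration + 1)}


def lazy_results(n_configurations: int) -> Iterator[dict]:
    # Each result is produced only when requested, so at most one
    # result needs to be held in memory by the producer at a time.
    for configuration in range(n_configurations):
        yield evaluate(configuration)


results = lazy_results(3)

# Process results as they arrive, here by keeping only the best one.
best = max(results, key=lambda result: result["score"])
print(best)

# A generator can be consumed only once: it is now exhausted.
remaining = list(results)
print(remaining)
```

The practical consequence is that anything needed from each result should be extracted during a single pass over the generator.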

Further comments

We strongly recommend only reusing base models if they have been trained on exactly the same data. If this is not the case, some functionalities will probably not work as hoped for.
