What if I know the propensity score?

In some experimental settings we know the treatment assignment probabilities beforehand, e.g. if we have data from an RCT with known treatment probabilities.

In that case we may not want to learn a propensity model at all, but rather just use the known probabilities.

Loading the data

Just like in our example on estimating CATEs with a MetaLearner, we will first load some experiment data:

In [1]:

import pandas as pd
from pathlib import Path
from git_root import git_root

df = pd.read_csv(git_root("data/learning_mindset.zip"))
outcome_column = "achievement_score"
treatment_column = "intervention"
feature_columns = [
    column for column in df.columns if column not in [outcome_column, treatment_column]
]
categorical_feature_columns = [
    "ethnicity",
    "gender",
    "frst_in_family",
    "school_urbanicity",
    "schoolid",
]
# Note that explicitly setting the dtype of these features to category
# allows both lightgbm and shap plots to
# 1. Operate on features which are not of type int, bool or float
# 2. Treat categoricals with int values as categoricals
#    rather than as ordinals/numericals.
for categorical_feature_column in categorical_feature_columns:
    df[categorical_feature_column] = df[categorical_feature_column].astype("category")

Using a dummy estimator

In this tutorial we will assume that we know that all observations were assigned to the treatment with a fixed probability of 0.3, which is close to the fraction of the observations assigned to the treatment group:

In [2]:

df[treatment_column].mean()

Out[2]:

0.3256664421133673

Note

This dataset does not actually have a fixed propensity score for all observations; we only assume one here for illustrative purposes.

Now we can use a custom sklearn-like classifier: FixedBinaryPropensity. It can be used like any sklearn classifier but always returns the same propensity, independently of the observed covariates. This propensity has to be provided at initialization via the propensity_score parameter.
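To make the idea of a constant propensity classifier concrete, here is a minimal sketch of what such an estimator could look like. The class name ConstantPropensityClassifier is hypothetical and this is not the library's implementation of FixedBinaryPropensity; it only assumes the usual sklearn conventions of fit, predict_proba and a classes_ attribute, and a binary treatment encoded as 0 and 1.

```python
import numpy as np


class ConstantPropensityClassifier:
    """Hypothetical stand-in for illustration; not metalearners' FixedBinaryPropensity."""

    def __init__(self, propensity_score: float) -> None:
        if not 0 <= propensity_score <= 1:
            raise ValueError("propensity_score must lie in [0, 1].")
        self.propensity_score = propensity_score

    def fit(self, X, y):
        # Nothing is learned from the data. We only record the class labels,
        # assuming a binary treatment encoded as 0 (control) and 1 (treated).
        self.classes_ = np.array([0, 1])
        return self

    def predict_proba(self, X):
        # One row per observation: [P(W=0), P(W=1)], identical for every row.
        row = [1 - self.propensity_score, self.propensity_score]
        return np.tile(row, (len(X), 1))


model = ConstantPropensityClassifier(propensity_score=0.3)
model.fit(X=[[1.0], [2.0], [3.0]], y=[0, 1, 0])
probabilities = model.predict_proba([[10.0], [20.0]])
```

The covariates passed to predict_proba only determine how many rows are returned; their values are ignored.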

Fitting the MetaLearner

Finally we can instantiate and fit our MetaLearner using our own custom propensity model:

In [3]:

from metalearners import RLearner
from metalearners.utils import FixedBinaryPropensity
from lightgbm import LGBMRegressor

rlearner = RLearner(
    nuisance_model_factory=LGBMRegressor,
    propensity_model_factory=FixedBinaryPropensity,
    treatment_model_factory=LGBMRegressor,
    nuisance_model_params={"verbose": -1},
    propensity_model_params={"propensity_score": 0.3},
    treatment_model_params={"verbose": -1},
    is_classification=False,
    n_variants=2,
)
rlearner.fit(
    X=df[feature_columns],
    y=df[outcome_column],
    w=df[treatment_column],
)

Out[3]:

<metalearners.rlearner.RLearner at 0x302d930d0>

We can check that the propensity estimates correspond to our expectation:

In [4]:

rlearner.predict_nuisance(
    X=df[feature_columns], model_kind="propensity_model", model_ord=0, is_oos=False
)

Out[4]:

array([[0.7, 0.3],
       [0.7, 0.3],
       [0.7, 0.3],
       ...,
       [0.7, 0.3],
       [0.7, 0.3],
       [0.7, 0.3]])

Further comments

This example shows how to use the same propensity score for all observations in the binary treatment setting. The class could easily be extended to multiple treatment variants. Moreover, customizing the propensity score according to some simple rule extracted from the input features could be accommodated analogously.
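As a rough illustration of these two extensions, the following sketch shows one possible shape for each. Both class names, ConstantMultiVariantPropensity and RuleBasedBinaryPropensity, are hypothetical and not part of metalearners, and the sketch has not been checked against the requirements a MetaLearner places on its propensity model (for instance, whether it must derive from sklearn's BaseEstimator so that it can be cloned).

```python
import numpy as np


class ConstantMultiVariantPropensity:
    """Hypothetical: one fixed probability per treatment variant."""

    def __init__(self, propensity_scores):
        scores = np.asarray(propensity_scores, dtype=float)
        if scores.ndim != 1 or np.any(scores < 0) or not np.isclose(scores.sum(), 1.0):
            raise ValueError("propensity_scores must be non-negative and sum to 1.")
        self.propensity_scores = scores

    def fit(self, X, y):
        # Variants are assumed to be encoded as 0, 1, ..., n_variants - 1.
        self.classes_ = np.arange(len(self.propensity_scores))
        return self

    def predict_proba(self, X):
        # The same probability vector is repeated for every observation.
        return np.tile(self.propensity_scores, (len(X), 1))


class RuleBasedBinaryPropensity:
    """Hypothetical: the treatment probability follows a user-supplied rule."""

    def __init__(self, rule):
        # rule maps the feature matrix to one treatment probability per row.
        self.rule = rule

    def fit(self, X, y):
        self.classes_ = np.array([0, 1])
        return self

    def predict_proba(self, X):
        treated = np.asarray(self.rule(X), dtype=float)
        return np.column_stack([1 - treated, treated])


multi = ConstantMultiVariantPropensity([0.5, 0.3, 0.2]).fit(X=[[0.0]] * 4, y=[0, 1, 2, 0])
multi_probabilities = multi.predict_proba([[1.0], [2.0]])


def first_feature_rule(X):
    # Illustrative rule: probability 0.5 if the first feature is positive, else 0.2.
    return np.where(np.asarray(X)[:, 0] > 0, 0.5, 0.2)


ruled = RuleBasedBinaryPropensity(first_feature_rule).fit(X=[[1.0], [-1.0]], y=[1, 0])
rule_probabilities = ruled.predict_proba([[1.0], [-1.0]])
```

In both cases fit learns nothing from the data; the probabilities come entirely from what is passed at initialization.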