Meta-Learner

Meta-Learners [Kunzel2019] are estimator models that estimate the CATE with machine learning models when the treatment is discrete, e.g., the treatment has only two values 1 and 0, and when the unconfoundedness condition is satisfied. Generally speaking, a Meta-Learner employs one or more machine learning models and leaves the choice of those models to the user.

YLearn implements 3 Meta-Learners: S-Learner, T-Learner, and X-Learner. We provide below several useful examples before introducing their class structures.

Example

import numpy as np
from numpy.random import multivariate_normal

from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor

import matplotlib.pyplot as plt

from ylearn.estimator_model import SLearner, TLearner, XLearner
from ylearn.exp_dataset.exp_data import binary_data
from ylearn.utils import to_df

# build the dataset
d = 5
n = 2500
n_test = 250

y, x, w = binary_data(n=n, d=d, n_test=n_test)
data = to_df(outcome=y, treatment=x, w=w)
outcome = 'outcome'
treatment = 'treatment'
adjustment = data.columns[2:]

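# build the test dataset
# true treatment effect: 8 if the second adjustment variable exceeds 0.1, else 0
treatment_effect = lambda x: (1 if x[1] > 0.1 else 0) * 8

w_test = multivariate_normal(np.zeros(d), np.diag(np.ones(d)), n_test)
# sweep the second adjustment variable over [-3, 3) in exactly n_test steps
w_test[:, 1] = np.linspace(-3, 3, n_test, endpoint=False)
# assumption: to_df names these columns the same way as the training adjustment columns
test_data = to_df(w=w_test)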

SLearner

s = SLearner(model=GradientBoostingRegressor())
s.fit(data=data, outcome=outcome, treatment=treatment, adjustment=adjustment) # training
s_pred = s.estimate(data=test_data, quantity=None) # predicting

TLearner

t = TLearner(model=GradientBoostingRegressor())
t.fit(data=data, outcome=outcome, treatment=treatment, adjustment=adjustment) # training
t_pred = t.estimate(data=test_data, quantity=None) # predicting

XLearner

xl = XLearner(model=GradientBoostingRegressor()) # named xl to avoid shadowing the treatment array x
xl.fit(data=data, outcome=outcome, treatment=treatment, adjustment=adjustment) # training
x_pred = xl.estimate(data=test_data, quantity=None) # predicting

S-Learner

SLearner uses one machine learning model to estimate the causal effects. Specifically, we fit a single model \(f\) that predicts the outcome \(y\) from the treatment \(x\) and the adjustment set (or covariate) \(w\):

\[y = f(x, w).\]

The causal effect \(\tau(w)\) is then calculated as

\[\tau(w) = f(x=1, w) - f(x=0, w).\]
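As a rough illustration of the two formulas above, the following sketch fits one model on the treatment and the adjustment jointly and then differences its predictions at \(x=1\) and \(x=0\). It does not use YLearn: the synthetic data, the helper names features and f, and the choice of a least-squares model with treatment-adjustment interactions are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 3

# Synthetic data (illustrative only): binary treatment x, adjustment w.
w = rng.normal(size=(n, d))
x = rng.integers(0, 2, size=n)
true_tau = 1.0 + 2.0 * w[:, 0]  # heterogeneous treatment effect
y = w @ np.array([0.5, -1.0, 0.3]) + true_tau * x + 0.1 * rng.normal(size=n)


def features(x, w):
    """Design matrix for f(x, w): intercept, x, w, and x*w interactions."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return np.hstack([np.ones_like(x), x, w, x * w])


# Fit the single model f on treatment and adjustment together.
coef, *_ = np.linalg.lstsq(features(x, w), y, rcond=None)


def f(x, w):
    """Predicted outcome for treatment x and adjustment w."""
    return features(x, w) @ coef


# tau(w) = f(x=1, w) - f(x=0, w)
tau_hat = f(np.ones(n), w) - f(np.zeros(n), w)
print("mean absolute error:", np.mean(np.abs(tau_hat - true_tau)))
```

Any regressor with fit and predict methods could stand in for the least-squares model; the interaction terms are only needed here because a purely additive linear model can represent nothing but a constant effect.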

class ylearn.estimator_model.meta_learner.SLearner(model, random_state=2022, is_discrete_treatment=True, categories='auto', *args, **kwargs)

Parameters

model (estimator) – The base machine learning model for training SLearner. It can be any machine learning model that provides fit() and predict() methods.
random_state (int, default=2022) –
is_discrete_treatment (bool, default=True) – Treatment must be discrete for SLearner.
categories (str, optional, default='auto') –

fit(data, outcome, treatment, adjustment=None, covariate=None, treat=None, control=None, combined_treatment=True, **kwargs)

Fit the SLearner on the dataset.

Parameters

data (pandas.DataFrame) – Training dataset for training the estimator.
outcome (list of str, optional) – Names of the outcome.
treatment (list of str, optional) – Names of the treatment.
adjustment (list of str, optional, default=None) – Names of the adjustment set ensuring unconfoundedness.
covariate (list of str, optional, default=None) – Names of the covariate.
treat (int, optional) – Label of the intended treatment group
control (int, optional) – Label of the intended control group
combined_treatment (bool, optional, default=True) –
Only modify this parameter when there are multiple treatments. If set to True, multiple discrete treatments are combined into a single new discrete treatment, which converts several discrete classification tasks into a single discrete classification task. For example, if there are two different binary treatments:
1. treatment_1: \(x_1 | x_1 \in \{'sleep', 'run'\}\),
2. treatment_2: \(x_2 | x_2 \in \{'study', 'work'\}\),
then we can convert these two binary classification tasks into a single classification task with 4 different classes:

treatment: \(x | x \in \{0, 1, 2, 3\}\),

where, for example, 1 stands for ('sleep' and 'study').
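The idea of combining treatments can be sketched with plain NumPy. This is only an illustration and not necessarily how YLearn encodes the classes internally: the label assignment is arbitrary, and the ordering below (sorted categories, first treatment varying fastest) is an assumption chosen so that class 1 corresponds to ('sleep', 'study') as in the example above.

```python
import numpy as np

# Two binary treatments observed on four samples (hypothetical values).
treatment_1 = np.array(["sleep", "run", "sleep", "run"])
treatment_2 = np.array(["study", "study", "work", "work"])

# np.unique returns the sorted categories and each sample's index into them.
cats_1, idx_1 = np.unique(treatment_1, return_inverse=True)  # ['run', 'sleep']
cats_2, idx_2 = np.unique(treatment_2, return_inverse=True)  # ['study', 'work']

# One class label per pair of treatment values.
combined = idx_1 + len(cats_1) * idx_2

# Lookup table from the combined label back to the original pair.
decode = {
    i1 + len(cats_1) * i2: (str(c1), str(c2))
    for i2, c2 in enumerate(cats_2)
    for i1, c1 in enumerate(cats_1)
}
print(combined, decode)
```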

Returns

The fitted instance of SLearner.

Return type

instance of SLearner

estimate(data=None, quantity=None)

Estimate the causal effect with the type of the quantity.

Parameters

data (pandas.DataFrame, optional, default=None) – Test data. The model will use the training data if set as None.
quantity (str, optional, default=None) –
Option for returned estimation result. The possible values of quantity include:
1. 'CATE' : the estimator will evaluate the CATE;
2. 'ATE' : the estimator will evaluate the ATE;
3. None : the estimator will evaluate the ITE or CITE.

Returns

The estimated causal effects

Return type

ndarray
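To make the three quantities concrete, the sketch below relates them through their general definitions: the ATE is the average of the individual effects, and a CATE is the average within a subgroup. The arrays tau and group are hypothetical, and the snippet illustrates the definitions only, not what estimate() computes internally.

```python
import numpy as np

# Hypothetical per-sample effects and a hypothetical conditioning variable.
tau = np.array([0.0, 8.0, 8.0, 0.0, 8.0])
group = np.array(["a", "a", "b", "b", "b"])

# ATE: average effect over all samples.
ate = tau.mean()

# CATE: average effect within each value of the conditioning variable.
cate = {str(g): float(tau[group == g].mean()) for g in np.unique(group)}
print(ate, cate)
```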

effect_nji(data=None)

Calculate causal effects with different treatment values.

Returns: Causal effects with different treatment values.
Return type: ndarray

_comp_transormer(x, categories='auto')

Transform the discrete treatment into one-hot vectors properly.

Parameters

x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.
categories (str or list, optional, default='auto') –

Returns

The transformed one-hot vectors.

Return type

numpy.ndarray
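For readers unfamiliar with one-hot encoding, a minimal NumPy sketch follows. It is not the library's implementation. It assumes a single discrete treatment and assumes that the categories are inferred from the data, which is how scikit-learn's OneHotEncoder behaves with categories='auto'.

```python
import numpy as np

x = np.array([0, 2, 1, 2])     # one discrete treatment with three levels
categories = np.unique(x)      # categories inferred from the data

# Row i has a single 1 in the column of the category that x[i] belongs to.
one_hot = (x[:, None] == categories[None, :]).astype(float)
print(one_hot)
```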

T-Learner

A limitation of SLearner is that the treatment vector is only 1-dimensional while the adjustment vector can be multi-dimensional. If the dimension of the adjustment is much larger than 1, the fitted model may give the treatment little weight, and the estimated effects then tend to be close to 0. TLearner instead uses two machine learning models to estimate the causal effect. Specifically, letting \(w\) denote the adjustment set (or covariate), we

1. Fit two models \(f_t(w)\) for the treatment group (\(x=\) treat) and \(f_0(w)\) for the control group (\(x=\) control), respectively:

\[y_t = f_t(w)\]

with data where \(x=\) treat and

\[y_0 = f_0(w)\]

with data where \(x=\) control.

2. Compute the causal effect \(\tau(w)\) as the difference between predicted results of these two models:

\[\tau(w) = f_t(w) - f_0(w).\]
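The two steps above can be sketched without YLearn as follows. The synthetic data, the helper fit_linear, and the use of a least-squares model as the base learner are assumptions made for illustration; any regressor with fit and predict methods could take its place.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 3

# Synthetic data (illustrative only): binary treatment x, adjustment w.
w = rng.normal(size=(n, d))
x = rng.integers(0, 2, size=n)
true_tau = 1.0 + 2.0 * w[:, 0]
y = w @ np.array([0.5, -1.0, 0.3]) + true_tau * x + 0.1 * rng.normal(size=n)


def fit_linear(w_train, y_train):
    """Least-squares fit with an intercept; returns a predict function."""
    design = np.hstack([np.ones((len(w_train), 1)), w_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return lambda w_new: np.hstack([np.ones((len(w_new), 1)), w_new]) @ coef


treat, control = 1, 0

# Step 1: one model per group, each trained only on that group's rows.
f_t = fit_linear(w[x == treat], y[x == treat])
f_0 = fit_linear(w[x == control], y[x == control])

# Step 2: tau(w) = f_t(w) - f_0(w)
tau_hat = f_t(w) - f_0(w)
print("mean absolute error:", np.mean(np.abs(tau_hat - true_tau)))
```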

class ylearn.estimator_model.meta_learner.TLearner(model, random_state=2022, is_discrete_treatment=True, categories='auto', *args, **kwargs)

Parameters

model (estimator) – The base machine learning model for training TLearner. It can be any machine learning model that provides fit() and predict() methods.
random_state (int, default=2022) –
is_discrete_treatment (bool, default=True) – Treatment must be discrete for TLearner.
categories (str, optional, default='auto') –

fit(data, outcome, treatment, adjustment=None, covariate=None, treat=None, control=None, combined_treatment=True, **kwargs)

Fit the TLearner on the dataset.

Parameters

data (pandas.DataFrame) – Training dataset for training the estimator.
outcome (list of str, optional) – Names of the outcome.
treatment (list of str, optional) – Names of the treatment.
adjustment (list of str, optional, default=None) – Names of the adjustment set ensuring unconfoundedness.
covariate (list of str, optional, default=None) – Names of the covariate.
treat (int, optional) – Label of the intended treatment group
control (int, optional) – Label of the intended control group
combined_treatment (bool, optional, default=True) –
Only modify this parameter when there are multiple treatments. If set to True, multiple discrete treatments are combined into a single new discrete treatment, which converts several discrete classification tasks into a single discrete classification task. For example, if there are two different binary treatments:
1. treatment_1: \(x_1 | x_1 \in \{'sleep', 'run'\}\),
2. treatment_2: \(x_2 | x_2 \in \{'study', 'work'\}\),
then we can convert these two binary classification tasks into a single classification task with 4 different classes:

treatment: \(x | x \in \{0, 1, 2, 3\}\),

where, for example, 1 stands for ('sleep' and 'study').

Returns

The fitted instance of TLearner.

Return type

instance of TLearner

estimate(data=None, quantity=None)

Estimate the causal effect with the type of the quantity.

Parameters

data (pandas.DataFrame, optional, default=None) – Test data. The model will use the training data if set as None.
quantity (str, optional, default=None) –
Option for returned estimation result. The possible values of quantity include:
1. 'CATE' : the estimator will evaluate the CATE;
2. 'ATE' : the estimator will evaluate the ATE;
3. None : the estimator will evaluate the ITE or CITE.

Returns

The estimated causal effects

Return type

ndarray

effect_nji(data=None)

Calculate causal effects with different treatment values.

Returns: Causal effects with different treatment values.
Return type: ndarray

_comp_transormer(x, categories='auto')

Transform the discrete treatment into one-hot vectors properly.

Parameters

x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.
categories (str or list, optional, default='auto') –

Returns

The transformed one-hot vectors.

Return type

numpy.ndarray

X-Learner

TLearner does not use all data efficiently, because each of its models sees only one treatment group. This issue can be addressed by the XLearner, which utilizes all data to train several models. Training an XLearner is composed of 3 steps:

1. As in the case of TLearner, we first train two different models for the control group and treated group, respectively:

\[\begin{split}& f_0(w) \text{ for the control group}\\ & f_t(w) \text{ for the treated group}.\end{split}\]

2. Generate two new datasets \(\{(h_0, w)\}\) using the control group and \(\{(h_t, w)\}\) using the treated group where

\[\begin{split}h_0 & = f_t(w) - y_0,\\ h_t & = y_t - f_0(w).\end{split}\]

Then train two new machine learning models \(k_0(w)\) and \(k_t(w)\) in these datasets such that

\[\begin{split}h_0 & = k_0(w) \\ h_t & = k_t(w).\end{split}\]

3. Get the final model by combining the above two models:

\[g(w) = k_0(w)a(w) + k_t(w)(1 - a(w))\]

where \(a(w)\) is a coefficient adjusting the weight of \(k_0\) and \(k_t\).

Finally, the causal effect \(\tau(w)\) can be estimated as follows:

\[\tau(w) = g(w).\]
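The three steps can be put together in a short NumPy sketch. It does not use YLearn: the synthetic data and the helper fit_linear are hypothetical, and taking \(a(w)\) to be a constant equal to the share of treated samples is an assumption for this example (a fitted propensity model is a common alternative).

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 2000, 3

# Synthetic data (illustrative only): binary treatment x, adjustment w.
w = rng.normal(size=(n, d))
x = rng.integers(0, 2, size=n)
true_tau = 1.0 + 2.0 * w[:, 0]
y = w @ np.array([0.5, -1.0, 0.3]) + true_tau * x + 0.1 * rng.normal(size=n)


def fit_linear(w_train, y_train):
    """Least-squares fit with an intercept; returns a predict function."""
    design = np.hstack([np.ones((len(w_train), 1)), w_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return lambda w_new: np.hstack([np.ones((len(w_new), 1)), w_new]) @ coef


is_t = x == 1  # treated group
is_0 = x == 0  # control group

# Step 1: outcome models for each group, as in TLearner.
f_t = fit_linear(w[is_t], y[is_t])
f_0 = fit_linear(w[is_0], y[is_0])

# Step 2: imputed effects h_0, h_t and the models k_0, k_t fitted to them.
h_0 = f_t(w[is_0]) - y[is_0]
h_t = y[is_t] - f_0(w[is_t])
k_0 = fit_linear(w[is_0], h_0)
k_t = fit_linear(w[is_t], h_t)

# Step 3: g(w) = k_0(w) a(w) + k_t(w) (1 - a(w)), with constant a (assumption).
a = is_t.mean()
tau_hat = k_0(w) * a + k_t(w) * (1 - a)
print("mean absolute error:", np.mean(np.abs(tau_hat - true_tau)))
```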

class ylearn.estimator_model.meta_learner.XLearner(model, random_state=2022, is_discrete_treatment=True, categories='auto', *args, **kwargs)

Parameters

model (estimator) – The base machine learning model for training XLearner. It can be any machine learning model that provides fit() and predict() methods.
random_state (int, default=2022) –
is_discrete_treatment (bool, default=True) – Treatment must be discrete for XLearner.
categories (str, optional, default='auto') –

fit(data, outcome, treatment, adjustment=None, covariate=None, treat=None, control=None, combined_treatment=True, **kwargs)

Fit the XLearner on the dataset.

Parameters

data (pandas.DataFrame) – Training dataset for training the estimator.
outcome (list of str, optional) – Names of the outcome.
treatment (list of str, optional) – Names of the treatment.
adjustment (list of str, optional, default=None) – Names of the adjustment set ensuring unconfoundedness.
covariate (list of str, optional, default=None) – Names of the covariate.
treat (int, optional) – Label of the intended treatment group
control (int, optional) – Label of the intended control group
combined_treatment (bool, optional, default=True) –
Only modify this parameter when there are multiple treatments. If set to True, multiple discrete treatments are combined into a single new discrete treatment, which converts several discrete classification tasks into a single discrete classification task. For example, if there are two different binary treatments:
1. treatment_1: \(x_1 | x_1 \in \{'sleep', 'run'\}\),
2. treatment_2: \(x_2 | x_2 \in \{'study', 'work'\}\),
then we can convert these two binary classification tasks into a single classification task with 4 different classes:

treatment: \(x | x \in \{0, 1, 2, 3\}\),

where, for example, 1 stands for ('sleep' and 'study').

Returns

The fitted instance of XLearner.

Return type

instance of XLearner

estimate(data=None, quantity=None)

Estimate the causal effect with the type of the quantity.

Parameters

data (pandas.DataFrame, optional, default=None) – Test data. The model will use the training data if set as None.
quantity (str, optional, default=None) –
Option for returned estimation result. The possible values of quantity include:
1. 'CATE' : the estimator will evaluate the CATE;
2. 'ATE' : the estimator will evaluate the ATE;
3. None : the estimator will evaluate the ITE or CITE.

Returns

The estimated causal effects

Return type

ndarray

effect_nji(data=None)

Calculate causal effects with different treatment values.

Returns: Causal effects with different treatment values.
Return type: ndarray

_comp_transormer(x, categories='auto')

Transform the discrete treatment into one-hot vectors properly.

Parameters

x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.
categories (str or list, optional, default='auto') –

Returns

The transformed one-hot vectors.

Return type

numpy.ndarray