Doubly Robust

The doubly robust method (see [Funk2010]) estimates causal effects when the treatment is discrete and the unconfoundedness condition is satisfied. Training a doubly robust model consists of 3 steps.

Let \(K\) be a positive integer. Form a \(K\)-fold random partition of the data \(\{(X_i, W_i, V_i, Y_i)\}_{i = 1}^n\) such that, for each fold \(k = 1, \dots, K\),

\[\{(X_i, W_i, V_i, Y_i)\}_{i = 1}^n = D_k \cup T_k\]

where \(D_k\) is the training data, \(T_k\) is the test data, and \(\cup_{k = 1}^K T_k = \{(X_i, W_i, V_i, Y_i)\}_{i = 1}^n\).
For each \(k\), train two models \(f(X, W, V)\) and \(g(W, V)\) on \(D_k\) to predict \(Y\) and \(X\), respectively. Then evaluate them on \(T_k\) and save the predictions as \(\{(\hat{X}, \hat{Y})\}_k\). Combining all \(\{(\hat{X}, \hat{Y})\}_k\) gives the new dataset \(\{(\hat{X}_i, \hat{Y}_i(X, W, V))\}_{i = 1}^n\).
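The partition and the out-of-fold predictions described above can be sketched in plain Python. This is only an illustration, not YLearn's implementation: the helper names `kfold_indices` and `cross_fit` are hypothetical, the "model" is a stand-in that predicts the training-fold mean, and the sketch assumes at least two folds so that every \(D_k\) is non-empty.

```python
import random


def kfold_indices(n, n_folds, seed=0):
    """Randomly partition range(n) into n_folds disjoint test folds T_k."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold k takes every n_folds-th shuffled index, so fold sizes differ by at most 1.
    return [sorted(idx[k::n_folds]) for k in range(n_folds)]


def cross_fit(values, n_folds, seed=0):
    """Out-of-fold predictions from a stand-in nuisance model.

    The stand-in "model" is the mean of the training fold D_k; a real
    nuisance model would be fit on D_k and evaluated on T_k the same way.
    """
    n = len(values)
    preds = [None] * n
    for test_idx in kfold_indices(n, n_folds, seed):
        held_out = set(test_idx)
        train = [values[i] for i in range(n) if i not in held_out]  # D_k
        fold_mean = sum(train) / len(train)
        for i in test_idx:  # T_k
            preds[i] = fold_mean
    return preds


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
folds = kfold_indices(len(y), n_folds=3)
y_hat = cross_fit(y, n_folds=3)
```

Each sample receives a prediction from a model that never saw it during training, which is the point of cross-fitting.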
For any given pair of a treatment group where \(X = x\) and a control group where \(X = x_0\), we build the final dataset \(\{(V, \tilde{Y}_x - \tilde{Y}_0)\}\) where \(\tilde{Y}_x\) and \(\tilde{Y}_0\) are defined as

\[\begin{split}\tilde{Y}_x & = \hat{Y}(X=x, W, V) + \frac{(Y - \hat{Y}(X=x, W, V)) * \mathbb{I}(X=x)}{P[X=x| W, V]} \\ \tilde{Y}_0 & = \hat{Y}(X=x_0, W, V) + \frac{(Y - \hat{Y}(X=x_0, W, V)) * \mathbb{I}(X=x_0)}{P[X=x_0| W, V]}\end{split}\]

and train the final machine learning model \(h(V)\) on this dataset to predict the causal effect \(\tau(V)\)

\[\tau(V) = \tilde{Y}_x - \tilde{Y}_0 = h(V).\]

Then we can directly estimate the causal effects by passing the covariate \(V\) to the model \(h(V)\).
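To make the pseudo-outcome formula concrete, here is a minimal sketch in plain Python. The helper `dr_pseudo_outcome` and the simulated data-generating process are assumptions made for illustration and are not part of YLearn. The simulation uses the true propensity together with a deliberately wrong outcome model, and takes the sample mean of \(\tilde{Y}_x - \tilde{Y}_0\) (an intercept-only stand-in for \(h(V)\)) as an estimate of the average effect.

```python
import random


def dr_pseudo_outcome(y, x, level, y_hat_level, p_level):
    """Doubly robust pseudo-outcome of one sample at treatment value `level`.

    y_hat_level: outcome-model prediction with X set to `level`
    p_level:     estimated P[X = level | W, V]
    """
    indicator = 1.0 if x == level else 0.0
    return y_hat_level + (y - y_hat_level) * indicator / p_level


# Worked single-sample example: a treated unit with Y = 10, outcome
# predictions 8 (treat) and 3 (control), and propensity 0.5 for each arm.
y_t = dr_pseudo_outcome(y=10.0, x=1, level=1, y_hat_level=8.0, p_level=0.5)
y_c = dr_pseudo_outcome(y=10.0, x=1, level=0, y_hat_level=3.0, p_level=0.5)
effect = y_t - y_c  # 12 - 3 = 9

# Simulation (assumed setup): true effect 2, confounded treatment assignment.
rng = random.Random(0)
n = 20000
diffs = []
for _ in range(n):
    v = rng.uniform(-1.0, 1.0)
    p1 = 0.7 if v > 0 else 0.3          # true propensity P[X = 1 | V]
    x = 1 if rng.random() < p1 else 0
    y = v + 2.0 * x + rng.gauss(0.0, 1.0)
    y_hat = 0.0                          # deliberately wrong outcome model
    diffs.append(
        dr_pseudo_outcome(y, x, 1, y_hat, p1)
        - dr_pseudo_outcome(y, x, 0, y_hat, 1.0 - p1)
    )
ate_estimate = sum(diffs) / n
```

Even though the outcome model is wrong, the correction term weighted by the correct propensity keeps the averaged pseudo-outcome difference close to the true effect of 2 in this setup. This is the sense in which the estimator is doubly robust.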

Example

import numpy as np
from numpy.random import multivariate_normal

from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor

import matplotlib.pyplot as plt

from ylearn.estimator_model.meta_learner import SLearner, TLearner, XLearner
from ylearn.estimator_model.doubly_robust import DoublyRobust
from ylearn.exp_dataset.exp_data import binary_data
from ylearn.utils import to_df

# build the dataset
d = 5
n = 2500
n_test = 250

y, x, w = binary_data(n=n, d=d, n_test=n_test)
data = to_df(outcome=y, treatment=x, w=w)
outcome = 'outcome'
treatment = 'treatment'
adjustment = data.columns[2:]

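# build the test dataset
# ground-truth effect: 8 when the second covariate exceeds 0.1, otherwise 0
treatment_effect = lambda x: (1 if x[1] > 0.1 else 0) * 8

w_test = multivariate_normal(np.zeros(d), np.diag(np.ones(d)), n_test)
delta = 6/n_test
# sweep the second covariate over an evenly spaced grid on [-3, 3)
w_test[:, 1] = np.arange(-3, 3, delta)
# assumption: the test frame is built with to_df, mirroring the training data above
test_data = to_df(w=w_test)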

Train the DoublyRobust Model.

dr = DoublyRobust(
    x_model=RandomForestClassifier(n_estimators=100, max_depth=100, min_samples_leaf=int(n/100)),
    y_model=GradientBoostingRegressor(n_estimators=100, max_depth=100, min_samples_leaf=int(n/100)),
    yx_model=GradientBoostingRegressor(n_estimators=100, max_depth=100, min_samples_leaf=int(n/100)),
    cf_fold=1,
    random_state=2022,
)
dr.fit(data=data, outcome=outcome, treatment=treatment, covariate=adjustment)
dr_pred = dr.estimate(data=test_data, quantity=None).squeeze()

Class Structures

class ylearn.estimator_model.doubly_robust.DoublyRobust(x_model, y_model, yx_model, cf_fold=1, random_state=2022, categories='auto')

Parameters:

x_model (estimator, optional) – The machine learning model trained to model the treatment. Any valid x_model should implement the fit() and predict_proba() methods.
y_model (estimator, optional) – The machine learning model trained to model the outcome with covariates (possibly adjustment) and the treatment. Any valid y_model should implement the fit() and predict() methods.
yx_model (estimator, optional) – The machine learning model trained in the final stage of the doubly robust method to model the causal effects with covariates (possibly adjustment). Any valid yx_model should implement the fit() and predict() methods.
cf_fold (int, default=1) – The number of folds for performing cross fit in the first stage.
random_state (int, default=2022) –
categories (str, optional, default='auto') –

fit(data, outcome, treatment, adjustment=None, covariate=None, treat=None, control=None, combined_treatment=True, **kwargs)

Fit the DoublyRobust estimator model. Note that the training of a doubly robust model has three stages, which are implemented in _fit_1st_stage() and _fit_2nd_stage().

Parameters:

data (pandas.DataFrame) – Training dataset for training the estimator.
outcome (list of str, optional) – Names of the outcome.
treatment (list of str, optional) – Names of the treatment.
adjustment (list of str, optional, default=None) – Names of the adjustment set ensuring unconfoundedness.
covariate (list of str, optional, default=None) – Names of the covariate.
treat (int, optional) – Label of the intended treatment group. If None, then treat will be set as 1. In the case of a single discrete treatment, treat should be an int or str among the possible treatment values, indicating the value of the intended treatment; in the case of multiple discrete treatments, treat should be a list or an ndarray where treat[i] indicates the value of the i-th intended treatment. For example, when there are multiple discrete treatments, array(['run', 'read']) means the treat value of the first treatment is taken as 'run' and that of the second treatment is taken as 'read'.
control (int, optional) – Label of the intended control group. This is handled in the same way as treat. If None, then control will be set as 0.

Returns:

The fitted instance of DoublyRobust.

Return type:

instance of DoublyRobust

estimate(data=None, quantity=None, treat=None, all_tr_effects=False)

Estimate the causal effect of the type specified by quantity.

Parameters:

data (pandas.DataFrame, optional, default=None) – Test data. The model will use the training data if set as None.
quantity (str, optional, default=None) –
Option for the returned estimation result. The possible values of quantity include:
1. 'CATE' : the estimator will evaluate the CATE;
2. 'ATE' : the estimator will evaluate the ATE;
3. None : the estimator will evaluate the ITE or CITE.
treat (float or numpy.ndarray, optional, default=None) – In the case of a single discrete treatment, treat should be an int or str among the possible treatment values, indicating the value of the intended treatment; in the case of multiple discrete treatments, treat should be a list or an ndarray where treat[i] indicates the value of the i-th intended treatment. For example, when there are multiple discrete treatments, array(['run', 'read']) means the treat value of the first treatment is taken as 'run' and that of the second treatment is taken as 'read'.
all_tr_effects (bool, default=False) – If True, return the causal effects for all treatment values; otherwise return only the causal effect of the treatment value given by treat, if it is provided. If treat is not provided, the treatment value is taken to be the one used when fitting the estimator model.

Returns:

The estimated causal effects

Return type:

ndarray

effect_nji(data=None)

Calculate causal effects with different treatment values. Note that this method will only convert a problem with a discrete treatment into one with a binary treatment. One can use _effect_nji_all() to get causal effects for all values of treat taken by the treatment.

Returns:

Causal effects with different treatment values.

Return type:

ndarray

comp_transormer(x, categories='auto')

Transform the discrete treatment into one-hot vectors properly.

Parameters:

x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.
categories (str or list, optional, default='auto') –

Returns:

The transformed one-hot vectors.

Return type:

numpy.ndarray
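
As a rough picture of what a one-hot transformation of a discrete treatment looks like, the following plain-Python sketch encodes a single treatment column. It is not the body of comp_transormer(): the helper `one_hot` is hypothetical, it returns nested lists rather than a numpy.ndarray, and it assumes that 'auto' means the levels are the sorted unique labels, as in scikit-learn's OneHotEncoder.

```python
def one_hot(labels, categories="auto"):
    """One-hot encode a single discrete treatment column.

    With categories="auto" the levels are the sorted unique labels
    (an assumption mirroring scikit-learn's OneHotEncoder convention);
    otherwise the given list fixes the levels and their column order.
    """
    levels = sorted(set(labels)) if categories == "auto" else list(categories)
    position = {level: j for j, level in enumerate(levels)}
    rows = [
        [1 if position[label] == j else 0 for j in range(len(levels))]
        for label in labels
    ]
    return rows, levels


encoded, levels = one_hot(["run", "read", "run", "rest"])
```

Each row has exactly one entry equal to 1, in the column of that sample's treatment value.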