Double Machine Learning

Notation

We use capital letters for matrices and small letters for vectors. The treatment is denoted by \(x\), the outcome is denoted by \(y\), the covariate is denoted by \(v\), and other adjustment set variables are \(w\). Greek letters are for error terms.

The double machine learning (DML) model [Chern2016] can be applied when all confounders of the treatment and outcome, that is, the variables that simultaneously influence both, are observed. Let \(y\) be the outcome and \(x\) be the treatment. A DML model then solves the following causal effect (CATE) estimation problem:

\[\begin{split}y & = F(v) x + g(v, w) + \epsilon \\ x & = h(v, w) + \eta\end{split}\]

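where \(F(v)\) is the CATE conditional on the covariate \(v\). To estimate \(F(v)\), take the conditional expectation of the first equation given \((w, v)\); assuming \(\mathbb{E}[\epsilon|w, v] = 0\), this gives \(\mathbb{E}[y|w, v] = F(v)\mathbb{E}[x|w, v] + g(v, w)\). Subtracting this from the first equation yields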

\[y - \mathbb{E}[y|w, v] = F(v) (x - \mathbb{E}[x|w, v]) + \epsilon.\]

Thus by first estimating \(\mathbb{E}[y|w, v]\) and \(\mathbb{E}[x|w,v]\) as

\[\begin{split}m(v, w) & = \mathbb{E}[y|w, v]\\ h(v, w) & = \mathbb{E}[x|w,v],\end{split}\]

we can get a new dataset \((\tilde{y}, \tilde{x})\) where

\[\begin{split}\tilde{y} & = y - m(v, w) \\ \tilde{x} & = x - h(v, w)\end{split}\]

such that the relation between \(\tilde{y}\) and \(\tilde{x}\) is linear

\[\tilde{y} = F(v) \tilde{x} + \epsilon\]

which can be simply modeled by the linear regression model.
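The residual-on-residual idea above can be sketched with NumPy. This is a minimal illustration, not YLearn's implementation: the synthetic data, the constant effect `true_effect`, and the helper `fit_predict_linear` are assumptions made for the example, plain least squares stands in for the machine learning models, and cross-fitting is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (assumption): one covariate v and one adjustment variable w.
v = rng.normal(size=n)
w = rng.normal(size=n)
true_effect = 2.0  # a constant F(v), chosen only to keep the sketch small
x = 0.5 * v - 1.0 * w + rng.normal(size=n)                    # x = h(v, w) + eta
y = true_effect * x + 1.5 * v + 0.7 * w + rng.normal(size=n)  # y = F x + g(v, w) + eps


def fit_predict_linear(features, target):
    """Hypothetical helper: least-squares fit with intercept, in-sample predictions."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ coef


features = np.column_stack([v, w])
m_hat = fit_predict_linear(features, y)  # estimate of E[y | w, v]
h_hat = fit_predict_linear(features, x)  # estimate of E[x | w, v]

# Residualize the outcome and the treatment.
y_res = y - m_hat
x_res = x - h_hat

# Regress residual on residual (no intercept) to recover the effect.
effect_hat = (x_res @ y_res) / (x_res @ x_res)
print(effect_hat)
```

With this much data the printed estimate should land near `true_effect`; the nuisance models here are linear only because the simulated \(g\) and \(h\) are linear.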

In the current version, \(F(v)\) takes the form

\[F_{ij}(v) = \sum_k H_{ijk} \rho_k(v).\]

where \(H\) can be seen as a rank-3 tensor and \(\rho_k\) is a function of the covariate \(v\), e.g., \(\rho(v) = v\) in the simplest case. Therefore, the outcome \(y\) can now be represented as

\[\begin{split}y_i & = \sum_j F_{ij}x_j + g(v, w)_i + \epsilon \\ & = \sum_j \sum_k H_{ijk}\rho_k(v)x_j + g(v, w)_i + \epsilon\end{split}\]

In this sense, the linear regression problem between \(\tilde{y}\) and \(\tilde{x}\) now becomes

\[\tilde{y}_i = \sum_j \sum_k H_{ijk}\rho_k(v) \tilde{x}_j + \epsilon.\]
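As a small numerical illustration of the tensor form, the contraction \(F_{ij}(v) = \sum_k H_{ijk}\rho_k(v)\) can be written with `numpy.einsum`. The values of `H`, `v`, and `x_res` below are made up for the example, and `rho` uses the simplest choice \(\rho(v) = v\) mentioned above.

```python
import numpy as np

# H has shape (outcome dim, treatment dim, number of basis functions).
# The entries are arbitrary illustrative values.
H = np.arange(2 * 3 * 2, dtype=float).reshape(2, 3, 2)


def rho(v):
    """Simplest basis from the text: rho(v) = v."""
    return v


v = np.array([1.0, 2.0])           # one covariate vector
x_res = np.array([1.0, 0.0, 2.0])  # one residualized treatment vector

# F_ij(v) = sum_k H_ijk * rho_k(v)
F = np.einsum("ijk,k->ij", H, rho(v))

# Noise-free residualized outcome: y_res_i = sum_j F_ij * x_res_j
y_res = F @ x_res

# The same quantity as a single contraction over j and k.
y_res_direct = np.einsum("ijk,k,j->i", H, rho(v), x_res)
```

Both routes give the same vector, which shows that the model is linear in the entries of \(H\) once \(\rho(v)\) and \(\tilde{x}\) are fixed.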

Implementation

In YLearn, we implement double machine learning following the algorithm described in [Chern2016]:

1. Let k (cf_fold in our class) be an integer. Form a random k-fold partition {…, (train_data_i, test_data_i), …, (train_data_k, test_data_k)}.

2. For each i, train y_model and x_model on train_data_i, then evaluate them on test_data_i; the predictions are saved as \((\hat{y}_i, \hat{x}_i)\). All \((\hat{y}_i, \hat{x}_i)\) are then combined to give the new dataset \((\hat{y}, \hat{x})\).

3. Define the differences

\[\begin{split}\tilde{y}& = y - \hat{y}, \\ \tilde{x}&= (x - \hat{x}) \otimes v.\end{split}\]

Then form the new dataset \((\tilde{y}, \tilde{x})\).

4. Perform linear regression on the dataset \((\tilde{y}, \tilde{x})\); the coefficients are saved in a vector \(f\). The estimated CATE given \(v\) is then

\[f \cdot v.\]
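The four steps above can be sketched end to end with NumPy. This is an illustration under stated assumptions rather than YLearn's code: the data are synthetic, `poly2` is a hypothetical feature map, and least squares on those features stands in for `x_model` and `y_model`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, cf_fold = 6000, 3

# Synthetic data (assumption): two covariates v, one adjustment variable w.
v = rng.normal(size=(n, 2))
w = rng.normal(size=(n, 1))
f_true = np.array([1.0, -2.0])  # true CATE is f_true . v
x = 0.5 * v[:, 0] - w[:, 0] + rng.normal(size=n)
y = (v @ f_true) * x + v[:, 1] + 0.5 * w[:, 0] + rng.normal(size=n)


def poly2(z):
    """Hypothetical helper: degree-2 polynomial features with an intercept."""
    rows, d = z.shape
    cols = [np.ones(rows)] + [z[:, a] for a in range(d)]
    cols += [z[:, a] * z[:, b] for a in range(d) for b in range(a, d)]
    return np.column_stack(cols)


z = poly2(np.column_stack([v, w]))

# Step 1: random k-fold partition of the row indices.
folds = np.array_split(rng.permutation(n), cf_fold)

# Step 2: fit the nuisance models on each training split, predict on the held-out fold.
y_hat = np.empty(n)
x_hat = np.empty(n)
for test_idx in folds:
    train_mask = np.ones(n, dtype=bool)
    train_mask[test_idx] = False
    coef_y, *_ = np.linalg.lstsq(z[train_mask], y[train_mask], rcond=None)
    coef_x, *_ = np.linalg.lstsq(z[train_mask], x[train_mask], rcond=None)
    y_hat[test_idx] = z[test_idx] @ coef_y
    x_hat[test_idx] = z[test_idx] @ coef_x

# Step 3: residuals; the treatment residual is multiplied row-wise with v
# (the outer/Kronecker product for a scalar treatment).
y_res = y - y_hat
x_res = (x - x_hat)[:, None] * v

# Step 4: linear regression of y_res on x_res gives f; the CATE is f . v.
f_hat, *_ = np.linalg.lstsq(x_res, y_res, rcond=None)
cate = v @ f_hat
print(f_hat)
```

Every row receives its nuisance predictions from a model that never saw that row, which is the purpose of the k-fold split in steps 1 and 2.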

Example

from sklearn.ensemble import RandomForestRegressor

from ylearn.exp_dataset.exp_data import single_continuous_treatment
from ylearn.estimator_model.double_ml import DoubleML

# build the dataset
train, val, treatment_effect = single_continuous_treatment()
adjustment = train.columns[:-4]
covariate = 'c_0'
outcome = 'outcome'
treatment = 'treatment'

# fit a DML estimator with random-forest nuisance models and 3-fold cross-fitting
dml = DoubleML(x_model=RandomForestRegressor(), y_model=RandomForestRegressor(), cf_fold=3)
dml.fit(train, outcome, treatment, adjustment, covariate)
# Output:
# 06-23 14:02:36 I ylearn.e.double_ml.py 684 - _fit_1st_stage: fitting x_model RandomForestRegressor
# 06-23 14:02:39 I ylearn.e.double_ml.py 690 - _fit_1st_stage: fitting y_model RandomForestRegressor
# DoubleML(x_model=RandomForestRegressor(), y_model=RandomForestRegressor(), yx_model=LinearRegression(), cf_fold=3)

Class Structures

class ylearn.estimator_model.double_ml.DoubleML(x_model, y_model, yx_model=None, cf_fold=1, adjustment_transformer=None, covariate_transformer=None, random_state=2022, is_discrete_treatment=False, categories='auto')
Parameters
  • x_model (estimator, optional) – The machine learning model for fitting x. Any such model should implement the fit() and predict() methods (and also predict_proba() if x is discrete).

  • y_model (estimator, optional) – The machine learning model trained to model the outcome. Any valid y_model should implement the fit() and predict() methods.

  • yx_model (estimator, optional) – The machine learning model for fitting the residual of y on the residual of x. Only linear regression models are supported in the current version.

  • cf_fold (int, default=1) – The number of folds for performing cross fit in the first stage.

  • adjustment_transformer (transformer, optional, default=None) – Transformer for adjustment variables which can be used to generate new features of adjustment variables.

  • covariate_transformer (transformer, optional, default=None) – Transformer for covariate variables which can be used to generate new features of covariate variables.

  • random_state (int, default=2022) –

  • is_discrete_treatment (bool, default=False) – If the treatment variables are discrete, set this to True.

  • categories (str, optional, default='auto') –

fit(data, outcome, treatment, adjustment=None, covariate=None, **kwargs)

Fit the DoubleML estimator model. Note that training a DML model has two stages, which are implemented in _fit_1st_stage() and _fit_2nd_stage().

Parameters
  • data (pandas.DataFrame) – Training dataset for training the estimator.

  • outcome (list of str, optional) – Names of the outcome.

  • treatment (list of str, optional) – Names of the treatment.

  • adjustment (list of str, optional, default=None) – Names of the adjustment set ensuring unconfoundedness.

  • covariate (list of str, optional, default=None) – Names of the covariate.

Returns

The fitted model

Return type

an instance of DoubleML

estimate(data=None, treat=None, control=None, quantity=None)

Estimate the causal effect with the type of the quantity.

Parameters
  • data (pandas.DataFrame, optional, default=None) – The test data on which the estimator evaluates the causal effect. If data is None, the estimator evaluates all quantities on the training data.

  • treat (float or numpy.ndarray, optional, default=None) – In the case of a single discrete treatment, treat should be an int or str giving one of the possible treatment values, which indicates the value of the intended treatment. In the case of multiple discrete treatments, treat should be a list or an ndarray where treat[i] indicates the value of the i-th intended treatment; for example, array(['run', 'read']) means the treat value of the first treatment is taken as 'run' and that of the second treatment is taken as 'read'. In the case of continuous treatment, treat should be a float or an ndarray.

  • quantity (str, optional, default=None) –

    Option for returned estimation result. The possible values of quantity include:

    1. 'CATE' : the estimator will evaluate the CATE;

    2. 'ATE' : the estimator will evaluate the ATE;

    3. None : the estimator will evaluate the ITE or CITE.

  • control (float or numpy.ndarray, optional, default=None) – The value of the control, specified in the same way as treat.

Returns

The estimated causal effects

Return type

ndarray

effect_nji(data=None)

Calculate causal effects with different treatment values.

Parameters

data (pandas.DataFrame, optional, default=None) – The test data for the estimator to evaluate the causal effect, note that the estimator will use the training data if data is None.

Returns

Causal effects with different treatment values.

Return type

ndarray

comp_transormer(x, categories='auto')

Transform the discrete treatment into one-hot vectors properly.

Parameters
  • x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.

  • categories (str or list, optional, default='auto') –

Returns

The transformed one-hot vectors.

Return type

numpy.ndarray
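
To illustrate what a one-hot transformation of a discrete treatment looks like, here is a minimal NumPy sketch. It is not YLearn's implementation of comp_transormer: the helper `one_hot` is hypothetical, it handles a single treatment column only, and it orders categories as `numpy.unique` does (sorted), which is an assumption made for the example.

```python
import numpy as np


def one_hot(column):
    """Hypothetical helper: one-hot encode a 1-D array of discrete values.

    Returns the encoded matrix and the sorted category labels.
    """
    categories, codes = np.unique(column, return_inverse=True)
    encoded = np.eye(len(categories))[codes]
    return encoded, categories


treatment = np.array(["run", "read", "run", "rest"])
encoded, categories = one_hot(treatment)
print(categories)
print(encoded)
```

Each row of `encoded` contains a single 1 in the column of its category, so a discrete treatment can be fed to models that expect numeric input.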