Approximation Bound for Causal Effects

Many estimator models require the unconfoundedness condition, which is usually untestable. One practical approach is to build upper and lower bounds on the causal effects before turning to specific estimators.

YLearn provides four different bounds, briefly introduced below. See [Neal2020] for details.

No-Assumptions Bound

Suppose that

\[\forall x, a \leq Y(do(x)) \leq b,\]

then we have

\[\begin{split}\mathbb{E}[Y(do(1)) - Y(do(0))] & \leq \pi \mathbb{E}[Y|X = 1] + (1 - \pi) b - \pi a - (1 - \pi )\mathbb{E}[Y| X = 0]\\ \mathbb{E}[Y(do(1)) - Y(do(0))] & \geq \pi \mathbb{E}[Y|X = 1] + (1 - \pi) a - \pi b - (1 - \pi )\mathbb{E}[Y| X = 0]\end{split}\]

where \(\pi = P(X = 1)\) is the probability of receiving the treatment \(X=1\).
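As a rough illustration, the no-assumptions bound can be computed from sample estimates of \(\pi\), \(\mathbb{E}[Y|X=1]\) and \(\mathbb{E}[Y|X=0]\). The sketch below assumes a binary treatment coded 0/1 and plain sample means with no covariates; the helper `no_assumption_bound` and the toy data are hypothetical and not part of YLearn's API.

```python
import numpy as np

def no_assumption_bound(y, x, a, b):
    """Hypothetical helper: (lower, upper) for E[Y(do(1)) - Y(do(0))], outcomes in [a, b]."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    pi = np.mean(x == 1)    # sample estimate of P(X = 1)
    e1 = y[x == 1].mean()   # sample estimate of E[Y | X = 1]
    e0 = y[x == 0].mean()   # sample estimate of E[Y | X = 0]
    upper = pi * e1 + (1 - pi) * b - pi * a - (1 - pi) * e0
    lower = pi * e1 + (1 - pi) * a - pi * b - (1 - pi) * e0
    return lower, upper

# Toy data: outcomes lie in [0, 5] and half of the units are treated
lower, upper = no_assumption_bound([1.0, 3.0, 2.0, 4.0], [1, 1, 0, 0], a=0.0, b=5.0)
print(lower, upper)  # -3.0 2.0
```

Subtracting the two formulas shows that the interval always has width \(b - a\), and because both conditional means lie in \([a, b]\) it always contains 0, so this bound alone cannot determine the sign of the effect.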

Nonnegative Monotone Treatment Response Bound

Suppose that

\[\forall i, Y_i(do(1)) \geq Y_i(do(0)),\]

which means that the treatment never harms any unit. Together with the outcome range \(a \leq Y(do(x)) \leq b\) assumed above, we have the following bound:

\[\begin{split}\mathbb{E}[Y(do(1)) - Y(do(0))] & \leq \pi \mathbb{E}[Y|X = 1] + (1 - \pi) b - \pi a - (1 - \pi )\mathbb{E}[Y| X = 0]\\ \mathbb{E}[Y(do(1)) - Y(do(0))] & \geq 0\end{split}\]
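The same plug-in approach adapts to the nonnegative monotone treatment response bound: only the lower end changes, becoming 0. This is a minimal sketch assuming a binary 0/1 treatment and sample means; the helper name and toy data are hypothetical.

```python
import numpy as np

def nonnegative_mtr_bound(y, x, a, b):
    """Hypothetical helper: ATE bound assuming Y_i(do(1)) >= Y_i(do(0)) for every unit."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    pi = np.mean(x == 1)    # sample estimate of P(X = 1)
    e1 = y[x == 1].mean()   # sample estimate of E[Y | X = 1]
    e0 = y[x == 0].mean()   # sample estimate of E[Y | X = 0]
    # The upper end is identical to the no-assumptions upper bound
    upper = pi * e1 + (1 - pi) * b - pi * a - (1 - pi) * e0
    # Monotonicity rules out negative effects, so the lower end is 0
    return 0.0, upper

# Toy data: outcomes lie in [0, 5] and half of the units are treated
lower, upper = nonnegative_mtr_bound([1.0, 3.0, 2.0, 4.0], [1, 1, 0, 0], a=0.0, b=5.0)
print(lower, upper)  # 0.0 2.0
```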

Nonpositive Monotone Treatment Response Bound

Suppose that

\[\forall i, Y_i(do(1)) \leq Y_i(do(0)),\]

which means that the treatment never helps any unit. Together with the outcome range \(a \leq Y(do(x)) \leq b\) assumed above, we have the following bound:

\[\begin{split}\mathbb{E}[Y(do(1)) - Y(do(0))] & \leq 0\\ \mathbb{E}[Y(do(1)) - Y(do(0))] & \geq \pi \mathbb{E}[Y|X = 1] + (1 - \pi) a - \pi b - (1 - \pi )\mathbb{E}[Y| X = 0].\end{split}\]
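Mirroring the nonnegative case, the nonpositive bound keeps the no-assumptions lower end and fixes the upper end at 0. Again a hypothetical plug-in sketch, assuming a binary 0/1 treatment and sample means:

```python
import numpy as np

def nonpositive_mtr_bound(y, x, a, b):
    """Hypothetical helper: ATE bound assuming Y_i(do(1)) <= Y_i(do(0)) for every unit."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    pi = np.mean(x == 1)    # sample estimate of P(X = 1)
    e1 = y[x == 1].mean()   # sample estimate of E[Y | X = 1]
    e0 = y[x == 0].mean()   # sample estimate of E[Y | X = 0]
    # The lower end is identical to the no-assumptions lower bound
    lower = pi * e1 + (1 - pi) * a - pi * b - (1 - pi) * e0
    # Monotonicity rules out positive effects, so the upper end is 0
    return lower, 0.0

# Toy data: outcomes lie in [0, 5] and half of the units are treated
lower, upper = nonpositive_mtr_bound([1.0, 3.0, 2.0, 4.0], [1, 1, 0, 0], a=0.0, b=5.0)
print(lower, upper)  # -3.0 0.0
```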

Optimal Treatment Selection Bound

Suppose that

\[\begin{split}X = 1 &\implies Y(do(1)) \geq Y(do(0)) \\ X = 0 & \implies Y(do(0)) \geq Y(do(1))\end{split}\]

which means that each unit receives whichever treatment is better for it. Using the outcome lower bound \(a\), we have the following bound:

\[\begin{split}\mathbb{E}[Y(do(1)) - Y(do(0))] & \leq \pi \mathbb{E}[Y|X = 1] - \pi a\\ \mathbb{E}[Y(do(1)) - Y(do(0))] & \geq (1 - \pi) a - (1 - \pi )\mathbb{E}[Y| X = 0].\end{split}\]
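Under optimal treatment selection only the outcome lower bound \(a\) enters the formulas, so \(b\) is not needed. A minimal plug-in sketch, assuming a binary 0/1 treatment and sample means; `ots_bound` is a hypothetical helper:

```python
import numpy as np

def ots_bound(y, x, a):
    """Hypothetical helper: first optimal treatment selection bound on the ATE."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    pi = np.mean(x == 1)    # sample estimate of P(X = 1)
    e1 = y[x == 1].mean()   # sample estimate of E[Y | X = 1]
    e0 = y[x == 0].mean()   # sample estimate of E[Y | X = 0]
    upper = pi * e1 - pi * a
    lower = (1 - pi) * a - (1 - pi) * e0
    return lower, upper

# Toy data: outcomes are at least 0 and half of the units are treated
lower, upper = ots_bound([1.0, 3.0, 2.0, 4.0], [1, 1, 0, 0], a=0.0)
print(lower, upper)  # -1.5 1.0
```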

There is also a second optimal treatment selection bound:

\[\begin{split}\mathbb{E}[Y(do(1)) - Y(do(0))] & \leq \mathbb{E}[Y|X = 1] - \pi a - (1 - \pi)\mathbb{E}[Y|X=0]\\ \mathbb{E}[Y(do(1)) - Y(do(0))] & \geq \pi\mathbb{E}[Y|X = 1] + (1 - \pi) a - \mathbb{E}[Y| X = 0].\end{split}\]
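The second variant can be sketched in the same way. As before, the helper name and toy data are hypothetical, and the estimates are plain sample means for a binary 0/1 treatment:

```python
import numpy as np

def ots_bound_second(y, x, a):
    """Hypothetical helper: second optimal treatment selection bound on the ATE."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    pi = np.mean(x == 1)    # sample estimate of P(X = 1)
    e1 = y[x == 1].mean()   # sample estimate of E[Y | X = 1]
    e0 = y[x == 0].mean()   # sample estimate of E[Y | X = 0]
    upper = e1 - pi * a - (1 - pi) * e0
    lower = pi * e1 + (1 - pi) * a - e0
    return lower, upper

# Toy data: outcomes are at least 0 and half of the units are treated
lower, upper = ots_bound_second([1.0, 3.0, 2.0, 4.0], [1, 1, 0, 0], a=0.0)
print(lower, upper)  # -2.0 0.5
```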

Example

import numpy as np

from ylearn.estimator_model.approximation_bound import ApproxBound
from ylearn.exp_dataset.exp_data import meaningless_discrete_dataset_
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

num = 1000  # sample size (assumed value; the outputs shown below may come from a different size)
data = meaningless_discrete_dataset_(num=num, confounder_n=3, treatment_effct=[2, 5, -8], random_seed=0)
treatment = 'treatment'
w = ['w_0', 'w_1', 'w_2']
outcome = 'outcome'

bound = ApproxBound(y_model=RandomForestRegressor(), x_model=RandomForestClassifier())
bound.fit(data=data, treatment=treatment, outcome=outcome, covariate=w)

>>> ApproxBound(y_model=RandomForestRegressor(), x_prob=array([[0.  , 0.99, 0.01],
            [0.  , 0.99, 0.01],
            [1.  , 0.  , 0.  ],
            ...,
            [0.  , 1.  , 0.  ],
            [0.01, 0.99, 0.  ],
            [0.01, 0.99, 0.  ]]), x_model=RandomForestClassifier())

b_l, b_u = bound.estimate()  # lower and upper bounds
b_l.mean()

>>> -7.126728994957785

b_u.mean()

>>> 8.994011617037696

Class Structures

class ylearn.estimator_model.approximation_bound.ApproxBound(y_model, x_prob=None, x_model=None, random_state=2022, is_discrete_treatment=True, categories='auto')

A model used for estimating the upper and lower bounds of the causal effects.

Parameters

y_model (estimator, optional) – Any valid y_model should implement the fit() and predict() methods.
x_prob (ndarray of shape (c, ), optional, default=None) – An array of probabilities assigned to the corresponding values of x, where c is the number of different treatment classes. All elements in the array are positive and sum to 1. For example, x_prob = array([0.5, 0.5]) means that x = 0 and x = 1 each have probability 0.5. Set this to None if you are using multiple treatments.
x_model (estimator, optional, default=None) – Models for predicting the probabilities of treatment. Any valid x_model should implement the fit() and predict_proba() methods.
random_state (int, optional, default=2022) –
is_discrete_treatment (bool, optional, default=True) – True if the treatment is discrete.
categories (str, optional, default='auto') –

fit(data, outcome, treatment, covariate=None, is_discrete_covariate=False, **kwargs)

Fit x_model and y_model.

Parameters

data (pandas.DataFrame) – Training data.
outcome (list of str, optional) – Names of the outcome.
treatment (list of str, optional) – Names of the treatment.
covariate (list of str, optional, default=None) – Names of the covariate.
is_discrete_covariate (bool, optional, default=False) –

Returns

The fitted instance of ApproxBound.

Return type

instance of ApproxBound

Raises

ValueError – Raised when the treatment is not discrete.

estimate(data=None, treat=None, control=None, y_upper=None, y_lower=None, assump=None)

Estimate the approximation bound of the causal effect of the treatment on the outcome.

Parameters

data (pandas.DataFrame, optional, default=None) – Test data. The model will use the training data if set as None.
treat (ndarray of str, optional, default=None) – Values of the treatment group. For example, when there are multiple discrete treatments, array(['run', 'read']) means that the treatment value of the first treatment is 'run' and that of the second treatment is 'read'.
control (ndarray of str, optional, default=None) – Values of the control group.
y_upper (float, optional, default=None) – The upper bound of the outcome.
y_lower (float, optional, default=None) – The lower bound of the outcome.
assump (str, optional, default='no-assump') –
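Options for the returned bounds. Should be one of
1. no-assump: compute the no-assumptions bound, whose interval always contains 0.
2. non-negative: assume the treatment effect is nonnegative for every unit (nonnegative monotone treatment response).
3. non-positive: assume the treatment effect is nonpositive for every unit (nonpositive monotone treatment response).
4. optimal: assume each unit takes the treatment exactly when it is at least as good as the control (optimal treatment selection).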

Returns

The first element is the lower bound and the second element is the upper bound. If covariate is provided, both elements are ndarrays of shape (n, ) giving the lower and upper bounds for the corresponding examples, where n is the number of examples.

Return type

tuple

Raises

Exception – Raised if the model is not fitted or if assump is not one of the valid options.
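To make the four assump options concrete, the sketch below maps each option name to the corresponding formula from the bound sections above. It is a hypothetical plug-in helper (`bound_for`), not YLearn's implementation. It assumes a binary 0/1 treatment with no covariates and uses the first optimal treatment selection bound for 'optimal', since which variant YLearn returns is not stated here. It raises ValueError for unknown options, whereas the documented method raises Exception.

```python
import numpy as np

def bound_for(y, x, a, b, assump="no-assump"):
    """Hypothetical dispatcher mirroring the assump option names (not YLearn's code)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    pi = np.mean(x == 1)                         # sample estimate of P(X = 1)
    e1, e0 = y[x == 1].mean(), y[x == 0].mean()  # E[Y | X = 1], E[Y | X = 0]
    # No-assumptions bound, shared by the first three options
    upper = pi * e1 + (1 - pi) * b - pi * a - (1 - pi) * e0
    lower = pi * e1 + (1 - pi) * a - pi * b - (1 - pi) * e0
    if assump == "no-assump":
        return lower, upper
    if assump == "non-negative":
        return 0.0, upper
    if assump == "non-positive":
        return lower, 0.0
    if assump == "optimal":
        # First optimal treatment selection bound; b is not used
        return (1 - pi) * a - (1 - pi) * e0, pi * e1 - pi * a
    raise ValueError(f"unknown assump: {assump!r}")

# Toy data: outcomes lie in [0, 5] and half of the units are treated
y_toy = [1.0, 3.0, 2.0, 4.0]
x_toy = [1, 1, 0, 0]
for option in ["no-assump", "non-negative", "non-positive", "optimal"]:
    print(option, bound_for(y_toy, x_toy, 0.0, 5.0, option))
```

Computing the shared quantities once and branching only on the final formula reflects the fact that the first three options differ from one another only at one end of the interval.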

comp_transormer(x, categories='auto')

Transform the discrete treatment into one-hot vectors.

Parameters

x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.
categories (str or list, optional, default='auto') –

Returns

The transformed one-hot vectors.

Return type

numpy.ndarray
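For intuition about what a one-hot transform of an (n, x_d) treatment array produces, here is a NumPy-only sketch that infers the categories of each column from the data and concatenates the per-column encodings. This is comparable to scikit-learn's OneHotEncoder with categories='auto'. It is an illustration under that assumption: the helper `one_hot_treatments` is hypothetical, and comp_transormer's exact column order and handling of categories may differ.

```python
import numpy as np

def one_hot_treatments(x):
    """Hypothetical helper: one-hot encode each column of an (n, x_d) treatment array."""
    x = np.asarray(x)
    if x.ndim == 1:
        x = x.reshape(-1, 1)  # treat a 1-D input as a single treatment column
    blocks = []
    for j in range(x.shape[1]):
        # Sorted unique values of column j and each row's index into them
        categories, codes = np.unique(x[:, j], return_inverse=True)
        block = np.zeros((x.shape[0], categories.size))
        block[np.arange(x.shape[0]), codes.ravel()] = 1.0
        blocks.append(block)
    return np.hstack(blocks)

# Two discrete treatments: an activity and a group label
x = np.array([["run", "a"], ["read", "b"], ["run", "b"]])
enc = one_hot_treatments(x)
print(enc)
```

Each row contains exactly one 1 per treatment column, so with two treatments every row sums to 2.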