Approximation Bound for Causal Effects

Many estimator models require the unconfoundedness condition, which is usually untestable. One practical approach is to compute upper and lower bounds on the causal effect before turning to specific estimation methods.

YLearn provides four different bounds, selected through the assump argument of estimate() described under Class Structures. See [Neal2020] for details.
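To give a feeling for what such a bound looks like, the following is a sketch of the standard no-assumptions bound for a binary treatment, in the style of the derivation in [Neal2020]. The notation is ours and is an assumption, not YLearn's: \pi = P(X = 1) is the probability of treatment, and the outcome is assumed to lie in a known interval [a, b] (the roles played by y_lower and y_upper).

```latex
\begin{aligned}
\mathbb{E}[Y(1) - Y(0)] &\le \pi\,\mathbb{E}[Y \mid X = 1] + (1 - \pi)\,b - \pi\,a - (1 - \pi)\,\mathbb{E}[Y \mid X = 0], \\
\mathbb{E}[Y(1) - Y(0)] &\ge \pi\,\mathbb{E}[Y \mid X = 1] + (1 - \pi)\,a - \pi\,b - (1 - \pi)\,\mathbb{E}[Y \mid X = 0].
\end{aligned}
```

Subtracting the second line from the first shows that the interval has width b - a, and since a \le \mathbb{E}[Y \mid X = x] \le b the interval always contains 0.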

Example

import numpy as np

from ylearn.estimator_model.approximation_bound import ApproxBound
from ylearn.exp_dataset.exp_data import meaningless_discrete_dataset_
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

num = 1000  # number of samples (an assumed value; choose your own)
data = meaningless_discrete_dataset_(num=num, confounder_n=3, treatment_effct=[2, 5, -8], random_seed=0)
treatment = 'treatment'
w = ['w_0', 'w_1', 'w_2']
outcome = 'outcome'

bound = ApproxBound(y_model=RandomForestRegressor(), x_model=RandomForestClassifier())
bound.fit(data=data, treatment=treatment, outcome=outcome, covariate=w)

>>> ApproxBound(y_model=RandomForestRegressor(), x_prob=array([[0.  , 0.99, 0.01],
            [0.  , 0.99, 0.01],
            [1.  , 0.  , 0.  ],
            ...,
            [0.  , 1.  , 0.  ],
            [0.01, 0.99, 0.  ],
            [0.01, 0.99, 0.  ]]), x_model=RandomForestClassifier())

b_l, b_u = bound.estimate()
b_l.mean()

>>> -7.126728994957785

b_u.mean()

>>> 8.994011617037696

Class Structures

class ylearn.estimator_model.approximation_bound.ApproxBound(y_model, x_prob=None, x_model=None, random_state=2022, is_discrete_treatment=True, categories='auto')

A model used for estimating the upper and lower bounds of the causal effects.

Parameters:

y_model (estimator) – Any valid y_model should implement the fit() and predict() methods.
x_prob (ndarray of shape (c, ), optional, default=None) – An array of probabilities assigned to the corresponding values of x, where c is the number of different treatment classes. All elements of the array are positive and sum to 1. For example, x_prob = array([0.5, 0.5]) means that x = 0 and x = 1 each have probability 0.5. Set this to None if you are using multiple treatments.
x_model (estimator, optional, default=None) – Model for predicting the probabilities of treatment. Any valid x_model should implement the fit() and predict_proba() methods.
random_state (int, optional, default=2022) –
is_discrete_treatment (bool, optional, default=True) – True if the treatment is discrete.
categories (str, optional, default='auto') –
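As an illustration of the x_prob format described above, here is a minimal sketch of one way to build such an array from the empirical frequencies of an observed treatment column. The helper name empirical_treatment_probs is hypothetical and not part of YLearn, and the sketch assumes the probabilities should be ordered by the sorted treatment values.

```python
import numpy as np


def empirical_treatment_probs(x):
    """Return the sorted distinct treatment values and their empirical
    frequencies as an array of shape (c,). Hypothetical helper, not YLearn API."""
    classes, counts = np.unique(np.asarray(x), return_counts=True)
    probs = counts / counts.sum()  # positive by construction, sums to 1
    return classes, probs


# Toy treatment column with three classes: 0, 1 and 2.
x = np.array([0, 1, 1, 0, 1, 1, 2, 1])
classes, x_prob = empirical_treatment_probs(x)
print(classes, x_prob)
```

Every class that appears in the data gets a strictly positive probability, which matches the requirement that all elements are positive and sum to 1.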

fit(data, outcome, treatment, covariate=None, is_discrete_covariate=False, **kwargs)

Fit x_model and y_model.

Parameters:

data (pandas.DataFrame) – Training data.
outcome (list of str, optional) – Names of the outcome.
treatment (list of str, optional) – Names of the treatment.
covariate (list of str, optional, default=None) – Names of the covariate.
is_discrete_covariate (bool, optional, default=False) –

Returns:

The fitted instance of ApproxBound.

Return type:

instance of ApproxBound

Raises:

ValueError – Raised when the treatment is not discrete.

estimate(data=None, treat=None, control=None, y_upper=None, y_lower=None, assump=None)

Estimate the approximation bound of the causal effect of the treatment on the outcome.

Parameters:

data (pandas.DataFrame, optional, default=None) – Test data. The model will use the training data if set as None.
treat (ndarray of str, optional, default=None) – Values of the treatment group. For example, when there are multiple discrete treatments, array(['run', 'read']) means the treat value of the first treatment is taken as 'run' and that of the second treatment is taken as 'read'.
control (ndarray of str, optional, default=None) – Values of the control group.
y_upper (float, optional, default=None) – The upper bound of the outcome.
y_lower (float, optional, default=None) – The lower bound of the outcome.
assump (str, optional, default='no-assump') –
Options for the returned bounds. Should be one of
1. no-assump: Compute the no-assumptions bound, whose interval always contains 0.
2. non-negative: Assume the treatment effect is always non-negative.
3. non-positive: Assume the treatment effect is always non-positive.
4. optimal: Assume the treatment is taken only when its effect is positive.
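To make the four options more concrete, the following is a minimal NumPy sketch of the textbook bounds on the average effect of a binary treatment, following the style of derivation in [Neal2020]. It is not YLearn's implementation: the function ate_bounds and its argument t are hypothetical, it ignores covariates, and it assumes the outcome is known to lie in [y_lower, y_upper].

```python
import numpy as np


def ate_bounds(y, t, y_lower, y_upper, assump="no-assump"):
    """Textbook bounds on E[Y(1) - Y(0)] for a binary treatment t in {0, 1}
    and an outcome in [y_lower, y_upper]. Hypothetical helper, not YLearn API."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t)
    pi = (t == 1).mean()   # P(T = 1)
    m1 = y[t == 1].mean()  # E[Y | T = 1]
    m0 = y[t == 0].mean()  # E[Y | T = 0]

    # No-assumptions bound: each unobserved counterfactual mean is replaced
    # by the worst-case value of the outcome.
    lower = pi * m1 + (1 - pi) * y_lower - pi * y_upper - (1 - pi) * m0
    upper = pi * m1 + (1 - pi) * y_upper - pi * y_lower - (1 - pi) * m0

    if assump == "no-assump":
        return lower, upper
    if assump == "non-negative":  # effect >= 0 for every unit
        return 0.0, upper
    if assump == "non-positive":  # effect <= 0 for every unit
        return lower, 0.0
    if assump == "optimal":       # every unit receives its better treatment
        return (1 - pi) * (y_lower - m0), pi * (m1 - y_lower)
    raise ValueError(f"unknown assump: {assump!r}")


# Toy data with outcomes in [0, 1].
y = np.array([0.2, 0.9, 0.4, 0.7, 0.1, 0.8])
t = np.array([0, 1, 0, 1, 0, 1])
for name in ("no-assump", "non-negative", "non-positive", "optimal"):
    print(name, ate_bounds(y, t, 0.0, 1.0, assump=name))
```

In this sketch the no-assumptions interval always has width y_upper - y_lower, and each additional assumption can only shrink it.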

Returns:

The first element is the lower bound and the second element is the upper bound. Note that if covariate is provided, both elements are ndarrays of shape (n, ) giving the lower and upper bounds for the corresponding examples, where n is the number of examples.

Return type:

tuple

Raises:

Exception – Raised if the model is not fitted or if assump is not one of the options listed above.

comp_transormer(x, categories='auto')

Transform the discrete treatment into one-hot vectors properly.

Parameters:

x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.
categories (str or list, optional, default='auto') –

Returns:

The transformed one-hot vectors.

Return type:

numpy.ndarray
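For readers unfamiliar with one-hot encoding, the following sketch shows the general idea on an array of shape (n, x_d). It is an independent NumPy illustration, not YLearn's code: the function one_hot is hypothetical, and it assumes that categories='auto' means the sorted distinct values of each column and that a list supplies one sequence of categories per column.

```python
import numpy as np


def one_hot(x, categories="auto"):
    """One-hot encode each column of x and concatenate the results.
    Hypothetical helper illustrating the idea, not YLearn API."""
    x = np.asarray(x)
    if x.ndim == 1:
        x = x.reshape(-1, 1)
    blocks = []
    for j in range(x.shape[1]):
        col = x[:, j]
        if isinstance(categories, str) and categories == "auto":
            cats = np.unique(col)  # sorted distinct values of this column
        else:
            cats = np.asarray(categories[j])
        # Compare every value with every category: shape (n, len(cats)).
        blocks.append((col[:, None] == cats[None, :]).astype(float))
    return np.hstack(blocks)


x = np.array([["run"], ["read"], ["run"]])
encoded = one_hot(x)
print(encoded)
```

Each treatment column contributes as many output columns as it has categories, so several discrete treatments yield one wide array.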