Approximation Bound for Causal Effects
Many estimator models require the unconfoundedness condition, which is usually untestable. One practical approach is to build upper and lower bounds on the causal effects before diving into specific estimations.
There are four different bounds in YLearn. We briefly introduce them as follows. One can see [Neal2020] for details.
No-Assumptions Bound
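Suppose that
\[
\forall x, \quad a \leq Y(do(x)) \leq b,
\]
then we have
\[
\begin{aligned}
\mathbb{E}[Y(do(1)) - Y(do(0))] &\leq \pi \mathbb{E}[Y \mid X = 1] + (1 - \pi) b - \pi a - (1 - \pi) \mathbb{E}[Y \mid X = 0], \\
\mathbb{E}[Y(do(1)) - Y(do(0))] &\geq \pi \mathbb{E}[Y \mid X = 1] + (1 - \pi) a - \pi b - (1 - \pi) \mathbb{E}[Y \mid X = 0],
\end{aligned}
\]
where \(\pi\) is the probability of taking \(X=1\).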
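The no-assumptions bound can be computed directly from observed data. The following is a minimal sketch, not YLearn's implementation: it assumes a binary treatment, an outcome known to lie in an interval [a, b], and estimates \(\pi\) and the conditional means empirically. The function name `no_assumptions_bound` and the toy data are made up for illustration.

```python
from statistics import fmean


def no_assumptions_bound(x, y, a, b):
    """Return (lower, upper) bounds on the average effect of X on Y.

    x: binary treatments (0 or 1); y: observed outcomes;
    a, b: assumed lower and upper limits of the outcome.
    """
    treated = [yi for xi, yi in zip(x, y) if xi == 1]
    control = [yi for xi, yi in zip(x, y) if xi == 0]
    pi = len(treated) / len(x)                 # empirical P(X = 1)
    e1, e0 = fmean(treated), fmean(control)    # E[Y|X=1], E[Y|X=0]
    # Unobserved potential outcomes are replaced by their worst cases a or b.
    upper = pi * e1 + (1 - pi) * b - pi * a - (1 - pi) * e0
    lower = pi * e1 + (1 - pi) * a - pi * b - (1 - pi) * e0
    return lower, upper


# Toy data (illustrative only), with outcomes assumed to lie in [0, 1].
x = [1, 1, 1, 0, 0, 0, 0, 1]
y = [0.9, 0.8, 0.7, 0.2, 0.4, 0.3, 0.1, 0.6]
lower, upper = no_assumptions_bound(x, y, a=0.0, b=1.0)
print(lower, upper)
```

On this toy data the interval is roughly (-0.25, 0.75). Its width equals b - a regardless of the data, and it always contains 0, so this bound alone cannot determine the sign of the effect.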
Nonnegative Monotone Treatment Response Bound
Suppose that
\[
\forall i, \quad Y_i(do(1)) \geq Y_i(do(0)),
\]
which means that the treatment can only help. Then we have the following bound:
\[
\mathbb{E}[Y(do(1)) - Y(do(0))] \geq 0.
\]
Nonpositive Monotone Treatment Response Bound
Suppose that
\[
\forall i, \quad Y_i(do(1)) \leq Y_i(do(0)),
\]
which means that the treatment can never help. Then we have the following bound:
\[
\mathbb{E}[Y(do(1)) - Y(do(0))] \leq 0.
\]
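Under monotone treatment response the sign of the effect is known, so one end of the interval becomes 0. The sketch below pairs that end with the opposite end of the no-assumptions interval, following the presentation in [Neal2020]. It is an illustration only, not YLearn's code: the function `monotone_bounds`, its arguments, and the numbers are hypothetical, and a and b are assumed lower and upper limits of the outcome.

```python
def monotone_bounds(pi, e1, e0, a, b, direction):
    """Bounds on the average effect under monotone treatment response.

    pi: P(X = 1); e1, e0: E[Y|X=1] and E[Y|X=0]; a, b: outcome limits;
    direction: "non-negative" or "non-positive".
    """
    # No-assumptions interval, used for the end that monotonicity leaves open.
    na_upper = pi * e1 + (1 - pi) * b - pi * a - (1 - pi) * e0
    na_lower = pi * e1 + (1 - pi) * a - pi * b - (1 - pi) * e0
    if direction == "non-negative":    # the treatment can only help
        return 0.0, na_upper
    if direction == "non-positive":    # the treatment can never help
        return na_lower, 0.0
    raise ValueError(f"unknown direction: {direction!r}")


nonneg = monotone_bounds(0.5, 0.75, 0.25, 0.0, 1.0, "non-negative")
nonpos = monotone_bounds(0.5, 0.75, 0.25, 0.0, 1.0, "non-positive")
print(nonneg, nonpos)
```

Each assumption halves the problem: the data still cannot say how large the effect is, but the interval no longer straddles 0.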
Optimal Treatment Selection Bound
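Suppose that
\[
X = 1 \implies Y(do(1)) \geq Y(do(0)), \qquad X = 0 \implies Y(do(0)) > Y(do(1)),
\]
which means that people always receive the treatment if it is the best for them. Then we have the following bound:
\[
\begin{aligned}
\mathbb{E}[Y(do(1)) - Y(do(0))] &\leq \pi \mathbb{E}[Y \mid X = 1] - \pi a, \\
\mathbb{E}[Y(do(1)) - Y(do(0))] &\geq (1 - \pi) a - (1 - \pi) \mathbb{E}[Y \mid X = 0],
\end{aligned}
\]
where \(a\) is a lower bound on the outcome \(Y\).
There is one more optimal treatment selection bound:
\[
\begin{aligned}
\mathbb{E}[Y(do(1)) - Y(do(0))] &\leq \mathbb{E}[Y \mid X = 1] - \pi a - (1 - \pi) \mathbb{E}[Y \mid X = 0], \\
\mathbb{E}[Y(do(1)) - Y(do(0))] &\geq \pi \mathbb{E}[Y \mid X = 1] + (1 - \pi) a - \mathbb{E}[Y \mid X = 0].
\end{aligned}
\]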
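Both optimal treatment selection intervals hold under the same assumption, so the true effect lies in their intersection. The sketch below evaluates the two intervals as given in [Neal2020] and intersects them. It is a hedged illustration rather than YLearn's implementation: `ots_bounds`, `intersect`, and the input values are hypothetical, and a is an assumed lower limit of the outcome.

```python
def ots_bounds(pi, e1, e0, a):
    """Two optimal-treatment-selection intervals for the average effect.

    pi: P(X = 1); e1, e0: E[Y|X=1] and E[Y|X=0]; a: lower limit of the outcome.
    Each interval is returned as a (lower, upper) tuple.
    """
    first = ((1 - pi) * a - (1 - pi) * e0, pi * e1 - pi * a)
    second = (pi * e1 + (1 - pi) * a - e0, e1 - pi * a - (1 - pi) * e0)
    return first, second


def intersect(i, j):
    """Intersection of two intervals that are both known to contain the target."""
    return max(i[0], j[0]), min(i[1], j[1])


first, second = ots_bounds(pi=0.5, e1=0.75, e0=0.25, a=0.0)
tightest = intersect(first, second)
print(first, second, tightest)
```

With these inputs the first interval is (-0.125, 0.375), the second is (0.125, 0.625), and their intersection (0.125, 0.375) excludes 0 even though the first interval alone does not.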
Example
import numpy as np
from ylearn.estimator_model.approximation_bound import ApproxBound
from ylearn.exp_dataset.exp_data import meaningless_discrete_dataset_
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
num = 1000  # sample size; not specified in the original, 1000 is an illustrative choice
data = meaningless_discrete_dataset_(num=num, confounder_n=3, treatment_effct=[2, 5, -8], random_seed=0)
treatment = 'treatment'
w = ['w_0', 'w_1', 'w_2']
outcome = 'outcome'
bound = ApproxBound(y_model=RandomForestRegressor(), x_model=RandomForestClassifier())
bound.fit(data=data, treatment=treatment, outcome=outcome, covariate=w)
>>> ApproxBound(y_model=RandomForestRegressor(), x_prob=array([[0. , 0.99, 0.01],
[0. , 0.99, 0.01],
[1. , 0. , 0. ],
...,
[0. , 1. , 0. ],
[0.01, 0.99, 0. ],
[0.01, 0.99, 0. ]]), x_model=RandomForestClassifier())
b_l, b_u = bound.estimate()
b_l.mean()
>>> -7.126728994957785
b_u.mean()
>>> 8.994011617037696
Class Structures
- class ylearn.estimator_model.approximation_bound.ApproxBound(y_model, x_prob=None, x_model=None, random_state=2022, is_discrete_treatment=True, categories='auto')
A model used for estimating the upper and lower bounds of the causal effects.
- Parameters
y_model (estimator, optional) – Any valid y_model should implement the fit() and predict() methods
x_prob (ndarray of shape (c, ), optional, default=None) – An array of probabilities assigned to the corresponding values of x, where c is the number of different treatment classes. All elements in the array are positive and sum to 1. For example, x_prob = array([0.5, 0.5]) means both x = 0 and x = 1 have probability 0.5. Please set this to None if you are using multiple treatments.
x_model (estimator, optional, default=None) – Models for predicting the probabilities of treatment. Any valid x_model should implement the fit() and predict_proba() methods.
random_state (int, optional, default=2022) –
is_discrete_treatment (bool, optional, default=True) – True if the treatment is discrete.
categories (str, optional, default='auto') –
- fit(data, outcome, treatment, covariate=None, is_discrete_covariate=False, **kwargs)
Fit x_model and y_model.
- Parameters
data (pandas.DataFrame) – Training data.
outcome (list of str, optional) – Names of the outcome.
treatment (list of str, optional) – Names of the treatment.
covariate (list of str, optional, default=None) – Names of the covariate.
is_discrete_covariate (bool, optional, default=False) –
- Returns
The fitted instance of ApproxBound.
- Return type
instance of ApproxBound
- Raises
ValueError – Raised when the treatment is not discrete.
- estimate(data=None, treat=None, control=None, y_upper=None, y_lower=None, assump=None)
Estimate the approximation bound of the causal effect of the treatment on the outcome.
- Parameters
data (pandas.DataFrame, optional, default=None) – Test data. The model will use the training data if set as None.
treat (ndarray of str, optional, default=None) – Values of the treatment group. For example, when there are multiple discrete treatments, array(['run', 'read']) means the treat value of the first treatment is taken as 'run' and that of the second treatment is taken as 'read'.
control (ndarray of str, optional, default=None) – Values of the control group.
y_upper (float, optional, default=None) – The upper bound of the outcome.
y_lower (float, optional, default=None) – The lower bound of the outcome.
assump (str, optional, default='no-assump') –
Options for the returned bounds. Should be one of
no-assump: calculate the no-assumptions bound, whose interval always contains 0.
non-negative: assume the treatment effect is never negative (the treatment can only help).
non-positive: assume the treatment effect is never positive (the treatment can never help).
optimal: assume the treatment is taken whenever its effect is positive.
- Returns
The first element is the lower bound and the second element is the upper bound. Note that if covariate is provided, both elements are ndarrays of shape (n, ) giving the lower and upper bounds of the corresponding examples, where n is the number of examples.
- Return type
tuple
- Raises
Exception – Raised if the model is not fitted or if assump is not given correctly.
- comp_transormer(x, categories='auto')
Transform the discrete treatment into one-hot vectors properly.
- Parameters
x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.
categories (str or list, optional, default='auto') –
- Returns
The transformed one-hot vectors.
- Return type
numpy.ndarray
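To make the idea of one-hot encoding a discrete treatment concrete, here is a minimal dependency-free sketch. It is not the library's implementation: the helper `one_hot` is hypothetical, it handles a single treatment column rather than an (n, x_d) array, and it returns nested lists instead of a numpy.ndarray. Treating `categories='auto'` as "infer the sorted distinct values from the data" is an assumption made for this example.

```python
def one_hot(values, categories="auto"):
    """One-hot encode a sequence of discrete treatment values."""
    if categories == "auto":
        categories = sorted(set(values))   # infer categories from the data
    index = {c: k for k, c in enumerate(categories)}
    # One row per example, with a single 1 in the column of its category.
    return [
        [1 if index[v] == k else 0 for k in range(len(categories))]
        for v in values
    ]


encoded = one_hot(["run", "read", "run"])
print(encoded)  # columns are ordered as ['read', 'run']
```

Passing an explicit list such as `categories=["run", "read"]` fixes the column order, which matters when training and test data must share the same encoding.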