Causal Model
CausalModel is a core object for performing Identification and finding Instrumental Variables.
Before introducing the causal model, we first clarify what an intervention is. An intervention takes the whole population and applies an operation to every individual in it. [Pearl] defined the \(do\)-operator to describe such operations. Probabilistic models alone cannot predict the effect of interventions, which is why causal models are needed.
The formal definition of a causal model is due to [Pearl]. A causal model is a triple
\[M = \left< U, V, F \right>\]
where
\(U\) are exogenous variables (variables that are determined by factors outside the model);
\(V\) are endogenous variables that are determined by \(U \cup V\), and \(F\) is a set of functions such that
\[V_i = F_i(pa_i, U_i)\]
with \(pa_i \subset V \backslash V_i\).
For example, \(M = \left< U, V, F\right>\) is a causal model where
such that
Note that every causal model can be associated with a DAG that encodes the necessary information about the causal relationships between variables.
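To make the definition concrete, here is a small causal model written in plain Python. It is a hypothetical illustration, not YLearn code: the variables, structural functions and probabilities are invented for the example. Here \(U = \{U_Z, U_X, U_Y\}\), \(V = \{Z, X, Y\}\), and each function in \(F\) computes one endogenous variable from its parents and its own exogenous variable. The intervention \(do(X = x)\) replaces \(F_X\) by the constant \(x\). This is why \(P(Y|do(X=x))\) can differ from the conditional probability \(P(Y|X=x)\).

```python
from fractions import Fraction
from itertools import product

# Hypothetical exogenous variables U, each an independent Bernoulli with the given P(U=1).
P_U = {"U_Z": Fraction(1, 2), "U_X": Fraction(1, 5), "U_Y": Fraction(1, 10)}


def solve(u, do=None):
    """Compute the endogenous variables V = {Z, X, Y} from u via the functions F.

    An intervention such as do={"X": 0} replaces F_X by the constant 0.
    """
    do = do or {}
    v = {}
    v["Z"] = do.get("Z", u["U_Z"])                      # Z = F_Z(U_Z)
    v["X"] = do.get("X", v["Z"] ^ u["U_X"])             # X = F_X(Z, U_X)
    v["Y"] = do.get("Y", (v["X"] | v["Z"]) ^ u["U_Y"])  # Y = F_Y(X, Z, U_Y)
    return v


def worlds():
    """Yield every assignment u of the exogenous variables with its probability P(u)."""
    names = list(P_U)
    for bits in product((0, 1), repeat=len(names)):
        u = dict(zip(names, bits))
        p = Fraction(1)
        for name, bit in u.items():
            p *= P_U[name] if bit else 1 - P_U[name]
        yield u, p


def prob_y1(do=None, given=None):
    """P(Y=1 | given), computed in the model after applying the intervention `do`."""
    given = given or {}
    num = den = Fraction(0)
    for u, p in worlds():
        v = solve(u, do)
        if all(v[k] == val for k, val in given.items()):
            den += p
            num += p * v["Y"]
    return num / den


print(prob_y1(given={"X": 0}))  # conditioning on X = 0: prints 13/50
print(prob_y1(do={"X": 0}))     # intervening with do(X = 0): prints 1/2
```

Conditioning on \(X = 0\) also shifts the distribution of the common cause \(Z\). By contrast, \(do(X = 0)\) leaves \(Z\) untouched. A purely probabilistic model cannot close this gap.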
YLearn uses CausalModel to represent a causal model and to support many operations related to the causal model, such as Identification.
Identification
To characterize the effect of an intervention, one needs to consider the causal effect, which is a causal estimand involving the \(do\)-operator. Converting a causal effect into corresponding statistical estimands is called Identification, and it is implemented in CausalModel in YLearn. Note that not all causal effects can be converted to statistical estimands; such causal effects are called not identifiable. Several identification methods supported by CausalModel are listed below.
Backdoor adjustment
The causal effect of \(X\) on \(Y\) is given by
\[P(y|do(x)) = \sum_w P(y|x, w)P(w)\]
if the set of variables \(W\) satisfies the back-door criterion relative to \((X, Y)\).
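As a numeric illustration, the sketch below evaluates \(P(y|do(x)) = \sum_w P(y|x, w)P(w)\) on a small, hypothetical joint distribution of binary \(W\), \(X\), \(Y\). It uses plain Python with exact fractions and does not use the YLearn API. The numbers were chosen so that \(W\) is a common cause of \(X\) and \(Y\), which means \(\{W\}\) satisfies the back-door criterion. The helper names prob and backdoor_adjust are made up for this example.

```python
from fractions import Fraction

# Hypothetical joint distribution P(w, x, y) of binary W, X, Y, in units of 1/100.
COUNTS = {
    (0, 0, 0): 36, (0, 0, 1): 4, (0, 1, 0): 1, (0, 1, 1): 9,
    (1, 0, 0): 1, (1, 0, 1): 9, (1, 1, 0): 4, (1, 1, 1): 36,
}
P = {cell: Fraction(n, 100) for cell, n in COUNTS.items()}


def prob(w=None, x=None, y=None):
    """Marginal probability of the cells matching every argument that is not None."""
    return sum(
        (p for (w_, x_, y_), p in P.items()
         if (w is None or w_ == w) and (x is None or x_ == x) and (y is None or y_ == y)),
        Fraction(0),
    )


def backdoor_adjust(x, y):
    """P(y | do(x)) = sum_w P(y | x, w) P(w), valid when W meets the back-door criterion."""
    return sum(
        (prob(w=w, x=x, y=y) / prob(w=w, x=x) * prob(w=w) for w in (0, 1)),
        Fraction(0),
    )


print(prob(x=0, y=1) / prob(x=0))  # naive conditional P(Y=1 | X=0): prints 13/50
print(backdoor_adjust(0, 1))       # adjusted P(Y=1 | do(X=0)): prints 1/2
```

The naive conditional mixes the effect of \(X\) with the effect of \(W\). Averaging the stratum-specific conditionals over \(P(w)\) removes that mixing.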
Frontdoor adjustment
The causal effect of \(X\) on \(Y\) is given by
\[P(y|do(x)) = \sum_w P(w|x) \sum_{x'} P(y|x', w)P(x')\]
if the set of variables \(W\) satisfies the front-door criterion relative to \((X, Y)\) and if \(P(x, w) > 0\).
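The sketch below checks the front-door formula \(P(y|do(x)) = \sum_w P(w|x) \sum_{x'} P(y|x', w)P(x')\) numerically. It uses a hypothetical model in which an unobserved \(U\) confounds \(X\) and \(Y\), and \(W\) mediates the whole effect of \(X\) on \(Y\). The code builds the observed joint distribution of \((X, W, Y)\) with \(U\) summed out and evaluates the formula from that joint alone. It then compares the result with the true interventional probability computed from the model. All probabilities and function names are assumptions made for illustration; this is not YLearn code.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical model: U confounds X and Y and is NOT observed; W mediates X -> Y.
P_U1 = F(1, 2)                                # P(U=1)
P_X1 = {0: F(1, 5), 1: F(4, 5)}               # P(X=1 | U=u)
P_W1 = {0: F(1, 4), 1: F(3, 4)}               # P(W=1 | X=x)
P_Y1 = {(0, 0): F(1, 10), (0, 1): F(1, 2),    # P(Y=1 | W=w, U=u)
        (1, 0): F(1, 2), (1, 1): F(9, 10)}


def bern(p1, v):
    """P(V=v) for a binary V with P(V=1) = p1."""
    return p1 if v == 1 else 1 - p1


def observed_joint():
    """P(x, w, y) with U summed out, i.e. what data without U would show."""
    joint = {}
    for u, x, w, y in product((0, 1), repeat=4):
        p = bern(P_U1, u) * bern(P_X1[u], x) * bern(P_W1[x], w) * bern(P_Y1[(w, u)], y)
        joint[(x, w, y)] = joint.get((x, w, y), F(0)) + p
    return joint


JOINT = observed_joint()


def marg(**fixed):
    """Marginal of the observed variables named in `fixed` (keys among x, w, y)."""
    total = F(0)
    for (x, w, y), p in JOINT.items():
        cell = {"x": x, "w": w, "y": y}
        if all(cell[k] == v for k, v in fixed.items()):
            total += p
    return total


def frontdoor(x, y):
    """P(y | do(x)) via the front-door formula, using only the observed JOINT."""
    total = F(0)
    for w in (0, 1):
        inner = sum(marg(x=xp, w=w, y=y) / marg(x=xp, w=w) * marg(x=xp) for xp in (0, 1))
        total += marg(x=x, w=w) / marg(x=x) * inner
    return total


def truth(x, y):
    """Ground truth from the model: fix X = x and keep P(U) and the other mechanisms."""
    return sum(bern(P_U1, u) * bern(P_W1[x], w) * bern(P_Y1[(w, u)], y)
               for u, w in product((0, 1), repeat=2))


print(frontdoor(1, 1), truth(1, 1))  # prints 3/5 3/5
print(marg(x=1, y=1) / marg(x=1))    # naive P(Y=1 | X=1): prints 18/25
```

The naive conditional is biased by the hidden \(U\). The front-door expression, computed without ever seeing \(U\), matches the true effect exactly.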
General identification
[Shpitser2006] gives a necessary and sufficient graphical condition under which the causal effect of an arbitrary set of variables on another arbitrary set can be identified uniquely whenever it is identifiable. We call the action of verifying this condition general identification.
Finding Instrumental Variables
Instrumental variables are useful to identify and estimate the causal effect of \(X\) on \(Y\) when there are unobserved confounders of \(X\) and \(Y\). A set of variables \(Z\) is said to be a set of instrumental variables if for any \(z\) in \(Z\):
\(z\) has a causal effect on \(X\).
The causal effect of \(z\) on \(Y\) is fully mediated by \(X\).
There are no back-door paths from \(z\) to \(Y\).
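In a linear model, a single instrument identifies the effect through the ratio \(\mathrm{Cov}(z, Y)/\mathrm{Cov}(z, X)\), known as the Wald or IV estimator. The simulation below is a hypothetical linear model, not YLearn code. Its instrument satisfies the three conditions: \(z\) causes \(X\), reaches \(Y\) only through \(X\), and has no parents, so it has no back-door paths. The true effect is 2.0 by construction.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n=50_000, effect=2.0, seed=0):
    """Hypothetical linear model with an unobserved confounder u of x and y."""
    rng = random.Random(seed)
    zs, xs, ys = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                       # unobserved confounder
        z = rng.gauss(0, 1)                       # instrument: no parents
        x = z + u + rng.gauss(0, 1)               # z has a causal effect on x
        y = effect * x + 3 * u + rng.gauss(0, 1)  # z reaches y only through x
        zs.append(z)
        xs.append(x)
        ys.append(y)
    return zs, xs, ys


zs, xs, ys = simulate()
ols = cov(xs, ys) / cov(xs, xs)  # regression slope, biased by the path x <- u -> y
iv = cov(zs, ys) / cov(zs, xs)   # Wald/IV ratio, consistent for the true effect
print(round(ols, 2), round(iv, 2))
```

The ordinary regression slope absorbs the confounding path through \(u\) and settles near 3. The IV ratio stays close to the true value of 2.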
Example 1: Identify the causal effect with the general identification method
For the causal structure in the figure, we want to identify the causal effect of \(X\) on \(Y\) using the general identification method. The first step is to represent the causal structure with CausalGraph.
from ylearn.causal_model.graph import CausalGraph
causation = {
'X': ['Z2'],
'Z1': ['X', 'Z2'],
'Y': ['Z1', 'Z3'],
'Z3': ['Z2'],
'Z2': [],
}
arcs = [('X', 'Z2'), ('X', 'Z3'), ('X', 'Y'), ('Z2', 'Y')]
cg = CausalGraph(causation=causation, latent_confounding_arcs=arcs)
Then we define an instance of CausalModel for the causal structure encoded in cg to perform the identification.
from ylearn.causal_model.model import CausalModel
cm = CausalModel(causal_graph=cg)
stat_estimand = cm.id(y={'Y'}, x={'X'})
stat_estimand.show_latex_expression()
>>> :math:`\sum_{Z3, Z1, Z2}[P(Z2)P(Y|Z3, Z2)][P(Z1|Z2, X)][P(Z3|Z2)]`
The result is the desired identified causal effect of \(X\) on \(Y\) in the given causal structure.
Example 2: Identify the causal effect with the back-door adjustment
For the causal structure in the figure, we want to identify the causal effect of \(X\) on \(Y\) using the back-door adjustment method.
from ylearn.causal_model.graph import CausalGraph
from ylearn.causal_model.model import CausalModel
causation = {
'X1': [],
'X2': [],
'X3': ['X1'],
'X4': ['X1', 'X2'],
'X5': ['X2'],
'X6': ['X'],
'X': ['X3', 'X4'],
'Y': ['X6', 'X4', 'X5', 'X'],
}
cg = CausalGraph(causation=causation)
cm = CausalModel(causal_graph=cg)
backdoor_set, prob = cm.identify(treatment={'X'}, outcome={'Y'}, identify_method=('backdoor', 'simple'))['backdoor']
print(backdoor_set)
>>> ['X3', 'X4']
Example 3: Find the valid instrumental variables
We want to find the valid instrumental variables for the causal effect of \(t\) on \(g\).
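from ylearn.causal_model.graph import CausalGraph
from ylearn.causal_model.model import CausalModel
causation = {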
'p':[],
't': ['p'],
'l': ['p'],
'g': ['t', 'l']
}
arc = [('t', 'g')]
cg = CausalGraph(causation=causation, latent_confounding_arcs=arc)
cm = CausalModel(causal_graph=cg)
cm.get_iv('t', 'g')
>>> No valid instrument variable has been found.
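Here \(p\) also affects \(g\) through \(l\), so the effect of \(p\) on \(g\) is not fully mediated by \(t\) and \(p\) is not a valid instrument. In the following modified causal structure, \(l\) is no longer a child of \(p\); we again look for valid instrumental variables for the causal effect of \(t\) on \(g\).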
causation = {
'p':[],
't': ['p', 'l'],
'l': [],
'g': ['t', 'l']
}
arc = [('t', 'g')]
cg = CausalGraph(causation=causation, latent_confounding_arcs=arc)
cm = CausalModel(causal_graph=cg)
cm.get_iv('t', 'g')
>>> {'p'}
Class Structures
- class ylearn.causal_model.CausalModel(causal_graph=None, data=None)
- Parameters
causal_graph (CausalGraph, optional, default=None) – An instance of CausalGraph which encodes the causal structures.
data (pandas.DataFrame, optional, default=None) – The data used to discover the causal structures if causal_graph is not provided.
- id(y, x, prob=None, graph=None)
Identify the causal quantity \(P(y|do(x))\) if it is identifiable; otherwise raise IdentificationError. Note that here we only consider semi-Markovian causal models, where each unobserved variable is a parent of exactly two nodes. This is because any causal model with unobserved variables can be converted to a semi-Markovian causal model encoding the same set of conditional independences.
- Parameters
y (set of str) – Set of names of outcomes.
x (set of str) – Set of names of treatments.
prob (Prob, optional, default=None) – Probability distribution encoded in the graph.
graph (CausalGraph) – CausalGraph encodes the information of corresponding causal structures.
- Returns
The probability distribution of the converted causal effect.
- Return type
- Raises
IdentificationError – If the causal effect of interest is not identifiable.
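The conversion to a semi-Markovian model mentioned in id() can be sketched in plain Python for the simplest case. The sketch assumes that every latent variable is a root whose children are all observed. Each latent node is dropped and replaced by one bidirected arc per pair of its children. The result is a child-to-parents mapping and an arc list shaped like the causation and latent_confounding_arcs arguments used in the examples above. The function name project_latents is hypothetical, and YLearn's own conversion may work differently.

```python
from itertools import combinations


def project_latents(parents, latent):
    """Replace each latent root variable by bidirected arcs between its children.

    `parents` maps every node to the list of its parents. Assumption: every latent
    variable has no parents and only observed children.
    """
    observed = {v: [p for p in ps if p not in latent]
                for v, ps in parents.items() if v not in latent}
    arcs = set()
    for u in latent:
        children = sorted(v for v, ps in parents.items() if u in ps)
        arcs.update(combinations(children, 2))  # one arc per pair of children
    return observed, sorted(arcs)


# Hypothetical graph: latent U is a common parent of three observed nodes.
graph = {"U": [], "X": ["U"], "M": ["X"], "Y": ["M", "U"], "Z": ["U"]}
causation, arcs = project_latents(graph, {"U"})
print(causation)  # {'X': [], 'M': ['X'], 'Y': ['M'], 'Z': []}
print(arcs)       # [('X', 'Y'), ('X', 'Z'), ('Y', 'Z')]
```

Each arc in the output stands for an unobserved variable that is a parent of exactly two nodes, as the semi-Markovian assumption requires.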
- is_valid_backdoor_set(set_, treatment, outcome)
Determine whether a given set is a valid backdoor adjustment set for the causal effect of the treatments on the outcomes.
- Parameters
set_ (set) – The adjustment set.
treatment (set or list of str) – Names of the treatment. str is also acceptable for single treatment.
outcome (set or list of str) – Names of the outcome. str is also acceptable for single outcome.
- Returns
True if the given set is a valid backdoor adjustment set for the causal effect of treatment on outcome in the current causal graph.
- Return type
bool
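The following plain-Python sketch shows what a backdoor-set check involves for a single treatment and outcome. It is not YLearn's implementation. It applies the back-door criterion directly: no member of the set may be a descendant of the treatment, and the set must d-separate treatment and outcome once the edges leaving the treatment are removed. d-separation is tested with the moralization method: take the ancestral subgraph, marry co-parents, drop directions, delete the conditioning set, and test connectivity. The graph is the one from Example 2, and all function names are illustrative.

```python
from itertools import combinations

# Graph of Example 2, as a child -> parents mapping.
CAUSATION = {
    "X1": [], "X2": [], "X3": ["X1"], "X4": ["X1", "X2"], "X5": ["X2"],
    "X6": ["X"], "X": ["X3", "X4"], "Y": ["X6", "X4", "X5", "X"],
}


def ancestors(parents, nodes):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen


def descendants(parents, node):
    """All proper descendants of `node`."""
    seen, stack = set(), [node]
    while stack:
        v = stack.pop()
        for child, ps in parents.items():
            if v in ps and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def d_separated(parents, x, y, given):
    """Moralization test: is x d-separated from y by the node set `given`?"""
    given = set(given)
    keep = ancestors(parents, {x, y} | given)
    adj = {v: set() for v in keep}
    for child in keep:
        for p in parents[child]:                    # keep every edge, drop its direction
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(parents[child], 2):  # marry parents of a common child
            adj[a].add(b)
            adj[b].add(a)
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for nxt in adj[v] - given - seen:
            seen.add(nxt)
            stack.append(nxt)
    return True


def is_valid_backdoor_set(parents, set_, treatment, outcome):
    """Back-door criterion for a single treatment and a single outcome."""
    if set(set_) & descendants(parents, treatment):
        return False  # no member may be a descendant of the treatment
    # Delete the edges leaving the treatment so that only back-door paths remain.
    cut = {v: [p for p in ps if p != treatment] for v, ps in parents.items()}
    return d_separated(cut, treatment, outcome, set_)


print(is_valid_backdoor_set(CAUSATION, {"X3", "X4"}, "X", "Y"))  # True
print(is_valid_backdoor_set(CAUSATION, {"X4"}, "X", "Y"))        # False
```

Adjusting for \(X4\) alone fails because \(X4\) is a collider on the path \(X \leftarrow X3 \leftarrow X1 \rightarrow X4 \leftarrow X2 \rightarrow X5 \rightarrow Y\), so conditioning on it opens that path.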
- get_backdoor_set(treatment, outcome, adjust='simple', print_info=False)
Return the backdoor adjustment set for the given treatment and outcome.
- Parameters
treatment (set or list of str) – Names of the treatment. str is also acceptable for single treatment.
outcome (set or list of str) – Names of the outcome. str is also acceptable for single outcome.
adjust (str) –
Set style of the backdoor set. Available options are
simple: directly return the parent set of the treatment
minimal: return the minimal backdoor adjustment set
all: return all valid backdoor adjustment sets.
print_info (bool, default=False) – If True, print the identified results.
- Returns
The first element is the adjustment list, while the second is the encoded Prob.
- Return type
tuple of two elements
- Raises
IdentificationError – If adjust is not one of 'simple', 'minimal' or 'all', or if no set satisfies the backdoor criterion.
- get_backdoor_path(treatment, outcome)
Return all backdoor paths connecting treatment and outcome.
- Parameters
treatment (str) – Name of the treatment.
outcome (str) – Name of the outcome.
- Returns
A list containing all valid backdoor paths between the treatment and outcome in the graph.
- Return type
list
- has_collider(path, backdoor_path=True)
If the path in the current graph has a collider, return True, else return False.
- Parameters
path (list of str) – A list containing nodes in the path.
backdoor_path (bool, default=True) – Whether the path is a backdoor path.
- Returns
True if the path has a collider.
- Return type
bool
- is_connected_backdoor_path(path)
Test whether a backdoor path is connected.
- Parameters
path (list of str) – A list describing the path.
- Returns
True if path is a d-connected backdoor path and False otherwise.
- Return type
bool
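The path utilities above can be illustrated with a short plain-Python sketch on the graph of Example 2. The helper names and the child-to-parents dictionary format are assumptions, not YLearn's internals. A back-door path is enumerated as a simple path whose first edge points into the treatment. A node on a path is a collider when both of its neighbours on the path are its parents. With an empty conditioning set, a path is d-connected exactly when it contains no collider.

```python
# Graph of Example 2, as a child -> parents mapping.
CAUSATION = {
    "X1": [], "X2": [], "X3": ["X1"], "X4": ["X1", "X2"], "X5": ["X2"],
    "X6": ["X"], "X": ["X3", "X4"], "Y": ["X6", "X4", "X5", "X"],
}


def backdoor_paths(parents, treatment, outcome):
    """All simple paths from treatment to outcome whose first edge points into treatment."""
    neighbors = {v: set(ps) for v, ps in parents.items()}
    for child, ps in parents.items():
        for p in ps:
            neighbors[p].add(child)
    paths = []

    def extend(path):
        node = path[-1]
        if node == outcome:
            paths.append(path)
            return
        for nxt in sorted(neighbors[node]):
            if nxt not in path:
                extend(path + [nxt])

    for p in sorted(parents[treatment]):  # first step goes from treatment to a parent
        extend([treatment, p])
    return paths


def has_collider(parents, path):
    """True if some interior node on `path` has both path neighbours as parents."""
    return any(
        path[i - 1] in parents[path[i]] and path[i + 1] in parents[path[i]]
        for i in range(1, len(path) - 1)
    )


for path in backdoor_paths(CAUSATION, "X", "Y"):
    # With nothing conditioned on, a path is d-connected exactly when it has no collider.
    print(path, "blocked" if has_collider(CAUSATION, path) else "open")
```

Only the first path contains a collider (\(X1 \rightarrow X4 \leftarrow X2\)). Adjusting for \(X4\) alone therefore opens that path, whereas adding \(X3\), as in \(\{X3, X4\}\) from Example 2, blocks it again.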
- is_frontdoor_set(set_, treatment, outcome)
Determine if the given set is a valid frontdoor adjustment set for the causal effect of treatment on outcome.
- Parameters
set_ (set) – The set to be tested as a valid front-door adjustment set.
treatment (str) – Name of the treatment.
outcome (str) – Name of the outcome.
- Returns
True if the given set is a valid frontdoor adjustment set for causal effects of treatments on outcomes.
- Return type
bool
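The three conditions of the front-door criterion can be checked in a similar way. The plain-Python sketch below is illustrative only and is not YLearn's implementation. It represents the unobserved confounder as an explicit node U and uses the moralization test for d-separation to check three conditions. First, the set must intercept every directed path from treatment to outcome. Second, there must be no unblocked back-door path from the treatment to the set. Third, every back-door path from the set to the outcome must be blocked by the treatment.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen


def d_separated(parents, xs, ys, given):
    """Moralization test of whether node sets xs and ys are d-separated by `given`."""
    given = set(given)
    keep = ancestors(parents, set(xs) | set(ys) | given)
    adj = {v: set() for v in keep}
    for child in keep:
        for p in parents[child]:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(parents[child], 2):
            adj[a].add(b)
            adj[b].add(a)
    seen, stack = set(xs), list(xs)
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        for nxt in adj[v] - given - seen:
            seen.add(nxt)
            stack.append(nxt)
    return True


def is_frontdoor_set(parents, set_, treatment, outcome):
    w = set(set_)
    # (i) removing w leaves no directed path from treatment to outcome
    pruned = {v: [p for p in ps if p not in w] for v, ps in parents.items() if v not in w}
    if treatment in ancestors(pruned, {outcome}):
        return False
    # (ii) with the edges out of treatment removed, treatment is d-separated from w
    cut_t = {v: [p for p in ps if p != treatment] for v, ps in parents.items()}
    if not d_separated(cut_t, {treatment}, w, set()):
        return False
    # (iii) with the edges out of w removed, w is d-separated from outcome given treatment
    cut_w = {v: [p for p in ps if p not in w] for v, ps in parents.items()}
    return d_separated(cut_w, w, {outcome}, {treatment})


# Hypothetical graphs; U stands for an unobserved confounder of X and Y.
FRONTDOOR = {"U": [], "X": ["U"], "W": ["X"], "Y": ["W", "U"]}
DIRECT = {"U": [], "X": ["U"], "W": ["X"], "Y": ["W", "U", "X"]}
print(is_frontdoor_set(FRONTDOOR, {"W"}, "X", "Y"))  # True
print(is_frontdoor_set(DIRECT, {"W"}, "X", "Y"))     # False: X -> Y bypasses W
```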
- get_frontdoor_set(treatment, outcome, adjust='simple')
Return the frontdoor set for adjusting the causal effect between treatment and outcome.
- Parameters
treatment (set of str or str) – Name of the treatment. Should contain only one element.
outcome (set of str or str) – Name of the outcome. Should contain only one element.
adjust (str, default='simple') –
Available options are
'simple': Return the frontdoor set with minimal number of elements.
'minimal': Return the frontdoor set with minimal number of elements.
'all': Return all possible frontdoor sets.
- Returns
2 elements (adjustment_set, Prob)
- Return type
tuple
- Raises
IdentificationError – If adjust is not one of 'simple', 'minimal' or 'all', or if no set satisfies the frontdoor criterion.
- get_iv(treatment, outcome)
Find the instrumental variables for the causal effect of the treatment on the outcome.
- Parameters
treatment (iterable) – Name(s) of the treatment.
outcome (iterable) – Name(s) of the outcome.
- Returns
A valid instrumental variable set, or an empty set if no such set exists.
- Return type
set
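Here is a plain-Python sketch of an instrument search on the two structures of Example 3. The latent confounding arc \(t \leftrightarrow g\) is written as an explicit node U. The sketch uses one common graphical formulation of the conditions listed earlier; this is an assumption about how they can be checked and not necessarily what get_iv does internally. Under this formulation, \(z\) must be an ancestor of the treatment. In addition, \(z\) must be d-separated from the outcome once the edges leaving the treatment are removed, which rules out both directed paths that bypass the treatment and open back-door paths.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen


def d_separated(parents, x, y, given):
    """Moralization test: is x d-separated from y by the node set `given`?"""
    given = set(given)
    keep = ancestors(parents, {x, y} | given)
    adj = {v: set() for v in keep}
    for child in keep:
        for p in parents[child]:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(parents[child], 2):
            adj[a].add(b)
            adj[b].add(a)
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for nxt in adj[v] - given - seen:
            seen.add(nxt)
            stack.append(nxt)
    return True


def get_iv(parents, treatment, outcome, latent=frozenset()):
    """Observed nodes that cause the treatment and are d-separated from the outcome
    once the edges leaving the treatment are deleted."""
    causes_t = ancestors(parents, {treatment}) - {treatment}
    cut = {v: [p for p in ps if p != treatment] for v, ps in parents.items()}
    return {
        z for z in parents
        if z not in latent and z not in (treatment, outcome)
        and z in causes_t and d_separated(cut, z, outcome, set())
    }


# Example 3, with the latent confounding arc t <-> g written as an explicit node U.
FIRST = {"U": [], "p": [], "t": ["p", "U"], "l": ["p"], "g": ["t", "l", "U"]}
SECOND = {"U": [], "p": [], "l": [], "t": ["p", "l", "U"], "g": ["t", "l", "U"]}
print(get_iv(FIRST, "t", "g", latent={"U"}))   # set()
print(get_iv(SECOND, "t", "g", latent={"U"}))  # {'p'}
```

The results agree with Example 3: there is no instrument in the first structure, and the instrument set is \(\{p\}\) in the second.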
- is_valid_iv(treatment, outcome, set_)
Determine whether a given set is a valid instrumental variable set.
- Parameters
treatment (iterable) – Name(s) of the treatment.
outcome (iterable) – Name(s) of the outcome.
set_ (set) – The set to be tested.
- Returns
True if the set is a valid instrumental variable set and False otherwise.
- Return type
bool
- identify(treatment, outcome, identify_method='auto')
Identify the causal effect expression. Identification is an operation that converts any causal effect quantity, e.g., quantities with the do operator, into the corresponding statistical quantity so that the causal effect can then be estimated from the given data. However, not all causal quantities are identifiable; in that case an IdentificationError is raised.
- Parameters
treatment (set or list of str) – Set of names of treatments.
outcome (set or list of str) – Set of names of outcomes.
identify_method (tuple of str or str, optional, default='auto') –
If the passed value is a tuple or list, then it should have two elements where the first one is for the identification methods and the second is for the returned set style.
Available options:
'auto': Perform identification with all possible methods
'general': The general identification method, see id()
(‘backdoor’, ‘simple’): Return the set of all direct confounders of both treatments and outcomes as a backdoor adjustment set.
(‘backdoor’, ‘minimal’): Return all possible backdoor adjustment sets with minimal number of elements.
(‘backdoor’, ‘all’): Return all possible backdoor adjustment sets.
(‘frontdoor’, ‘simple’): Return all possible frontdoor adjustment sets with minimal number of elements.
(‘frontdoor’, ‘minimal’): Return all possible frontdoor adjustment sets with minimal number of elements.
(‘frontdoor’, ‘all’): Return all possible frontdoor adjustment sets.
- Returns
A python dict where keys of the dict are identify methods while the values are the corresponding results.
- Return type
dict
- Raises
IdentificationError – If the causal effect is not identifiable or if the identify_method was not given properly.
- estimate(estimator_model, data=None, *, treatment=None, outcome=None, adjustment=None, covariate=None, quantity=None, **kwargs)
Estimate the identified causal effect in a new dataset.
- Parameters
estimator_model (EstimatorModel) – Any suitable estimator model implemented in EstimatorModel can be applied here.
data (pandas.DataFrame, optional, default=None) – The dataset in which the causal effect is estimated. If None, the data used for discovering the causal graph is used.
treatment (set or list, optional, default=None) – Names of the treatment. If None, the treatment used for backdoor adjustment will be taken as the treatment.
outcome (set or list, optional, default=None) – Names of the outcome. If None, the outcome used for backdoor adjustment will be taken as the outcome.
adjustment (set or list, optional, default=None) – Names of the adjustment set. If None, the adjustment set is given by the simplest backdoor set found by CausalModel.
covariate (set or list, optional, default=None) – Names of covariate set. Ignored if set as None.
quantity (str, optional, default=None) – The quantity of interest when evaluating causal effects.
- Returns
The estimated causal effect in data.
- Return type
np.ndarray or float
- identify_estimate(data, outcome, treatment, estimator_model=None, quantity=None, identify_method='auto', **kwargs)
Combination of the identify method and the estimate method. Because the currently implemented estimator models automatically assume (conditional) unconfoundedness (except for the IV-related methods), only backdoor set adjustment is used to fulfill the unconfoundedness condition.
- Parameters
treatment (set or list of str, optional) – Set of names of treatments.
outcome (set or list of str, optional) – Set of names of outcomes.
identify_method (tuple of str or str, optional, default='auto') –
If the passed value is a tuple or list, then it should have two elements where the first one is for the identification methods and the second is for the returned set style.
Available options:
'auto': Perform identification with all possible methods
'general': The general identification method, see id()
(‘backdoor’, ‘simple’): Return the set of all direct confounders of both treatments and outcomes as a backdoor adjustment set.
(‘backdoor’, ‘minimal’): Return all possible backdoor adjustment sets with minimal number of elements.
(‘backdoor’, ‘all’): Return all possible backdoor adjustment sets.
(‘frontdoor’, ‘simple’): Return all possible frontdoor adjustment sets with minimal number of elements.
(‘frontdoor’, ‘minimal’): Return all possible frontdoor adjustment sets with minimal number of elements.
(‘frontdoor’, ‘all’): Return all possible frontdoor adjustment sets.
quantity (str, optional, default=None) – The quantity of interest when evaluating causal effects.
- Returns
The estimated causal effect in data.
- Return type
np.ndarray or float