Welcome to YLearn’s documentation!
YLearn, a pun on “learn why”, is a Python package for causal learning which supports various aspects of causal inference, ranging from causal effect identification and estimation to causal graph discovery.
User Guide
Overview of YLearn and Causal Inference
Machine learning has made great achievements in recent years. The areas in which machine learning succeeds are mainly those of prediction, e.g., the classification of pictures of cats and dogs. However, machine learning is incapable of answering some questions that naturally arise in many scenarios. One example is the counterfactual question in policy evaluation: what would have happened if the policy had changed? Because these counterfactuals can never be observed, machine learning models, which are prediction tools, cannot be used. These limitations of machine learning partly account for the growing interest in applications of causal inference today.
Causal inference directly models the outcome of interventions and formalizes counterfactual reasoning. With the aid of machine learning, causal inference can nowadays draw causal conclusions from observational data in various manners, rather than relying on carefully designed experiments.
A typical complete causal inference procedure is composed of three parts. First, it learns causal relationships using a technique called causal discovery. These relationships are then expressed either in the form of Structural Causal Models or Directed Acyclic Graphs (DAGs). Second, it expresses the causal estimands, which are determined by the causal questions of interest such as the average treatment effect, in terms of the observed data. This process is known as identification. Finally, once the causal estimand is identified, causal inference proceeds to estimate it from observational data. Policy evaluation problems and counterfactual questions can then also be answered.
YLearn, equipped with many techniques developed in the recent literature, is implemented to support the whole causal inference pipeline from causal discovery to causal effect estimation with the help of machine learning. This is especially promising when abundant observational data are available.
Quick Start
In this part, we first show several simple example usages of YLearn. These examples cover the most common functionalities. Then we present a case study with Why to unveil the hidden causal relations in data.
Example usages
We present several necessary example usages of YLearn in this section, covering defining a causal graph, identifying the causal effect, and training an estimator model. Please see the specific documentation of each component for more details.
Representation of causal graph
Given a set of variables, the representation of its causal graph in YLearn requires a Python dict to denote the causal relations of the variables, in which the keys of the dict are children of all elements in the corresponding values, which usually should be a list of names of variables. For instance, in the simplest case, for a given causal graph \(X \leftarrow W \rightarrow Y\), we first define a Python dict for the causal relations, which is then passed to CausalGraph as a parameter:

    causation = {'X': ['W'], 'W': [], 'Y': ['W']}
    cg = CausalGraph(causation=causation)
cg will be the causal graph encoding the causal relation \(X \leftarrow W \rightarrow Y\) in YLearn. If there exist unobserved confounders in the causal graph, then, aside from the observed variables, we should also define a Python list containing these causal relations. See Causal Graph for more details.

Identification of causal effect
It is crucial to identify the causal effect when we want to estimate it from data. The first step in identifying the causal effect is identifying the causal estimand. This can be easily done in YLearn. For instance, suppose that we are interested in identifying the causal estimand \(P(Y|do(X=x))\) in the causal graph cg; then we should first define an instance of CausalModel and call its identify() method:

    cm = CausalModel(causal_graph=cg)
    cm.identify(treatment={'X'}, outcome={'Y'}, identify_method=('backdoor', 'simple'))
where we use the backdoor adjustment method here. YLearn also supports front-door adjustment, finding instrumental variables, and, most importantly, the general identification method developed in [Pearl], which can identify any causal effect whenever it is identifiable.
Estimation of causal effect
The estimation of causal effects in YLearn is also fairly easy. It follows the common approach of deploying a machine learning model, since YLearn focuses on the intersection of machine learning and causal inference in this part. Given a dataset, one can apply any EstimatorModel in YLearn with a procedure composed of 3 steps (a short sketch follows below):

1. Given data in the form of a pandas.DataFrame, find the names of the treatment, outcome, adjustment, and covariate.
2. Call the fit() method of the EstimatorModel to train the model.
3. Call the estimate() method of the EstimatorModel to estimate causal effects in test data.
See Estimator Model: Estimating the Causal Effects for more details.
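For instance, the sketch below walks through the three steps with the SLearner documented later in this guide; the synthetic data, column names, and the choice of GradientBoostingRegressor are purely illustrative assumptions:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import GradientBoostingRegressor
    from ylearn.estimator_model.meta_learner import SLearner

    # Step 1: prepare a pandas.DataFrame and name the treatment,
    # outcome, and covariate columns (toy data here).
    rng = np.random.default_rng(2022)
    n = 1000
    w = rng.normal(size=n)
    x = rng.binomial(1, 0.5, size=n)
    y = 2 * x + w + rng.normal(size=n)
    data = pd.DataFrame({'w': w, 'x': x, 'y': y})

    # Step 2: train the estimator model.
    est = SLearner(model=GradientBoostingRegressor())
    est.fit(data, outcome='y', treatment='x', covariate=['w'])

    # Step 3: estimate the causal effect (here the ATE) in data.
    print(est.estimate(data=data, quantity='ATE'))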
Using the all-in-one API: Why
For the purpose of applying YLearn in a unified and easier manner, YLearn provides the API Why. Why is an API which encapsulates almost everything in YLearn, such as identifying causal effects and scoring a trained estimator model. To use Why, one should first create an instance of Why, which needs to be trained by calling its method fit(), after which other utilities, such as causal_effect(), score(), and whatif(), can be used. This procedure is illustrated in the following code example:

    from sklearn.datasets import fetch_california_housing
    from ylearn import Why

    housing = fetch_california_housing(as_frame=True)
    data = housing.frame
    outcome = housing.target_names[0]
    data[outcome] = housing.target

    why = Why()
    why.fit(data, outcome, treatment=['AveBedrms', 'AveRooms'])
    print(why.causal_effect())
API: Interacting with YLearn
| Class Name | Description |
| --- | --- |
| Why | An API which encapsulates almost everything in YLearn, such as identifying causal effects and scoring a trained estimator model. It provides users a simple and efficient way to use YLearn. |
| Class Name | Description |
| --- | --- |
| CausalDiscovery | Find causal structures in observational data. |
| Class Name | Description |
| --- | --- |
| CausalGraph | Express the causal structures and support other operations related to the causal graph, e.g., adding and deleting edges of the graph. |
| CausalModel | Encode causations represented by the CausalGraph and support operations such as the identification of causal effects. |
| Prob | Represent the probability distribution. |
| Class Name | Description |
| --- | --- |
| GRForest | A highly flexible nonparametric estimator (Generalized Random Forest, GRF) model which supports both discrete and continuous treatment. The unconfoundedness condition is required. |
| CausalForest | A generalized random forest combined with the local centering technique (i.e., the double machine learning framework). The unconfoundedness condition is required. |
| CTCausalForest | A causal forest as an ensemble of a bunch of CausalTree. |
| ApproxBound | A model used for estimating the upper and lower bounds of the causal effects. This model does not need the unconfoundedness condition. |
| CausalTree | A class for estimating causal effects with a decision tree. The unconfoundedness condition is required. |
| DeepIV | Instrumental variables with deep neural networks. Must provide the names of instrumental variables. |
| NP2SLS | Nonparametric instrumental variables. Must provide the names of instrumental variables. |
| DoubleML | Double machine learning model for the estimation of CATE. The unconfoundedness condition is required. |
| DoublyRobust / PermutedDoublyRobust | Doubly robust method for the estimation of CATE. The permuted version considers all possible treatment-control pairs. The unconfoundedness condition is required and the treatment must be discrete. |
| SLearner / PermutedSLearner | SLearner. The permuted version considers all possible treatment-control pairs. The unconfoundedness condition is required and the treatment must be discrete. |
| TLearner / PermutedTLearner | TLearner with multiple machine learning models. The permuted version considers all possible treatment-control pairs. The unconfoundedness condition is required and the treatment must be discrete. |
| XLearner / PermutedXLearner | XLearner with multiple machine learning models. The permuted version considers all possible treatment-control pairs. The unconfoundedness condition is required and the treatment must be discrete. |
| RLoss | Effect score for measuring the performances of estimator models. The unconfoundedness condition is required. |
| Class Name | Description |
| --- | --- |
| PolicyTree | A class for finding the optimal policy for maximizing the causal effect with the tree model. |
| Class Name | Description |
| --- | --- |
| CEInterpreter | An object used to interpret the estimated CATE using the decision tree model. |
| PolicyInterpreter | An object used to interpret the policy given by some PolicyTree. |
Causal Model: The Representation of Causal Structures
Causal Graph
This is a class for representing DAGs of causal structures.
Generally, for a set of variables \(V\), a variable \(V_i\) is said to be a cause of a variable \(V_j\) if \(V_j\) can change in response to changes in \(V_i\). In a DAG for causal structures, every parent is a direct cause of all its children. We refer to these DAGs for causal structures as causal graphs. For graph terminology, one can see, for example, Chapter 1.2 of [Pearl].
There are five basic structures composed of two or three nodes for building causal graphs. Besides these structures, there are flows of association and causation in causal graphs expressed in the language of probability. Any two nodes \(X\) and \(Y\) connected by a flow of association are statistically dependent, i.e., \(P(X, Y) \neq P(X)P(Y)\). Let \(X\), \(Y\), and \(W\) be three distinct nodes; then the five basic structures include:
chains \(X \rightarrow W \rightarrow Y\):
\(X\) and \(Y\) are statistically dependent;
forks \(X \leftarrow W \rightarrow Y\):
\(X\) and \(Y\) are statistically dependent;
colliders \(X \rightarrow W \leftarrow Y\):
\(X\) and \(Y\) are statistically independent;
two unconnected nodes \(X\), \(Y\):
\(X\) and \(Y\) are statistically independent;
two connected nodes \(X \rightarrow Y\):
\(X\) and \(Y\) are statistically dependent.
In YLearn, one can use the CausalGraph to represent causal structures by first giving a Python dict where each key is a child of all elements in the corresponding dict value, which usually should be a list of str.
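For example, the following sketch (with arbitrary variable names) encodes the graph \(X \leftarrow W \rightarrow Y\) with an additional edge \(X \rightarrow Y\), together with an unobserved confounder between \(X\) and \(Y\) expressed through the latent_confounding_arcs parameter documented below:

    from ylearn.causal_model.graph import CausalGraph

    causation = {'X': ['W'], 'W': [], 'Y': ['W', 'X']}
    # The unobserved confounder U in X <- U -> Y is represented by a
    # bi-directed latent confounding arc between 'X' and 'Y'.
    cg = CausalGraph(causation=causation, latent_confounding_arcs=[('X', 'Y')])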
Class Structures
- class ylearn.causal_model.graph.CausalGraph(causation, dag=None, latent_confounding_arcs=None)
- Parameters:
causation (dict) – Descriptions of the causal structures where values are parents of the corresponding keys.
dag (networkx.MultiGraph, optional, default=None) – A known graph structure. If provided, dag must represent the causal structures stored in causation.
latent_confounding_arcs (set or list of tuple of two str, optional, default=None) – The two elements of each tuple are names of nodes in the graph between which there exists a latent confounding arc. Semi-Markovian graphs with unobserved confounders can be converted to graphs without unobserved variables, where one adds bi-directed latent confounding arcs to represent these relations. For example, the causal graph X <- U -> Y, where U is an unobserved confounder of X and Y, can be converted equivalently to X <-> Y, where <-> denotes a latent confounding arc.
- ancestors(x)
Return the ancestors of all nodes in x.
- Parameters:
x (set of str) – A set of nodes in the graph.
- Returns:
Ancestors of nodes in x in the graph.
- Return type:
set of str
- descendents(x)
Return the descendants of all nodes in x.
- Parameters:
x (set of str) – A set of nodes in the graph.
- Returns:
Descendants of nodes in x in the graph.
- Return type:
set of str
- parents(x, only_observed=True)
Return the direct parents of the node x in the graph.
- Parameters:
x (str) – Name of the node x.
only_observed (bool, default=True) – If True, only find the observed parents in the causal graph; otherwise, also include the unobserved variables.
- Returns:
Parents of the node x in the graph
- Return type:
list
- add_nodes(nodes, new=False)
If not new, add all nodes in nodes to the current CausalGraph; otherwise, create a new graph and add the nodes.
- Parameters:
nodes (set or list) – Nodes to be added to the current causal graph.
new (bool, default=False) – If True, create and return a new graph. Defaults to False.
- Returns:
Modified causal graph
- Return type:
instance of CausalGraph
- add_edges_from(edge_list, new=False, observed=True)
Add edges to the causal graph.
- Parameters:
edge_list (list) – Every element of the list contains two elements, the first being the parent and the second being the child of the edge.
new (bool, default=False) – If True, create and return a new graph. Defaults to False.
observed (bool, default=True) – Add unobserved bi-directed confounding arcs if not observed.
- Returns:
Modified causal graph
- Return type:
instance of CausalGraph
- add_edge(edge_list, s, t, observed=True)
Add an edge between nodes s and t to the causal graph.
- Parameters:
s (str) – Source of the edge.
t (str) – Target of the edge.
observed (bool, default=True) – Add unobserved bidirected confounding arcs if not observed.
- remove_nodes(nodes, new=True)
Remove all nodes in nodes from the graph.
- Parameters:
nodes (set or list) – Nodes to be removed.
new (bool, default=True) – If True, create a new graph, remove the nodes from that graph, and return it. Defaults to True.
- Returns:
Modified causal graph
- Return type:
instance of CausalGraph
- remove_edge(edge, observed=True)
Remove the edge in the CausalGraph. If not observed, remove the unobserved latent confounding arcs.
- Parameters:
edge (tuple) – 2 elements denote the start and end of the edge, respectively.
observed (bool, default=True) – If not observed, remove the unobserved latent confounding arcs.
- remove_edges_from(edge_list, new=False, observed=True)
Remove all edges in the edge_list in the graph.
- Parameters:
edge_list (list) – list of edges to be removed.
new (bool, default=False) – If new, create a new CausalGraph and remove edges.
observed (bool, default=True) – Remove unobserved latent confounding arcs if not observed.
- Returns:
Modified causal graph
- Return type:
instance of CausalGraph
- build_sub_graph(subset)
Return a new CausalGraph as the subgraph of the graph with nodes in the subset.
- Parameters:
subset (set) – The set of nodes of the subgraph.
- Returns:
Modified causal graph
- Return type:
instance of CausalGraph
- remove_incoming_edges(x, new=False)
Remove incoming edges of all nodes of x. If new, do this in the new CausalGraph.
- Parameters:
x (set or list) – Nodes whose incoming edges are to be removed.
new (bool, default=False) – Return a new graph if set as True.
- Returns:
Modified causal graph
- Return type:
instance of CausalGraph
- remove_outgoing_edges(x, new=False)
Remove outgoing edges of all nodes of x. If new, do this in the new CausalGraph.
- Parameters:
x (set or list) – Nodes whose outgoing edges are to be removed.
new (bool, default=False) – Return a new graph if set as True.
- Returns:
Modified causal graph
- Return type:
instance of CausalGraph
- property c_components
The C-components set of the graph.
- Returns:
The C-components set of the graph.
- Return type:
set of str
- property observed_dag
Return the observed part of the graph, including observed nodes and edges between them.
- Returns:
The observed part of the graph
- Return type:
networkx.MultiGraph
- property explicit_unob_var_dag
Build a new dag where all unobserved confounding arcs are replaced by explicit unobserved variables.
- Returns:
Dag with explicit unobserved nodes
- Return type:
networkx.MultiGraph
- property topo_order
Return the topological order of the nodes in the observed graph.
- Returns:
Nodes in the topological order
- Return type:
generator
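The following short sketch illustrates a few of the methods and properties documented above on a toy graph; the printed results depend on the graph at hand:

    from ylearn.causal_model.graph import CausalGraph

    cg = CausalGraph(causation={'X': ['W'], 'W': [], 'Y': ['W', 'X']})
    print(cg.ancestors({'Y'}))                  # ancestors of Y in the graph
    print(cg.parents('X'))                      # observed direct parents of X
    mutilated = cg.remove_incoming_edges({'X'}, new=True)  # graph under do(X)
    print(list(cg.topo_order))                  # nodes in topological order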
Causal Model
CausalModel
is a core object for performing Identification and finding
Instrumental Variables.
Before introducing the causal model, we should first clarify the definition of interventions. An intervention takes the whole population and assigns every unit some operation. [Pearl] defined the \(do\)-operator to describe such operations. Probabilistic models alone cannot predict the effects of interventions, which leads to the need for causal models.
The formal definition of a causal model is due to [Pearl]. A causal model is a triple

\[M = \left\langle U, V, F \right\rangle\]

where

\(U\) are exogenous variables (variables that are determined by factors outside the model);
\(V\) are endogenous variables that are determined by \(U \cup V\); and \(F\) is a set of functions \(\{f_1, \dots, f_n\}\) such that

\[v_i = f_i(pa_i, u_i), \quad i = 1, \dots, n,\]

with \(pa_i \subset V \backslash V_i\). For example, \(M = \left\langle U, V, F \right\rangle\) is a causal model where

\[V = \{X, Y\}, \quad U = \{U_X, U_Y\}, \quad F = \{f_X, f_Y\},\]

such that

\[X = f_X(U_X), \quad Y = f_Y(X, U_Y),\]

which encodes the causal structure \(X \rightarrow Y\).
Note that every causal model can be associated with a DAG and encodes necessary information of the causal relationships between variables.
YLearn uses CausalModel
to represent a causal model and support many operations related to the causal
model such as Identification.
Identification
To characterize the effect of an intervention, one needs to consider the causal effect, which is a causal estimand involving the \(do\)-operator. The action that converts the causal effect into the corresponding statistical estimand is called identification and is implemented in CausalModel in YLearn. Note that not all causal effects can be converted into statistical estimands; we refer to such causal effects as non-identifiable. We list several identification methods supported by CausalModel below, with a usage sketch after this paragraph.
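As a quick sketch of these operations (reusing a graph cg built as in Causal Graph; the identified sets depend on the graph at hand):

    from ylearn.causal_model import CausalModel

    cm = CausalModel(causal_graph=cg)
    # Query a simple backdoor adjustment set: the parents of the treatment.
    adjustment, prob = cm.get_backdoor_set(treatment={'X'}, outcome={'Y'}, adjust='simple')
    # Or run the full identification step, which returns a dict of results.
    results = cm.identify(treatment={'X'}, outcome={'Y'}, identify_method=('backdoor', 'simple'))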
Class Structures
- class ylearn.causal_model.CausalModel(causal_graph=None, data=None)
- Parameters:
causal_graph (CausalGraph, optional, default=None) – An instance of CausalGraph which encodes the causal structures.
data (pandas.DataFrame, optional, default=None) – The data used to discover the causal structures if causal_graph is not provided.
- id(y, x, prob=None, graph=None)
Identify the causal quantity \(P(y|do(x))\) if it is identifiable; otherwise raise an IdentificationError. Note that here we only consider semi-Markovian causal models, where each unobserved variable is a parent of exactly two nodes. This is because any causal model with unobserved variables can be converted to a semi-Markovian causal model encoding the same set of conditional independences.
- Parameters:
y (set of str) – Set of names of outcomes.
x (set of str) – Set of names of treatments.
prob (Prob, optional, default=None) – Probability distribution encoded in the graph.
graph (CausalGraph) – CausalGraph encodes the information of corresponding causal structures.
- Returns:
The probability distribution of the converted causal effect.
- Return type:
Prob
- Raises:
IdentificationError – If the interested causal effect is not identifiable, then raise IdentificationError.
- is_valid_backdoor_set(set_, treatment, outcome)
Determine if a given set is a valid backdoor adjustment set for the causal effect of the treatments on the outcomes.
- Parameters:
set (set) – The adjustment set.
treatment (set or list of str) – Names of the treatment. str is also acceptable for single treatment.
outcome (set or list of str) – Names of the outcome. str is also acceptable for single outcome.
- Returns:
True if the given set is a valid backdoor adjustment set for the causal effect of treatment on outcome in the current causal graph.
- Return type:
bool
- get_backdoor_set(treatment, outcome, adjust='simple', print_info=False)
Return the backdoor adjustment set for the given treatment and outcome.
- Parameters:
treatment (set or list of str) – Names of the treatment. str is also acceptable for single treatment.
outcome (set or list of str) – Names of the outcome. str is also acceptable for single outcome.
adjust (str) –
Set style of the backdoor set. Available options are
simple: directly return the parent set of treatment
minimal: return the minimal backdoor adjustment set
all: return all valid backdoor adjustment set.
print_info (bool, default=False) – If True, print the identified results.
- Returns:
The first element is the adjustment list, while the second is the encoded Prob.
- Return type:
tuple of two elements
- Raises:
IdentificationError – Raise an error if the adjust style is not one of simple, minimal, or all, or if no set satisfies the backdoor criterion.
- get_backdoor_path(treatment, outcome)
Return all backdoor paths connecting treatment and outcome.
- Parameters:
treatment (str) – Name of the treatment.
outcome (str) – Name of the outcome.
- Returns:
A list containing all valid backdoor paths between the treatment and outcome in the graph.
- Return type:
list
- has_collider(path, backdoor_path=True)
If the path in the current graph has a collider, return True, else return False.
- Parameters:
path (list of str) – A list containing nodes in the path.
backdoor_path (bool, default=True) – Whether the path is a backdoor path.
- Returns:
True if the path has a collider.
- Return type:
bool
- is_connected_backdoor_path(path)
Test whether a backdoor path is connected.
- Parameters:
path (list of str) – A list describing the path.
- Returns:
True if path is a d-connected backdoor path and False otherwise.
- Return type:
bool
- is_frontdoor_set(set_, treatment, outcome)
Determine if the given set is a valid frontdoor adjustment set for the causal effect of treatment on outcome.
- Parameters:
set (set) – The set to be determined as a valid front-door adjustment set.
treatment (str) – Name of the treatment.
outcome (str) – Name of the outcome.
- Returns:
True if the given set is a valid frontdoor adjustment set for causal effects of treatments on outcomes.
- Return type:
bool
- get_frontdoor_set(treatment, outcome, adjust='simple')
Return the frontdoor set for adjusting the causal effect between treatment and outcome.
- Parameters:
treatment (set of str or str) – Name of the treatment. Should contain only one element.
outcome (set of str or str) – Name of the outcome. Should contain only one element.
adjust (str, default='simple') –
Available options include ‘simple’: Return the frontdoor set with minimal number of elements.
’minimal’: Return the frontdoor set with minimal number of elements.
’all’: Return all possible frontdoor sets.
- Returns:
2 elements (adjustment_set, Prob)
- Return type:
tuple
- Raises:
IdentificationError – Raise an error if the adjust style is not one of simple, minimal, or all, or if no set satisfies the frontdoor criterion.
- get_iv(treatment, outcome)
Find the instrumental variables for the causal effect of the treatment on the outcome.
- Parameters:
treatment (iterable) – Name(s) of the treatment.
outcome (iterable) – Name(s) of the outcome.
- Returns:
A valid instrumental variable set that will be an empty one if there is no such set.
- Return type:
set
- is_valid_iv(treatment, outcome, set_)
Determine whether a given set is a valid instrumental variable set.
- Parameters:
treatment (iterable) – Name(s) of the treatment.
outcome (iterable) – Name(s) of the outcome.
set (set) – The set to be tested.
- Returns:
True if the set is a valid instrumental variable set and False otherwise.
- Return type:
bool
- identify(treatment, outcome, identify_method='auto')
Identify the causal effect expression. Identification is an operation that converts any causal effect quantity, e.g., quantities with the do operator, into the corresponding statistical quantity such that it is then possible to estimate the causal effect in some given data. However, note that not all causal quantities are identifiable, in which case an IdentificationError will be raised.
- Parameters:
treatment (set or list of str) – Set of names of treatments.
outcome (set or list of str) – Set of names of outcomes.
identify_method (tuple of str or str, optional, default='auto') –
If the passed value is a tuple or list, then it should have two elements where the first one is for the identification methods and the second is for the returned set style.
Available options:
’auto’ : Perform identification with all possible methods
’general’: The general identification method, see id()
(‘backdoor’, ‘simple’): Return the set of all direct confounders of both treatments and outcomes as a backdoor adjustment set.
(‘backdoor’, ‘minimal’): Return all possible backdoor adjustment sets with minimal number of elements.
(‘backdoor’, ‘all’): Return all possible backdoor adjustment sets.
(‘frontdoor’, ‘simple’): Return all possible frontdoor adjustment sets with minimal number of elements.
(‘frontdoor’, ‘minimal’): Return all possible frontdoor adjustment sets with minimal number of elements.
(‘frontdoor’, ‘all’): Return all possible frontdoor adjustment sets.
- Returns:
A python dict where keys of the dict are identify methods while the values are the corresponding results.
- Return type:
dict
- Raises:
IdentificationError – If the causal effect is not identifiable or if the identify_method was not given properly.
- estimate(estimator_model, data=None, *, treatment=None, outcome=None, adjustment=None, covariate=None, quantity=None, **kwargs)
Estimate the identified causal effect in a new dataset.
- Parameters:
estimator_model (EstimatorModel) – Any suitable estimator models implemented in the EstimatorModel can be applied here.
data (pandas.DataFrame, optional, default=None) – The data set for causal effect to be estimated. If None, use the data which is used for discovering causal graph.
treatment (set or list, optional, default=None) – Names of the treatment. If None, the treatment used for backdoor adjustment will be taken as the treatment.
outcome (set or list, optional, default=None) – Names of the outcome. If None, the outcome used for backdoor adjustment will be taken as the outcome.
adjustment (set or list, optional, default=None) – Names of the adjustment set. If None, the adjustment set is given by the simplest backdoor set found by CausalModel.
covariate (set or list, optional, default=None) – Names of covariate set. Ignored if set as None.
quantity (str, optional, default=None) – The interested quantity when evaluating causal effects.
- Returns:
The estimated causal effect in data.
- Return type:
np.ndarray or float
- identify_estimate(data, outcome, treatment, estimator_model=None, quantity=None, identify_method='auto', **kwargs)
Combination of the identify method and the estimate method. However, since the currently implemented estimator models assume (conditional) unconfoundedness automatically (except for the methods related to IV), we may only consider using backdoor adjustment sets to fulfill the unconfoundedness condition.
- Parameters:
treatment (set or list of str, optional) – Set of names of treatments.
outcome (set or list of str, optional) – Set of names of outcome.
identify_method (tuple of str or str, optional, default='auto') –
If the passed value is a tuple or list, then it should have two elements where the first one is for the identification methods and the second is for the returned set style.
Available options:
’auto’ : Perform identification with all possible methods
’general’: The general identification method, see id()
(‘backdoor’, ‘simple’): Return the set of all direct confounders of both treatments and outcomes as a backdoor adjustment set.
(‘backdoor’, ‘minimal’): Return all possible backdoor adjustment sets with minimal number of elements.
(‘backdoor’, ‘all’): Return all possible backdoor adjustment sets.
(‘frontdoor’, ‘simple’): Return all possible frontdoor adjustment sets with minimal number of elements.
(‘frontdoor’, ‘minimal’): Return all possible frontdoor adjustment sets with minimal number of elements.
(‘frontdoor’, ‘all’): Return all possible frontdoor adjustment sets.
quantity (str, optional, default=None) – The interested quantity when evaluating causal effects.
- Returns:
The estimated causal effect in data.
- Return type:
np.ndarray or float
Representation of Probability
To represent and modify probabilities such as

\[\sum_{w}P(v|y)[P(w|z)P(x|y)P(u)],\]

one can define an instance of Prob and change its attributes.
- class ylearn.causal_model.prob.Prob(variables=set(), conditional=set(), divisor=set(), marginal=set(), product=set())
Probability distribution, e.g., the probability expression
\[\sum_{w}P(v|y)[P(w|z)P(x|y)P(u)].\]
We will clarify below the meanings of our variables with this example.
- Parameters:
variables (set, default=set()) – The variables (\(v\) in the above example) of the probability.
conditional (set, default=set()) – The conditional set (\(y\) in the above example) of the probability.
marginal (set, default=set()) – The sum set (\(w\) in the above example) for marginalizing the probability.
product (set, default=set()) – If not set(), then the probability is composed of the first probability object (\(P(v|y)\) in the above example) and several other probability objects that are all saved in the set product, e.g., product = {P1, P2, P3} where P1 stands for \(P(w|z)\), P2 for \(P(x|y)\), and P3 for \(P(u)\) in the above example.
- parse()
Return the expression of the probability distribution.
- Returns:
Expression of the encoded probability
- Return type:
str
- show_latex_expression()
Show the latex expression.
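As a minimal sketch based on the parameter descriptions above, the example expression can be encoded as follows (P1, P2, and P3 stand for the three factors inside the brackets):

    from ylearn.causal_model.prob import Prob

    P1 = Prob(variables={'w'}, conditional={'z'})   # P(w|z)
    P2 = Prob(variables={'x'}, conditional={'y'})   # P(x|y)
    P3 = Prob(variables={'u'})                      # P(u)
    P = Prob(variables={'v'}, conditional={'y'},
             marginal={'w'}, product={P1, P2, P3})  # sum_w P(v|y)[P(w|z)P(x|y)P(u)]
    print(P.parse())            # string expression of the encoded probability
    P.show_latex_expression()   # LaTeX rendering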
For a set of variables \(V\), its causal structure can be represented by a directed acyclic graph (DAG), where each node corresponds to an element of \(V\) while each direct functional relationship among the corresponding variables is represented by a link in the DAG. A causal structure guides the precise specification of how each variable is influenced by its parents in the DAG. For instance, \(X \leftarrow W \rightarrow Y\) denotes that \(W\) is a parent, thus also a common cause, of \(X\) and \(Y\). More specifically, for two distinct variables \(V_i\) and \(V_j\), if their functional relationship is

\[V_j = f(V_i, \eta)\]

for some function \(f\) and noise \(\eta\), then in the DAG representing the causal structure of the set of variables \(V\), there should be an arrow pointing to \(V_j\) from \(V_i\). A detailed introduction to such DAGs for causal structures can be found in [Pearl].
A causal effect, also called a causal estimand, can be expressed with the \(do\)-operator according to [Pearl]. As an example,

\[P(y|do(x))\]

denotes the probability function of \(y\) after imposing the intervention \(x\). Causal structures
are crucial to expressing and estimating interested causal estimands. YLearn implements an object,
CausalGraph
, to support representations for causal structures and related operations of the
causal structures. Please see Causal Graph for details.
YLearn concerns the intersection of causal inference and machine learning. Therefore, we assume that we have abundant observational data rather than access to designed randomized experiments. Given a DAG for some causal structure, the causal estimands, e.g., the average treatment effects (ATEs), usually cannot be directly estimated from the data due to the counterfactuals, which can never be observed. Thus it is necessary to convert these causal estimands into other quantities, called statistical estimands, which can be estimated from data, before proceeding to any estimation. The procedure of converting a causal estimand into the corresponding statistical estimand is called identification.
The object for supporting identification and other related operations of causal structures is CausalModel
.
More details can be found in Causal Model.
In the language of Pearl’s causal inference, it is also necessary to represent the results
in the language of probability. For this purpose, YLearn also implements an object Prob
which is introduced in
Representation of Probability.
Estimator Model: Estimating the Causal Effects
For a causal effect with \(do\)-operator, after converting it into the corresponding statistical estimand with the approach called Identification, the task of causal inference now becomes estimating the statistical estimand, the converted causal effect. Before diving into any specific estimation methods for causal effects, we briefly introduce the problem settings of the estimation of causal effects.
Problem Setting
It was introduced in Causal Model that every causal structure has a corresponding DAG called the causal graph. Furthermore, each child-parent family in a DAG \(G\) represents a deterministic function

\[x_i = f_i(pa_i, \eta_i), \quad i = 1, \dots, n,\]

where \(pa_i\) are the parents of \(x_i\) in \(G\) and \(\eta_i\) are random disturbances representing exogenous factors not present in the analysis. We call these functions Structural Equation Models related to the causal structures. For a set of variables \(W\) that satisfies the back-door criterion (see Identification), the causal effect of \(X\) on \(Y\) is given by the formula

\[P(y|do(x)) = \sum_w P(y|x, w)P(w).\]

In such cases, variables \(X\) for which the above equality is valid are also said to be "conditionally ignorable given \(W\)" in the potential outcome framework. A set of variables \(W\) satisfying this condition is called an adjustment set. In the language of structural equation models, these relations are encoded by

\[\begin{split}x & = f_x(w, \eta_x), \\ y & = f_y(x, w, \eta_y).\end{split}\]
Our problems can be expressed with the structural equation model.
Estimator Models
YLearn implements several estimator models for the estimation of causal effects:
Approximation Bound for Causal Effects
Many estimator models require the unconfoundedness condition, which is usually untestable. One applicable approach is to derive the upper and lower bounds of the causal effects before diving into specific estimations.
There are four different bounds in YLearn. We briefly introduce them as follows. One can see [Neal2020] for details.
Class Structures
- class ylearn.estimator_model.approximation_bound.ApproxBound(y_model, x_prob=None, x_model=None, random_state=2022, is_discrete_treatment=True, categories='auto')
A model used for estimating the upper and lower bounds of the causal effects.
- Parameters:
y_model (estimator, optional) – Any valid y_model should implement the fit() and predict() methods
x_prob (ndarray of shape (c,), optional, default=None) – An array of probabilities assigned to the corresponding values of x, where c is the number of different treatment classes. All elements in the array are positive and sum to 1. For example, x_prob = array([0.5, 0.5]) means both x = 0 and x = 1 take probability 0.5. Please set this as None if you are using multiple treatments.
x_model (estimator, optional, default=None) – Models for predicting the probabilities of treatment. Any valid x_model should implement the fit() and predict_proba() methods.
random_state (int, optional, default=2022) –
is_discrete_treatment (bool, optional, default=True) – True if the treatment is discrete.
categories (str, optional, default='auto') –
- fit(data, outcome, treatment, covariate=None, is_discrete_covariate=False, **kwargs)
Fit x_model and y_model.
- Parameters:
data (pandas.DataFrame) – Training data.
outcome (list of str, optional) – Names of the outcome.
treatment (list of str, optional) – Names of the treatment.
covariate (list of str, optional, default=None) – Names of the covariate.
is_discrete_covariate (bool, optional, default=False) –
- Returns:
The fitted instance of ApproxBound.
- Return type:
instance of ApproxBound
- Raises:
ValueError – Raise error when the treatment is not discrete.
- estimate(data=None, treat=None, control=None, y_upper=None, y_lower=None, assump=None)
Estimate the approximation bound of the causal effect of the treatment on the outcome.
- Parameters:
data (pandas.DataFrame, optional, default=None) – Test data. The model will use the training data if set as None.
treat (ndarray of str, optional, default=None) – Values of the treatment group. For example, when there are multiple discrete treatments, array([‘run’, ‘read’]) means the treat value of the first treatment is taken as ‘run’ and that of the second treatment is taken as ‘read’.
control (ndarray of str, optional, default=None) – Values of the control group.
y_upper (float, defaults=None) – The upper bound of the outcome.
y_lower (float, defaults=None) – The lower bound of the outcome.
assump (str, optional, default='no-assump') –
Options for the returned bounds. Should be one of
no-assump: calculate the no assumption bound whose result will always contain 0.
non-negative: The treatment is always positive.
non-positive: The treatment is always negative.
optimal: The treatment is taken if its effect is positive.
- Returns:
The first element is the lower bound while the second element is the upper bound. Note that if covariate is provided, all elements are ndarrays of shapes (n, ) indicating the lower and upper bounds of corresponding examples where n is the number of examples.
- Return type:
tuple
- Raises:
Exception – Raise an Exception if the model is not fitted or if assump is not given correctly.
- comp_transormer(x, categories='auto')
Transform the discrete treatment into one-hot vectors properly.
- Parameters:
x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.
categories (str or list, optional, default='auto') –
- Returns:
The transformed one-hot vectors.
- Return type:
numpy.ndarray
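A minimal usage sketch under the interface documented above; the synthetic data frame and the choice of scikit-learn models are illustrative assumptions:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
    from ylearn.estimator_model.approximation_bound import ApproxBound

    rng = np.random.default_rng(2022)
    n = 1000
    data = pd.DataFrame({'w': rng.normal(size=n)})
    data['x'] = rng.binomial(1, 0.5, size=n)
    data['y'] = 2 * data['x'] + data['w'] + rng.normal(size=n)

    bound = ApproxBound(
        y_model=RandomForestRegressor(),
        x_model=RandomForestClassifier(),  # predicts treatment probabilities
    )
    bound.fit(data=data, outcome=['y'], treatment=['x'], covariate=['w'])
    # The no-assumption bound; y_upper and y_lower can also be supplied.
    lower, upper = bound.estimate(assump='no-assump')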
Meta-Learner
Meta-Learners [Kunzel2019] are estimator models that aim to estimate the CATE by taking advantage of machine learning models when the treatment is discrete, e.g., when the treatment has only two values 1 and 0, and when the unconfoundedness condition is satisfied. Generally speaking, a Meta-Learner employs multiple machine learning models with flexibility in the choice of models.
YLearn implements 3 Meta-Learners: S-Learner, T-Learner, and X-Learner. We provide below several useful examples before introducing their class structures.
S-Learner
SLearner uses one machine learning model to estimate the causal effects. Specifically, we fit a machine learning model \(f\) to predict the outcome \(y\) from the treatment \(x\) and the adjustment set (or covariate) \(w\):

\[y = f(x, w).\]

The causal effect \(\tau(w)\) is then calculated as

\[\tau(w) = f(x=\text{treat}, w) - f(x=\text{control}, w).\]
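Spelled out with scikit-learn on synthetic data, the idea looks roughly as follows (a sketch of the formulas above, not YLearn's internal implementation):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    n = 1000
    w = rng.normal(size=n)
    x = rng.binomial(1, 0.5, size=n)
    y = 2 * x + w + rng.normal(size=n)

    # Fit one model y = f(x, w) on the concatenated treatment and covariate.
    f = GradientBoostingRegressor().fit(np.column_stack([x, w]), y)
    # tau(w) = f(x=1, w) - f(x=0, w)
    tau = (f.predict(np.column_stack([np.ones(n), w]))
           - f.predict(np.column_stack([np.zeros(n), w])))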
- class ylearn.estimator_model.meta_learner.SLearner(model, random_state=2022, is_discrete_treatment=True, categories='auto', *args, **kwargs)
- Parameters:
model (estimator, optional) – The base machine learning model for training SLearner. Any model should be some valid machine learning model with fit() and predict() functions.
random_state (int, default=2022) –
is_discrete_treatment (bool, default=True) – Treatment must be discrete for SLearner.
categories (str, optional, default='auto') –
- fit(data, outcome, treatment, adjustment=None, covariate=None, treat=None, control=None, combined_treatment=True, **kwargs)
Fit the SLearner in the dataset.
- Parameters:
data (pandas.DataFrame) – Training dataset for training the estimator.
outcome (list of str, optional) – Names of the outcome.
treatment (list of str, optional) – Names of the treatment.
adjustment (list of str, optional, default=None) – Names of the adjustment set ensuring the unconfoundedness.
covariate (list of str, optional, default=None) – Names of the covariate.
treat (int, optional) – Label of the intended treatment group
control (int, optional) – Label of the intended control group
combined_treatment (bool, optional, default=True) –
Only modify this parameter for multiple treatments, where multiple discrete treatments are combined to give a single new group of discrete treatments if set as True. When combined_treatment is set to True, then if there are multiple treatments, we can use the combined_treatment technique to convert multiple discrete classification tasks into a single discrete classification task. For example, if there are two different binary treatments:

treatment_1: \(x_1 | x_1 \in \{'sleep', 'run'\}\),
treatment_2: \(x_2 | x_2 \in \{'study', 'work'\}\),

then we can convert these two binary classification tasks into a single classification task with 4 different classes:

treatment: \(x | x \in \{0, 1, 2, 3\}\),

where, for example, 1 stands for ('sleep' and 'study').
- Returns:
The fitted instance of SLearner.
- Return type:
instance of SLearner
- estimate(data=None, quantity=None)
Estimate the causal effect with the type of the quantity.
- Parameters:
data (pandas.DataFrame, optional, default=None) – Test data. The model will use the training data if set as None.
quantity (str, optional, default=None) –
Option for returned estimation result. The possible values of quantity include:
’CATE’ : the estimator will evaluate the CATE;
’ATE’ : the estimator will evaluate the ATE;
None : the estimator will evaluate the ITE or CITE.
- Returns:
The estimated causal effects
- Return type:
ndarray
- effect_nji(data=None)
Calculate causal effects with different treatment values.
- Returns:
Causal effects with different treatment values.
- Return type:
ndarray
- _comp_transormer(x, categories='auto')
Transform the discrete treatment into one-hot vectors properly.
- Parameters:
x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.
categories (str or list, optional, default='auto') –
- Returns:
The transformed one-hot vectors.
- Return type:
numpy.ndarray
T-Learner
The problem with SLearner is that the treatment vector is only 1-dimensional while the adjustment vector can be multi-dimensional. Thus if the dimension of the adjustment is much larger than 1, the estimated results will always be close to 0. TLearner uses two machine learning models to estimate the causal effect. Specifically, letting \(w\) denote the adjustment set (or covariate), we proceed as follows (a sketch follows the list):
Fit two models, \(f_t(w)\) for the treatment group (\(x=\) treat) and \(f_0(w)\) for the control group (\(x=\) control), respectively:
\[y_t = f_t(w)\]
with data where \(x=\) treat and
\[y_0 = f_0(w)\]with data where \(x=\) control.
Compute the causal effect \(\tau(w)\) as the difference between predicted results of these two models:
\[\tau(w) = f_t(w) - f_0(w).\]
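A rough scikit-learn sketch of these two steps on synthetic data (an illustration of the idea, not YLearn's internal implementation):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    n = 1000
    w = rng.normal(size=(n, 1))
    x = rng.binomial(1, 0.5, size=n)
    y = 2 * x + w[:, 0] + rng.normal(size=n)

    # Step 1: f_t on the treated rows, f_0 on the control rows.
    f_t = GradientBoostingRegressor().fit(w[x == 1], y[x == 1])
    f_0 = GradientBoostingRegressor().fit(w[x == 0], y[x == 0])
    # Step 2: tau(w) = f_t(w) - f_0(w).
    tau = f_t.predict(w) - f_0.predict(w)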
- class ylearn.estimator_model.meta_learner.TLearner(model, random_state=2022, is_discrete_treatment=True, categories='auto', *args, **kwargs)
- Parameters:
model (estimator, optional) – The base machine learning model for training the TLearner. Any model should be some valid machine learning model with fit() and predict() functions.
random_state (int, default=2022) –
is_discrete_treatment (bool, default=True) – Treatment must be discrete for TLearner.
categories (str, optional, default='auto') –
- fit(data, outcome, treatment, adjustment=None, covariate=None, treat=None, control=None, combined_treatment=True, **kwargs)
Fit the TLearner in the dataset.
- Parameters:
data (pandas.DataFrame) – Training dataset for training the estimator.
outcome (list of str, optional) – Names of the outcome.
treatment (list of str, optional) – Names of the treatment.
adjustment (list of str, optional, default=None) – Names of the adjustment set ensuring the unconfoundedness.
covariate (list of str, optional, default=None) – Names of the covariate.
treat (int, optional) – Label of the intended treatment group
control (int, optional) – Label of the intended control group
combined_treatment (bool, optional, default=True) –
Only modify this parameter for multiple treatments, where multiple discrete treatments are combined to give a single new group of discrete treatments if set as True. When combined_treatment is set to True, then if there are multiple treatments, we can use the combined_treatment technique to convert multiple discrete classification tasks into a single discrete classification task. For example, if there are two different binary treatments:

treatment_1: \(x_1 | x_1 \in \{'sleep', 'run'\}\),
treatment_2: \(x_2 | x_2 \in \{'study', 'work'\}\),

then we can convert these two binary classification tasks into a single classification task with 4 different classes:

treatment: \(x | x \in \{0, 1, 2, 3\}\),

where, for example, 1 stands for ('sleep' and 'study').
- Returns:
The fitted instance of TLearner.
- Return type:
instance of TLearner
- estimate(data=None, quantity=None)
Estimate the causal effect with the type of the quantity.
- Parameters:
data (pandas.DataFrame, optional, default=None) – Test data. The model will use the training data if set as None.
quantity (str, optional, default=None) –
Option for returned estimation result. The possible values of quantity include:
’CATE’ : the estimator will evaluate the CATE;
’ATE’ : the estimator will evaluate the ATE;
None : the estimator will evaluate the ITE or CITE.
- Returns:
The estimated causal effects
- Return type:
ndarray
- effect_nji(data=None)
Calculate causal effects with different treatment values.
- Returns:
Causal effects with different treatment values.
- Return type:
ndarray
- _comp_transormer(x, categories='auto')
Transform the discrete treatment into one-hot vectors properly.
- Parameters:
x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.
categories (str or list, optional, default='auto') –
- Returns:
The transformed one-hot vectors.
- Return type:
numpy.ndarray
X-Learner
TLearner does not use all of the data efficiently. This issue can be addressed by the XLearner, which utilizes all of the data to train several models. Training an XLearner is composed of 3 steps (a sketch follows the list):
As in the case of TLearner, we first train two different models for the control group and treated group, respectively:
\[\begin{split}& f_0(w) \text{for the control group}\\ & f_t(w) \text{for the treat group}.\end{split}\]Generate two new datasets \(\{(h_0, w)\}\) using the control group and \(\{(h_t, w)\}\) using the treated group where
\[\begin{split}h_0 & = f_t(w) - y_0,\\ h_t & = y_t - f_0(w).\end{split}\]Then train two new machine learning models \(k_0(w)\) and \(k_t(w)\) on these datasets such that
\[\begin{split}h_0 & = k_0(w) \\ h_t & = k_t(w).\end{split}\]Get the final model by combining the above two models:
\[g(w) = k_0(w)a(w) + k_t(w)(1 - a(w))\]where \(a(w)\) is a coefficient adjusting the weight of \(k_0\) and \(k_t\).
Finally, the causal effect \(\tau(w)\) can be estimated as

\[\tau(w) = g(w).\]
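A rough scikit-learn sketch of the three steps on synthetic data (not YLearn's internal implementation; the constant weight a(w) = 0.5 is an arbitrary choice for illustration):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    n = 1000
    w = rng.normal(size=(n, 1))
    x = rng.binomial(1, 0.5, size=n)
    y = 2 * x + w[:, 0] + rng.normal(size=n)

    # Step 1: two outcome models, as in TLearner.
    f_t = GradientBoostingRegressor().fit(w[x == 1], y[x == 1])
    f_0 = GradientBoostingRegressor().fit(w[x == 0], y[x == 0])
    # Step 2: imputed treatment effects h_0 and h_t, then models k_0 and k_t.
    h_0 = f_t.predict(w[x == 0]) - y[x == 0]
    h_t = y[x == 1] - f_0.predict(w[x == 1])
    k_0 = GradientBoostingRegressor().fit(w[x == 0], h_0)
    k_t = GradientBoostingRegressor().fit(w[x == 1], h_t)
    # Step 3: combine with a weight a(w); tau(w) = g(w).
    a = 0.5
    tau = a * k_0.predict(w) + (1 - a) * k_t.predict(w)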
- class ylearn.estimator_model.meta_learner.XLearner(model, random_state=2022, is_discrete_treatment=True, categories='auto', *args, **kwargs)
- Parameters:
model (estimator, optional) – The base machine learning model for training XLearner. Any model should be some valid machine learning model with fit() and predict() functions.
random_state (int, default=2022) –
is_discrete_treatment (bool, default=True) – Treatment must be discrete for XLearner.
categories (str, optional, default='auto') –
- fit(data, outcome, treatment, adjustment=None, covariate=None, treat=None, control=None, combined_treatment=True, **kwargs)
Fit the XLearner in the dataset.
- Parameters:
data (pandas.DataFrame) – Training dataset for training the estimator.
outcome (list of str, optional) – Names of the outcome.
treatment (list of str, optional) – Names of the treatment.
adjustment (list of str, optional, default=None) – Names of the adjustment set ensuring the unconfoundedness.
covariate (list of str, optional, default=None) – Names of the covariate.
treat (int, optional) – Label of the intended treatment group
control (int, optional) – Label of the intended control group
combined_treatment (bool, optional, default=True) –
Only modify this parameter for multiple treatments, where multiple discrete treatments are combined to give a single new group of discrete treatments if set as True. When combined_treatment is set to True, then if there are multiple treatments, we can use the combined_treatment technique to convert multiple discrete classification tasks into a single discrete classification task. For example, if there are two different binary treatments:

treatment_1: \(x_1 | x_1 \in \{'sleep', 'run'\}\),
treatment_2: \(x_2 | x_2 \in \{'study', 'work'\}\),

then we can convert these two binary classification tasks into a single classification task with 4 different classes:

treatment: \(x | x \in \{0, 1, 2, 3\}\),

where, for example, 1 stands for ('sleep' and 'study').
- Returns:
The fitted instance of XLearner.
- Return type:
instance of XLearner
- estimate(data=None, quantity=None)
Estimate the causal effect with the type of the quantity.
- Parameters:
data (pandas.DataFrame, optional, default=None) – Test data. The model will use the training data if set as None.
quantity (str, optional, default=None) –
Option for returned estimation result. The possible values of quantity include:
’CATE’ : the estimator will evaluate the CATE;
’ATE’ : the estimator will evaluate the ATE;
None : the estimator will evaluate the ITE or CITE.
- Returns:
The estimated causal effects
- Return type:
ndarray
- effect_nji(data=None)
Calculate causal effects with different treatment values.
- Returns:
Causal effects with different treatment values.
- Return type:
ndarray
- _comp_transormer(x, categories='auto')
Transform the discrete treatment into one-hot vectors properly.
- Parameters:
x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.
categories (str or list, optional, default='auto') –
- Returns:
The transformed one-hot vectors.
- Return type:
numpy.ndarray
Double Machine Learning
The double machine learning (DML) model [Chern2016] can be applied when all confounders of the treatment and outcome, i.e., variables that simultaneously influence both the treatment and the outcome, are observed. Let \(y\) be the outcome and \(x\) the treatment; a DML model solves the following causal effect estimation (CATE estimation):

\[y = F(v) x + g(v, w) + \epsilon,\]

where \(F(v)\) is the CATE conditional on the condition \(v\). Furthermore, to estimate \(F(v)\), we note that

\[\mathbb{E}[y | w, v] = F(v) \mathbb{E}[x | w, v] + g(v, w).\]

Thus by first estimating \(\mathbb{E}[y|w, v]\) and \(\mathbb{E}[x|w,v]\) as

\[\hat{\mu}_y(w, v) \approx \mathbb{E}[y|w, v], \qquad \hat{\mu}_x(w, v) \approx \mathbb{E}[x|w, v],\]

we can get a new dataset \((\tilde{y}, \tilde{x})\) where

\[\tilde{y} = y - \hat{\mu}_y(w, v), \qquad \tilde{x} = x - \hat{\mu}_x(w, v),\]

such that the relation between \(\tilde{y}\) and \(\tilde{x}\) is linear:

\[\tilde{y} = F(v) \tilde{x} + \epsilon,\]

which can simply be modeled by a linear regression model.

On the other hand, in the current version, \(F(v)\) takes the form

\[F_{ij}(v) = \sum_k H_{ijk} \rho_k(v),\]

where \(H\) can be seen as a rank-3 tensor and \(\rho_k\) is a function of the covariate \(v\), e.g., \(\rho(v) = v\) in the simplest case. Therefore, the outcome \(y\) can now be represented as

\[y_i = \sum_j \sum_k H_{ijk} \rho_k(v) x_j + g_i(v, w) + \epsilon_i.\]

In this sense, the linear regression problem between \(\tilde{y}\) and \(\tilde{x}\) now becomes

\[\tilde{y}_i = \sum_j \sum_k H_{ijk} \rho_k(v) \tilde{x}_j + \epsilon_i.\]
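The two-stage procedure can be sketched with scikit-learn as follows (synthetic data, a single treatment, \(\rho(v) = (1, v)\), and no cross-fitting, which DoubleML controls via cf_fold; this illustrates the math above rather than YLearn's implementation):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n = 2000
    w = rng.normal(size=(n, 2))                # adjustment variables
    v = rng.normal(size=n)                     # covariate conditioning the CATE
    x = w[:, 0] + rng.normal(size=n)           # continuous treatment
    y = (1 + v) * x + w.sum(axis=1) + rng.normal(size=n)

    wv = np.column_stack([w, v])
    # Stage 1: estimate E[y|w,v] and E[x|w,v], then take residuals.
    y_res = y - GradientBoostingRegressor().fit(wv, y).predict(wv)
    x_res = x - GradientBoostingRegressor().fit(wv, x).predict(wv)
    # Stage 2: linear regression of y_res on x_res * rho(v), rho(v) = (1, v).
    design = np.column_stack([x_res, x_res * v])
    coef = LinearRegression(fit_intercept=False).fit(design, y_res).coef_
    cate = coef[0] + coef[1] * v               # F(v) evaluated at each example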
Class Structures
- class ylearn.estimator_model.double_ml.DoubleML(x_model, y_model, yx_model=None, cf_fold=1, adjustment_transformer=None, covariate_transformer=None, random_state=2022, is_discrete_treatment=False, categories='auto')
- Parameters:
x_model (estimator, optional) – Machine learning models for fitting x. Any such model should implement the fit() and predict() (also predict_proba() if x is discrete) methods.
y_model (estimator, optional) – The machine learning model which is trained to model the outcome. Any valid y_model should implement the fit() and predict() methods.
yx_model (estimator, optional) – Machine learning model for fitting the residual of y on the residual of x. Only linear regression models are supported in the current version.
cf_fold (int, default=1) – The number of folds for performing cross fit in the first stage.
adjustment_transformer (transformer, optional, default=None) – Transformer for adjustment variables, which can be used to generate new features of adjustment variables.
covariate_transformer (transformer, optional, default=None) – Transformer for covariate variables, which can be used to generate new features of covariate variables.
random_state (int, default=2022) –
is_discrete_treatment (bool, default=False) – If the treatment variables are discrete, set this to True.
categories (str, optional, default='auto') –
- fit(data, outcome, treatment, adjustment=None, covariate=None, **kwargs)
Fit the DoubleML estimator model. Note that the training of a DML model has two stages, which we implement in _fit_1st_stage() and _fit_2nd_stage().
- Parameters:
data (pandas.DataFrame) – Training dataset for training the estimator.
outcome (list of str, optional) – Names of the outcome.
treatment (list of str, optional) – Names of the treatment.
adjustment (list of str, optional, default=None) – Names of the adjustment set ensuring the unconfoundedness.
covariate (list of str, optional, default=None) – Names of the covariate.
- Returns:
The fitted model
- Return type:
an instance of DoubleML
- estimate(data=None, treat=None, control=None, quantity=None)
Estimate the causal effect with the type of the quantity.
- Parameters:
data (pandas.DataFrame, optional, default=None) – The test data for the estimator to evaluate the causal effect, note that the estimator directly evaluate all quantities in the training data if data is None.
treat (float or numpy.ndarray, optional, default=None) – In the case of single discrete treatment, treat should be an int or str of one of all possible treatment values which indicates the value of the intended treatment; in the case of multiple discrete treatment, treat should be a list or an ndarray where treat[i] indicates the value of the i-th intended treatment, for example, when there are multiple discrete treatments, array([‘run’, ‘read’]) means the treat value of the first treatment is taken as ‘run’ and that of the second treatment is taken as ‘read’; in the case of continuous treatment, treat should be a float or a ndarray.
quantity (str, optional, default=None) –
Option for returned estimation result. The possible values of quantity include:
’CATE’ : the estimator will evaluate the CATE;
’ATE’ : the estimator will evaluate the ATE;
None : the estimator will evaluate the ITE or CITE.
control (float or numpy.ndarray, optional, default=None) – This is similar to the cases of treat.
- Returns:
The estimated causal effects
- Return type:
ndarray
- effect_nji(data=None)
Calculate causal effects with different treatment values.
- Parameters:
data (pandas.DataFrame, optional, default=None) – The test data for the estimator to evaluate the causal effect, note that the estimator will use the training data if data is None.
- Returns:
Causal effects with different treatment values.
- Return type:
ndarray
- comp_transormer(x, categories='auto')
Transform the discrete treatment into one-hot vectors properly.
- Parameters:
x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.
categories (str or list, optional, default='auto') –
- Returns:
The transformed one-hot vectors.
- Return type:
numpy.ndarray
Doubly Robust
The doubly robust method (see [Funk2010]) estimates causal effects when the treatment is discrete and the unconfoundedness condition is satisfied. Training a doubly robust model is composed of 3 steps (a sketch follows the list).
Let \(K\) be an integer. Form a \(K\)-fold random partition of the data \(\{(X_i, W_i, V_i, Y_i)\}_{i = 1}^n\) such that
\[\{(x_i, w_i, v_i, y_i)\}_{i = 1}^n = D_k \cup T_k\]where \(D_k\) stands for the training data while \(T_k\) stands for the test data and \(\cup_{k = 1}^K T_k = \{(X_i, W_i, V_i, Y_i)\}_{i = 1}^n\).
For each \(k\), train two models \(f(X, W, V)\) and \(g(W, V)\) on \(D_k\) to predict \(y\) and \(x\), respectively. Then evaluate their performances on \(T_k\), whose results will be saved as \(\{(\hat{X}, \hat{Y})\}_k\). All \(\{(\hat{X}, \hat{Y})\}_k\) will be combined to give the new dataset \(\{(\hat{X}_i, \hat{Y}_i(X, W, V))\}_{i = 1}^n\).
For any given pair of a treatment group where \(X=x\) and a control group where \(X = x_0\), we build the final dataset \(\{(V, \tilde{Y}_x - \tilde{Y}_0)\}\) where \(\tilde{Y}_x\) is defined as
\[\begin{split}\tilde{Y}_x & = \hat{Y}(X=x, W, V) + \frac{(Y - \hat{Y}(X=x, W, V)) * \mathbb{I}(X=x)}{P[X=x| W, V]} \\ \tilde{Y}_0 & = \hat{Y}(X=x_0, W, V) + \frac{(Y - \hat{Y}(X=x_0, W, V)) * \mathbb{I}(X=x_0)}{P[X=x_0| W, V]}\end{split}\]and train the final machine learning model \(h(W, V)\) on this dataset to predict the causal effect \(\tau(V)\)
\[\tau(V) = \tilde{Y}_x - \tilde{Y}_0 = h(V).\]Then we can directly estimate the causal effects by passing the covariate \(V\) to the model \(h(V)\).
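A rough sketch of the three steps with scikit-learn on synthetic data (a single data split instead of the K-fold cross-fitting described above; an illustration of the formulas, not YLearn's implementation):

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

    rng = np.random.default_rng(0)
    n = 2000
    w = rng.normal(size=(n, 2))
    v = rng.normal(size=n)
    x = rng.binomial(1, 0.5, size=n)
    y = (1 + v) * x + w.sum(axis=1) + rng.normal(size=n)

    wv = np.column_stack([w, v])
    # Step 2: f(X, W, V) predicts y; g(W, V) predicts the propensity P[X=x|W,V].
    f = GradientBoostingRegressor().fit(np.column_stack([x, wv]), y)
    p1 = GradientBoostingClassifier().fit(wv, x).predict_proba(wv)[:, 1]
    y1_hat = f.predict(np.column_stack([np.ones(n), wv]))
    y0_hat = f.predict(np.column_stack([np.zeros(n), wv]))
    # Step 3: doubly robust pseudo-outcomes, as in the formulas above.
    y1_dr = y1_hat + (y - y1_hat) * (x == 1) / p1
    y0_dr = y0_hat + (y - y0_hat) * (x == 0) / (1 - p1)
    # Final model h(V) predicting tau(V).
    h = GradientBoostingRegressor().fit(v.reshape(-1, 1), y1_dr - y0_dr)
    tau = h.predict(v.reshape(-1, 1))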
Class Structures
- class ylearn.estimator_model.doubly_robust.DoublyRobust(x_model, y_model, yx_model, cf_fold=1, random_state=2022, categories='auto')
- Parameters:
x_model (estimator, optional) – The machine learning model which is trained to model the treatment. Any valid x_model should implement the fit() and predict_proba() methods.
y_model (estimator, optional) – The machine learning model which is trained to model the outcome with covariates (possibly adjustment) and the treatment. Any valid y_model should implement the fit() and predict() methods.
yx_model (estimator, optional) – The machine learning model which is trained in the final stage of the doubly robust method to model the causal effects with covariates (possibly adjustment). Any valid yx_model should implement the fit() and predict() methods.
cf_fold (int, default=1) – The number of folds for performing cross fit in the first stage.
random_state (int, default=2022) –
categories (str, optional, default='auto') –
- fit(data, outcome, treatment, adjustment=None, covariate=None, treat=None, control=None, combined_treatment=True, **kwargs)
Fit the DoublyRobust estimator model. Note that the training of a doubly robust model has three stages, which we implement in _fit_1st_stage() and _fit_2nd_stage().
- Parameters:
data (pandas.DataFrame) – Training dataset for training the estimator.
outcome (list of str, optional) – Names of the outcome.
treatment (list of str, optional) – Names of the treatment.
adjustment (list of str, optional, default=None) – Names of the adjustment set ensuring the unconfoundedness.
covariate (list of str, optional, default=None) – Names of the covariate.
treat (int, optional) – Label of the intended treatment group. If None, then treat will be set as 1. In the case of a single discrete treatment, treat should be an int or str taking one of all possible treatment values, indicating the value of the intended treatment; in the case of multiple discrete treatments, treat should be a list or an ndarray where treat[i] indicates the value of the i-th intended treatment. For example, when there are multiple discrete treatments, array(['run', 'read']) means the treat value of the first treatment is taken as 'run' and that of the second treatment is taken as 'read'.
control (int, optional) – Label of the intended control group. This is similar to the case of treat. If None, then control will be set as 0.
- Returns:
The fitted instance of DoublyRobust.
- Return type:
instance of DoublyRobust
- estimate(data=None, quantity=None, treat=None, all_tr_effects=False)
Estimate the causal effect with the type of the quantity.
- Parameters:
data (pandas.DataFrame, optional, default=None) – Test data. The model will use the training data if set as None.
quantity (str, optional, default=None) –
Option for returned estimation result. The possible values of quantity include:
’CATE’ : the estimator will evaluate the CATE;
’ATE’ : the estimator will evaluate the ATE;
None : the estimator will evaluate the ITE or CITE.
treat (float or numpy.ndarray, optional, default=None) – In the case of single discrete treatment, treat should be an int or str in one of all possible treatment values which indicates the value of the intended treatment; in the case of multiple discrete treatment, treat should be a list or an ndarray where treat[i] indicates the value of the i-th intended treatment. For example, when there are multiple discrete treatments, array([‘run’, ‘read’]) means the treat value of the first treatment is taken as ‘run’ and that of the second treatment is taken as ‘read’.
all_tr_effects (bool, default=False) – If True, return the causal effects for all treatment values; otherwise, only return the causal effect of the treatment with the value of treat if it is provided. If treat is not provided, the value of the treatment is taken as that used when fitting the estimator model.
- Returns:
The estimated causal effects
- Return type:
ndarray
- effect_nji(data=None)
Calculate causal effects with different treatment values. Note that this method will convert any problem with a discrete treatment into one with a binary treatment. One can use _effect_nji_all() to get causal effects with all values of treat taken by treatment.
- Returns:
Causal effects with different treatment values.
- Return type:
ndarray
- comp_transormer(x, categories='auto')
Transform the discrete treatment into one-hot vectors properly.
- Parameters:
x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.
categories (str or list, optional, default='auto') –
- Returns:
The transformed one-hot vectors.
- Return type:
numpy.ndarray
Causal Tree
Causal Tree is a data-driven approach to partition the data into subpopulations which differ in the magnitude of their causal effects [Athey2015]. This method is applicable when the unconfoundedness is satisfied given the adjustment set (covariate) \(V\). The causal effect of interest is the CATE:
\[\tau(v) := \mathbb{E}[Y_i(do(X=x_t)) - Y_i(do(X=x_0)) | V_i = v].\]
Due to the fact that the counterfactuals can never be observed, [Athey2015] developed an honest approach where the loss function (the criterion for building the tree) is a variance-penalized estimate of the (negative) expected mean squared error of the estimated effects,
\[-\hat{EMSE}(S_{tr}, \Pi) := \frac{1}{N_{tr}} \sum_{i \in S_{tr}} \hat{\tau}^2(V_i) - \frac{2}{N_{tr}} \sum_{\ell \in \Pi} \left( \frac{S^2_{treat}(\ell)}{p} + \frac{S^2_{control}(\ell)}{1 - p} \right),\]where \(N_{tr}\) is the number of samples in the training set \(S_{tr}\), \(p\) is the ratio of the number of samples in the treat group to that of the control group in the training set, and \(S^2_{treat}(\ell)\) and \(S^2_{control}(\ell)\) are the within-leaf variances of the outcome in the treat and control groups, respectively.
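As an illustration, a minimal sketch of fitting a CausalTree on synthetic data (the dataset and column names are invented for this example; treat and control must be specified for the binary treatment):
import numpy as np
import pandas as pd

from ylearn.estimator_model.causal_tree import CausalTree

# Hypothetical data: binary treatment x, covariate v, outcome y with
# a heterogeneous effect that grows with |v|.
n = 1000
rng = np.random.default_rng(2022)
v = rng.normal(size=n)
x = rng.binomial(1, 0.5, size=n)
y = (1.0 + np.abs(v)) * x + v + rng.normal(scale=0.1, size=n)
data = pd.DataFrame({'x': x, 'v': v, 'y': y})

ct = CausalTree(min_samples_leaf=20, max_depth=5)
ct.fit(data, outcome='y', treatment='x', covariate='v', treat=1, control=0)
print(ct.estimate(data)[:5])  # per-sample effect estimates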
Class Structures
- class ylearn.estimator_model.causal_tree.CausalTree(*, splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, random_state=2022, max_leaf_nodes=None, max_features=None, min_impurity_decrease=0.0, min_weight_fraction_leaf=0.0, ccp_alpha=0.0, categories='auto')
- Parameters:
splitter ({"best", "random"}, default="best") – The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.
max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
min_samples_split (int or float, default=2) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
min_samples_leaf (int or float, default=1) –
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.
min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
max_features (int, float or {"sqrt", "log2"}, default=None) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
random_state (int) – Controls the randomness of the estimator.
max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
min_impurity_decrease (float, default=0.0) –
A node will be split if this split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease equation is the following
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child. N, N_t, N_t_R and N_t_L all refer to the weighted sum if sample_weight is passed.
categories (str, optional, default='auto') –
- fit(data, outcome, treatment, adjustment=None, covariate=None, treat=None, control=None)
Fit the model on data to estimate the causal effect.
- Parameters:
data (pandas.DataFrame) – The input samples for the est_model to estimate the causal effects and for the CEInterpreter to fit.
outcome (list of str, optional) – Names of the outcomes.
treatment (list of str, optional) – Names of the treatments.
covariate (list of str, optional, default=None) – Names of the covariate vectors.
adjustment (list of str, optional, default=None) – Names of the adjustment set. Note that we may only need the covariate set, which usually is a subset of the adjustment set.
treat (int or list, optional, default=None) – If there is only one discrete treatment, then treat indicates the treatment group. If there are multiple treatment groups, then treat should be a list of str with length equal to the number of treatments. For example, when there are multiple discrete treatments, array(['run', 'read']) means the treat value of the first treatment is taken as 'run' and that of the second treatment is taken as 'read'.
control (int or list, optional, default=None) – See treat.
- Returns:
Fitted CausalTree
- Return type:
instance of CausalTree
- estimate(data=None, quantity=None)
Estimate the causal effect of the treatment on the outcome in data.
- Parameters:
data (pandas.DataFrame, optional, default=None) – If None, data will be set as the training data.
quantity (str, optional, default=None) –
Option for returned estimation result. The possible values of quantity include:
’CATE’ : the estimator will evaluate the CATE;
’ATE’ : the estimator will evaluate the ATE;
None : the estimator will evaluate the ITE or CITE.
- Returns:
The estimated causal effect with the type of the quantity.
- Return type:
ndarray or float, optional
- plot_causal_tree(feature_names=None, max_depth=None, class_names=None, label='all', filled=False, node_ids=False, proportion=False, rounded=False, precision=3, ax=None, fontsize=None)
Plot the fitted causal tree. The sample counts that are shown are weighted with any sample_weights that might be present. The visualization is fit automatically to the size of the axis. Use the figsize or dpi arguments of plt.figure to control the size of the rendering.
- Returns:
List containing the artists for the annotation boxes making up the tree.
- Return type:
annotations : list of artists
- decision_path(*, data=None, wv=None)
Return the decision path.
- Parameters:
wv (numpy.ndarray, default=None) – The input samples as an ndarray. If None, then the DataFrame data will be used as the input samples.
data (pandas.DataFrame, default=None) – The input samples. The data must contain columns of the covariates used for training the model. If None, the training data will be passed as input samples.
- Returns:
Return a node indicator CSR matrix where nonzero elements indicate that the sample goes through the corresponding nodes.
- Return type:
indicator : sparse matrix of shape (n_samples, n_nodes)
- apply(*, data=None, wv=None)
Return the index of the leaf that each sample ends up in.
- Parameters:
wv (numpy.ndarray, default=None) – The input samples as an ndarray. If None, then the DataFrame data will be used as the input samples.
data (pandas.DataFrame, default=None) – The input samples. The data must contain columns of the covariates used for training the model. If None, the training data will be passed as input samples.
- Returns:
For each datapoint v_i in v, return the index of the leaf v_i ends up in. Leaves are numbered within [0; self.tree_.node_count), possibly with gaps in the numbering.
- Return type:
v_leaves : array-like of shape (n_samples, )
- property feature_importance
- Returns:
Normalized total reduction of criteria by feature (Gini importance).
- Return type:
ndarray of shape (n_features,)
Forest Estimator Models
Random forest is a widely used algorithm in machine learning. Many empirical properties of random forests, including their stability and ability to adapt flexibly to complicated functional forms, have made random forests and their variants popular and reliable choices for many tasks. It is therefore a natural and crucial idea to extend tree-based models for causal effect estimation, such as the causal tree, to forest-based ones. These works were pioneered by [Athey2018]. As in machine learning, forest estimator models for causal effect estimation typically perform better than tree models while sharing equivalent interpretability and other advantages. Thus it is always recommended to try these estimator models first.
In YLearn, we currently cover three types of forest estimator models for causal effect estimation under the unconfoundedness assumption: the generalized random forest (GRForest), the causal forest (CausalForest), and the ensemble of causal trees (CTCausalForest).
Generalized Random Forest
To adapt random forest to causal effect estimation, [Athey2018] proposed a generalized version of it, named Generalized Random Forest (GRF), by altering the criterion used when building a single tree and designing a new ensemble method to combine the trained trees. GRF can be used, for example, in quantile regression, while in YLearn we focus on its ability to perform highly flexible non-parametric causal effect estimation.
We now consider such estimation with GRF. Suppose that we observe samples \((X_i, Y_i, V_i) \in \mathbb{R}^{d_x} \times \mathbb{R} \times \mathbb{R}^{d_v}\) where \(Y\) is the outcome, \(X\) is the treatment, and \(V\) is the covariate which ensures the unconfoundedness condition. The forest weights \(\alpha_i(v)\) are defined by
\[\begin{split}\alpha_i^b(v) = \frac{\mathbb{I}\left( \left\{ V_i \in L^b(v) \right\} \right)}{|L^b(v)|},\\ \alpha_i(v) = \frac{1}{B} \sum_{b = 1}^B \alpha_i^b(v),\end{split}\]where the subscript \(b\) refers to the \(b\)-th tree out of a total of \(B\) such trees, \(L^b(v)\) is the leaf of the \(b\)-th tree containing the sample whose covariate is \(v\), and \(|L^b(v)|\) denotes the total number of training samples which fall into the same leaf as that sample in the \(b\)-th tree. Then the estimated causal effect can be expressed by
\[\left( \sum_{i=1}^n \alpha_i(v)(X_i - \bar{X}_\alpha)(X_i - \bar{X}_\alpha)^T\right)^{-1} \sum_{i = 1}^n \alpha_i(v) (X_i - \bar{X}_\alpha)(Y_i - \bar{Y}_\alpha)\]where \(\bar{X}_\alpha = \sum \alpha_i X_i\) and \(\bar{Y}_\alpha = \sum \alpha_i Y_i\).
We now provide an example usage of applying the GRForest below. Besides this GRForest, YLearn also implements a naive version of GRF in pure Python, written in an easy-to-understand manner to help users gain some insight into how GRF works at the code level. It is worth mentioning, however, that this naive version of GRF is very slow (~5 minutes for fitting 100 trees on a dataset with 2000 samples and 10 features). One can find this naive GRF in the folder ylearn/estimator_model/_naive_forest/. The formal version of GRF is summarized as follows.
Class Structures
- class ylearn.estimator_model.GRForest(n_estimators=100, *, sub_sample_num=None, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=1.0, max_leaf_nodes=None, min_impurity_decrease=0.0, n_jobs=None, random_state=None, ccp_alpha=0.0, is_discrete_treatment=True, is_discrete_outcome=False, verbose=0, warm_start=False, honest_subsample_num=None)
- Parameters:
n_estimators (int, default=100) – The number of trees for growing the GRF.
sub_sample_num (int or float, default=None) –
The number of samples used to train each individual tree.
If a float is given, then sub_sample_num*n_samples samples will be sampled to train a single tree.
If an int is given, then sub_sample_num samples will be sampled to train a single tree.
max_depth (int, default=None) – The max depth that a single tree can reach. If None is given, then there is no limit on the depth of a single tree.
min_samples_split (int, default=2) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
min_samples_leaf (int or float, default=1) –
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.
min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
max_features (int, float or {"sqrt", "log2"}, default=None) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
random_state (int) – Controls the randomness of the estimator.
max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
min_impurity_decrease (float, default=0.0) – A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
n_jobs (int, default=None) – The number of jobs to run in parallel. fit(), estimate(), and apply() are all parallelized over the trees. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
verbose (int, default=0) – Controls the verbosity when fitting and predicting.
honest_subsample_num (int or float, default=None) –
The number of samples used to train each individual tree in an honest manner. Typically setting this value will give better performance.
If None is given, use all sub_sample_num samples.
If a float is given, then honest_subsample_num*sub_sample_num samples will be used to train a single tree while the rest (1 - honest_subsample_num)*sub_sample_num samples will be used to label the trained tree.
If an int is given, then honest_subsample_num samples will be sampled to train a single tree while the rest sub_sample_num - honest_subsample_num samples will be used to label the trained tree.
- fit(data, outcome, treatment, adjustment=None, covariate=None)
Fit the model on data to estimate the causal effect.
- Parameters:
data (pandas.DataFrame) – The input samples for the est_model to estimate the causal effects and for the CEInterpreter to fit.
outcome (list of str, optional) – Names of the outcomes.
treatment (list of str, optional) – Names of the treatments.
covariate (list of str, optional, default=None) – Names of the covariate vectors.
adjustment (list of str, optional, default=None) – This will be the same as the covariate.
sample_weight (ndarray, optional, default=None) – Weight of each sample of the training set.
- Returns:
Fitted GRForest
- Return type:
instance of GRForest
- estimate(data=None)
Estimate the causal effect of the treatment on the outcome in data.
- Parameters:
data (pandas.DataFrame, optional, default=None) – If None, data will be set as the training data.
- Returns:
The estimated causal effect.
- Return type:
ndarray or float, optional
- apply(*, v)
Apply trees in the forest to v, return leaf indices.
- Parameters:
v (numpy.ndarray) – The input samples. Internally, its dtype will be converted to dtype=np.float32.
- Returns:
For each datapoint v_i in v and for each tree in the forest, return the index of the leaf v ends up in.
- Return type:
v_leaves : array-like of shape (n_samples, )
- property feature_importance
- Returns:
Normalized total reduction of criteria by feature (Gini importance).
- Return type:
ndarray of shape (n_features,)
Causal Forest
In [Athey2018], the authors argued that by imposing the local centering technique, i.e., by first regressing out the outcome and the treatment respectively (the so-called double machine learning framework), the performance of the Generalized Random Forest (GRF) can be further improved. In YLearn, we implement the class CausalForest to support this technique. We illustrate its usage in the following example.
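A minimal sketch (the synthetic dataset, column names, and first-stage learners are invented for this example):
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

from ylearn.estimator_model import CausalForest

# Hypothetical data: binary treatment x, covariates v0 and v1, outcome y.
n = 2000
rng = np.random.default_rng(2022)
v0 = rng.normal(size=n)
v1 = rng.normal(size=n)
x = rng.binomial(1, 1 / (1 + np.exp(-v0)))
y = x * v0 + v1 + rng.normal(scale=0.1, size=n)
data = pd.DataFrame({'x': x, 'v0': v0, 'v1': v1, 'y': y})

cf = CausalForest(
    x_model=RandomForestClassifier(),  # first-stage treatment model
    y_model=RandomForestRegressor(),   # first-stage outcome model
    cf_fold=2,                         # cross-fitting folds for local centering
    n_estimators=100,
    random_state=2022,
)
cf.fit(data, outcome='y', treatment='x', covariate=['v0', 'v1'])
print(cf.estimate(data)[:5])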
Class Structures
- class ylearn.estimator_model.CausalForest(x_model, y_model, n_estimators=100, *, cf_fold=1, sub_sample_num=None, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=1.0, max_leaf_nodes=None, min_impurity_decrease=0.0, n_jobs=None, random_state=None, ccp_alpha=0.0, is_discrete_treatment=True, is_discrete_outcome=False, verbose=0, warm_start=False, honest_subsample_num=None, adjustment_transformer=None, covariate_transformer=None, proba_output=False)
- Parameters:
x_model (estimator, optional) – Machine learning model for fitting x. Any such model should implement the fit() and predict() (also predict_proba() if x is discrete) methods.
cf_fold (int, default=1) – The number of folds for performing cross fit in the first stage.
y_model (estimator, optional) – The machine learning model which is trained to model the outcome. Any valid y_model should implement the fit() and predict() methods.
n_estimators (int, default=100) – The number of trees for growing the GRF.
sub_sample_num (int or float, default=None) –
The number of samples used to train each individual tree.
If a float is given, then sub_sample_num*n_samples samples will be sampled to train a single tree.
If an int is given, then sub_sample_num samples will be sampled to train a single tree.
max_depth (int, default=None) – The max depth that a single tree can reach. If None is given, then there is no limit on the depth of a single tree.
min_samples_split (int, default=2) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
min_samples_leaf (int or float, default=1) –
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.
min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
max_features (int, float or {"sqrt", "log2"}, default=None) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
random_state (int) – Controls the randomness of the estimator.
max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
min_impurity_decrease (float, default=0.0) – A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
n_jobs (int, default=None) – The number of jobs to run in parallel. fit(), estimate(), and apply() are all parallelized over the trees. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
verbose (int, default=0) – Controls the verbosity when fitting and predicting.
honest_subsample_num (int or float, default=None) –
The number of samples used to train each individual tree in an honest manner. Typically setting this value will give better performance.
If None is given, use all sub_sample_num samples.
If a float is given, then honest_subsample_num*sub_sample_num samples will be used to train a single tree while the rest (1 - honest_subsample_num)*sub_sample_num samples will be used to label the trained tree.
If an int is given, then honest_subsample_num samples will be sampled to train a single tree while the rest sub_sample_num - honest_subsample_num samples will be used to label the trained tree.
adjustment_transformer (transformer, default=None) – Transformer of adjustment variables. This can be used to generate new features.
covariate_transformer (transformer, default=None) – Transformer of covariate variables. This can be used to generate new features.
proba_output (bool, default=False) – Whether to estimate the probability of the outcome if it is discrete. If True, then the given y_model must implement the predict_proba() method.
- fit(data, outcome, treatment, adjustment=None, covariate=None, control=None)
Fit the model on data to estimate the causal effect. Note that when a discrete treatment is given and control is not specified explicitly, the first column will automatically be taken as the control while the other columns are taken as different treatment assignments.
- Parameters:
- Parameters:
data (pandas.DataFrame) – The input samples for the est_model to estimate the causal effects and for the CEInterpreter to fit.
outcome (list of str, optional) – Names of the outcomes.
treatment (list of str, optional) – Names of the treatments.
covariate (list of str, optional, default=None) – Names of the covariate vectors.
adjustment (list of str, optional, default=None) – This will be the same as the covariate.
sample_weight (ndarray, optional, default=None) – Weight of each sample of the training set.
control (str, optional, default=None) – The value of the parameter treatment which will be the control group used to estimate the causal effect. If None is given, then the first column of the treatment will be the control.
- Returns:
Fitted CausalForest
- Return type:
instance of CausalForest
- estimate(data=None)
Estimate the causal effect of the treatment on the outcome in data.
- Parameters:
data (pandas.DataFrame, optional, default=None) – If None, data will be set as the training data.
- Returns:
The estimated causal effect.
- Return type:
ndarray or float, optional
- apply(*, v)
Apply trees in the forest to v, return leaf indices.
- Parameters:
v (numpy.ndarray) – The input samples. Internally, its dtype will be converted to dtype=np.float32.
- Returns:
For each datapoint v_i in v and for each tree in the forest, return the index of the leaf v ends up in.
- Return type:
v_leaves : array-like of shape (n_samples, )
- property feature_importance
- Returns:
Normalized total reduction of criteria by feature (Gini importance).
- Return type:
ndarray of shape (n_features,)
Ensemble of Causal Trees
An efficient and useful technique for growing a random forest is simply averaging the result of each individual tree. Consequently, we can also apply this technique to grow a causal forest by combining many single causal trees. In YLearn, we implement this idea in the class CTCausalForest (referring to Causal-Tree Causal Forest). Since it is an ensemble of a bunch of CausalTrees, it currently only supports binary treatment. One may need to specify the treat and control groups before applying the CTCausalForest. This will be improved in a future version.
We provide below an example of it.
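A minimal sketch (the synthetic dataset and column names are invented for this example; treat and control are specified explicitly since only binary treatment is supported):
import numpy as np
import pandas as pd

from ylearn.estimator_model import CTCausalForest

# Hypothetical data: binary treatment x, covariates v0 and v1, outcome y.
n = 2000
rng = np.random.default_rng(2022)
v0 = rng.normal(size=n)
v1 = rng.normal(size=n)
x = rng.binomial(1, 0.5, size=n)
y = (0.5 + v0**2) * x + v1 + rng.normal(scale=0.1, size=n)
data = pd.DataFrame({'x': x, 'v0': v0, 'v1': v1, 'y': y})

ctcf = CTCausalForest(n_estimators=200, min_samples_leaf=10, n_jobs=-1, random_state=2022)
# Only binary treatment is supported: specify treat and control explicitly.
ctcf.fit(data, outcome='y', treatment='x', covariate=['v0', 'v1'], treat=1, control=0)
print(ctcf.estimate(data)[:5])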
Class Structures
- class ylearn.estimator_model.CTCausalForest(n_estimators=100, *, sub_sample_num=None, max_depth=None, min_samples_split=2, min_samples_leaf=1, max_features=1.0, min_impurity_decrease=0.0, n_jobs=None, random_state=None, ccp_alpha=0.0, is_discrete_treatment=True, is_discrete_outcome=False, verbose=0, warm_start=False, honest_subsample_num=None)
- Parameters:
n_estimators (int, default=100) – The number of trees for growing the CTCausalForest.
sub_sample_num (int or float, default=None) –
The number of samples used to train each individual tree.
If a float is given, then sub_sample_num*n_samples samples will be sampled to train a single tree.
If an int is given, then sub_sample_num samples will be sampled to train a single tree.
max_depth (int, default=None) – The max depth that a single tree can reach. If None is given, then there is no limit on the depth of a single tree.
min_samples_split (int, default=2) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
min_samples_leaf (int or float, default=1) –
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.
max_features (int, float or {"sqrt", "log2"}, default=None) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
random_state (int) – Controls the randomness of the estimator.
min_impurity_decrease (float, default=0.0) – A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
n_jobs (int, default=None) – The number of jobs to run in parallel. fit(), estimate(), and apply() are all parallelized over the trees. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
verbose (int, default=0) – Controls the verbosity when fitting and predicting.
honest_subsample_num (int or float, default=None) –
The number of samples used to train each individual tree in an honest manner. Typically setting this value will give better performance.
If None is given, use all sub_sample_num samples.
If a float is given, then honest_subsample_num*sub_sample_num samples will be used to train a single tree while the rest (1 - honest_subsample_num)*sub_sample_num samples will be used to label the trained tree.
If an int is given, then honest_subsample_num samples will be sampled to train a single tree while the rest sub_sample_num - honest_subsample_num samples will be used to label the trained tree.
- fit(data, outcome, treatment, adjustment=None, covariate=None, treat=None, control=None)
Fit the model on data to estimate the causal effect. Note that, similar to CausalTree, CTCausalForest currently assumes a binary treatment where the values of treat and control are controlled by the corresponding parameters.
- Parameters:
- Parameters:
data (pandas.DataFrame) – The input samples for the est_model to estimate the causal effects and for the CEInterpreter to fit.
outcome (list of str, optional) – Names of the outcomes.
treatment (list of str, optional) – Names of the treatments.
covariate (list of str, optional, default=None) – Names of the covariate vectors.
adjustment (list of str, optional, default=None) – This will be the same as the covariate.
sample_weight (ndarray, optional, default=None) – Weight of each sample of the training set.
treat (int or list, optional, default=None) – If there is only one discrete treatment, then treat indicates the treatment group. If there are multiple treatment groups, then treat should be a list of str with length equal to the number of treatments. For example, when there are multiple discrete treatments, array(['run', 'read']) means the treat value of the first treatment is taken as 'run' and that of the second treatment is taken as 'read'.
control (int or list, optional, default=None) – See treat.
- Returns:
Fitted CTCausalForest
- Return type:
instance of CTCausalForest
- estimate(data=None)
Estimate the causal effect of the treatment on the outcome in data.
- Parameters:
data (pandas.DataFrame, optional, default=None) – If None, data will be set as the training data.
- Returns:
The estimated causal effect.
- Return type:
ndarray or float, optional
- apply(*, v)
Apply trees in the forest to v, return leaf indices.
- Parameters:
v (numpy.ndarray) – The input samples. Internally, its dtype will be converted to dtype=np.float32.
- Returns:
For each datapoint v_i in v and for each tree in the forest, return the index of the leaf v ends up in.
- Return type:
v_leaves : array-like of shape (n_samples, )
- property feature_importance
- Returns:
Normalized total reduction of criteria by feature (Gini importance).
- Return type:
ndarray of shape (n_features,)
Instrumental Variables
Instrumental Variables (IV) deal with the problem of estimating causal effects in the presence of unobserved confounders that simultaneously affect the treatment \(X\) and the outcome \(Y\). A set of variables \(Z\) is said to be a set of instrumental variables if, for any \(z\) in \(Z\):
\(z\) has a causal effect on \(X\).
The causal effect of \(z\) on \(Y\) is fully mediated by \(X\).
There are no back-door paths from \(z\) to \(Y\).
In such a case, we must first find the IV (which can be done by using the CausalModel, see Identification). For instance, the variable \(Z\) in the following figure can serve as a valid IV for estimating the causal effect of \(X\) on \(Y\) in the presence of the unobserved confounder \(U\).
(Figure: Causal graph with IV)
YLearn implements two different IV methods: DeepIV [Hartford], which applies deep learning models to the IV framework, and IV with nonparametric models [Newey2002].
The IV Framework and Problem Setting
The IV framework aims to predict the value of the outcome \(y\) when the treatment \(x\) is given. Besides, there also exist covariate vectors \(v\) that simultaneously affect both \(y\) and \(x\). There are also some unobserved confounders \(e\) that potentially affect \(y\), \(x\), and \(v\). The core of the causal question lies in estimating the causal quantity \(\mathbb{E}[y | do(x)]\) in the following causal graph, where the causal relationships are determined by a set of structural functions relating \(y\), \(x\), \(v\), \(z\), and the unobserved \(e\).
(Figure: Causal graph with IV and both observed and unobserved confounders)
The IV framework solves this problem by doing a two-stage estimation:
Estimate \(\hat{H}(z, v)\) that captures the relationship between \(x\) and the variables \((z, v)\).
Replace \(x\) with the predicted result of \(\hat{H}(z, v)\) given \((v, z)\). Then estimate \(\hat{G}(x, v)\) to build the relationship between \(y\) and \((x, v)\).
The final causal effects can then be calculated.
IV Classes
Nonparametric Instrumental Variables
Two-stage Least Squares
When the relationships between the outcome \(y\), treatment \(x\), and covariate \(v\) are assumed to be linear, e.g., [Angrist1996], the IV framework becomes direct: it first trains a linear model for \(x\) given \(z\) and \(v\); then, in the second stage, it replaces \(x\) with the predicted values \(\hat{x}\) to train a linear model for \(y\). This procedure is called two-stage least squares (2SLS).
Nonparametric IV
Removing the linear assumptions on the relationships between variables, nonparametric IV replaces the linear regression with a linear projection onto a series of known basis functions [Newey2002]. This method is similar to conventional 2SLS and is also composed of 2 stages after finding new features of \(x\), \(v\), and \(z\), which are represented by some nonlinear functions (basis functions) \(f_d\) and \(g_{\mu}\). After transforming into the new spaces, we then
Fit the treatment model:
\[\hat{x}(z, v, w) = \sum_{d, \mu} A_{d, \mu} \tilde{z}_d \tilde{v}_{\mu} + h(v, w) + \eta\]
Generate the new treatments \(\hat{x}\), and then fit the outcome model
\[y(\hat{x}, v, w) = \sum_{m, \mu} B_{m, \mu} \psi_m(\hat{x}) \tilde{v}_{\mu} + k(v, w) + \epsilon.\]
The final causal effect can then be estimated. For example, the CATE given \(v\) is estimated as
\[y(\hat{x_t}, v, w) - y(\hat{x_0}, v, w) = \sum_{m, \mu} B_{m, \mu} (\psi_m(\hat{x_t}) - \psi_m(\hat{x_0})) \tilde{v}_{\mu}.\]
YLearn implements this procedure in the class NP2SLS.
Class structures
- class ylearn.estimator_model.iv.NP2SLS(x_model=None, y_model=None, random_state=2022, is_discrete_treatment=False, is_discrete_outcome=False, categories='auto')
- Parameters:
x_model (estimator, optional, default=None) – The machine learning model to model the treatment. Any valid x_model should implement the fit and predict methods, by default None
y_model (estimator, optional, default=None) – The machine learning model to model the outcome. Any valid y_model should implement the fit and predict methods, by default None
random_state (int, default=2022) –
is_discrete_treatment (bool, default=False) –
is_discrete_outcome (bool, default=False) –
categories (str, optional, default='auto') –
- fit(data, outcome, treatment, instrument, is_discrete_instrument=False, treatment_basis=('Poly', 2), instrument_basis=('Poly', 2), covar_basis=('Poly', 2), adjustment=None, covariate=None, **kwargs)
Fit the NP2SLS model. Note that when both treatment_basis and instrument_basis have degree 1, we are actually doing 2SLS.
- Parameters:
data (DataFrame) – Training data for the model.
outcome (str or list of str, optional) – Names of the outcomes.
treatment (str or list of str, optional) – Names of the treatment.
covariate (str or list of str, optional, default=None) – Names of the covariate vectors.
instrument (str or list of str, optional) – Names of the instrument variables.
adjustment (str or list of str, optional, default=None) – Names of the adjustment variables.
treatment_basis (tuple of 2 elements, optional, default=('Poly', 2)) – Option for transforming the original treatment vectors. The first element indicates the transformation basis function while the second one denotes the degree. Currently only support ‘Poly’ in the first element.
instrument_basis (tuple of 2 elements, optional, default=('Poly', 2)) – Option for transforming the original instrument vectors. The first element indicates the transformation basis function while the second one denotes the degree. Currently only support ‘Poly’ in the first element.
covar_basis (tuple of 2 elements, optional, default=('Poly', 2)) – Option for transforming the original covariate vectors. The first element indicates the transformation basis function while the second one denotes the degree. Currently only support ‘Poly’ in the first element.
is_discrete_instrument (bool, default=False) –
- estimate(data=None, treat=None, control=None, quantity=None)
Estimate the causal effect of the treatment on the outcome in data.
- Parameters:
data (pandas.DataFrame, optional, default=None) – If None, data will be set as the training data.
quantity (str, optional, default=None) –
Option for returned estimation result. The possible values of quantity include:
’CATE’ : the estimator will evaluate the CATE;
’ATE’ : the estimator will evaluate the ATE;
None : the estimator will evaluate the ITE or CITE.
treat (float, optional, default=None) – Value of the treatment when imposing the intervention. If None, then treat will be set to 1.
control (float, optional, default=None) – Value of the treatment such that the treatment effect is \(y(do(x=treat)) - y (do(x = control))\).
- Returns:
The estimated causal effect with the type of the quantity.
- Return type:
ndarray or float, optional
- effect_nji(data=None)
Calculate causal effects with different treatment values.
- Parameters:
data (pandas.DataFrame, optional, default=None) – The test data for the estimator to evaluate the causal effect, note that the estimator will use the training data if data is None.
- Returns:
Causal effects with different treatment values.
- Return type:
ndarray
DeepIV
DeepIV, developed in [Hartford], is a method for estimating the causal effects in the presence of the unobserved confounder between treatment and outcome variables. It applies deep learning methods to accurately characterize the causal relationships between the treatment and outcome when the instrumental variables (IV) are present. Due to the representation powers of deep learning models, it does not assume any parametric forms for the causal relationships.
Training a DeepIV model has two steps and resembles the estimation procedure of a normal IV method. Specifically, we
train a neural network, which we refer to as the treatment network \(F(Z, V)\), to estimate the distribution of the treatment \(X\) given the IV \(Z\) and covariate variables \(V\);
train another neural network, which we refer to as the outcome network \(H(X, V)\), to estimate the outcome \(Y\) given the treatment \(X\) and covariate variables \(V\).
The final causal effect can then be estimated with the outcome network \(H(X, V)\). For instance, the CATE \(\tau(v)\) is estimated as
\[\tau(v) = H(x_t, v) - H(x_0, v).\]
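A minimal sketch (the synthetic data-generating process and column names are invented for this example; the default treatment and outcome networks are used):
import numpy as np
import pandas as pd

from ylearn.estimator_model.deepiv import DeepIV

# Hypothetical IV data: instrument z, unobserved confounder e,
# continuous treatment x, outcome y.
n = 2000
rng = np.random.default_rng(2022)
z = rng.normal(size=n)
e = rng.normal(size=n)
x = z + e + rng.normal(scale=0.1, size=n)
y = np.sin(x) + 2 * e + rng.normal(scale=0.1, size=n)
data = pd.DataFrame({'z': z, 'x': x, 'y': y})

div = DeepIV(num_gaussian=5)  # mixture density network for the continuous treatment
div.fit(data, outcome='y', treatment='x', instrument='z', sample_n=2)
print(div.estimate(data, treat=1, control=0, quantity='ATE'))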
Class Structures
- class ylearn.estimator_model.deepiv.DeepIV(x_net=None, y_net=None, x_hidden_d=None, y_hidden_d=None, num_gaussian=5, is_discrete_treatment=False, is_discrete_outcome=False, is_discrete_instrument=False, categories='auto', random_state=2022)
- Parameters:
x_net (ylearn.estimator_model.deepiv.Net, optional, default=None) – Representation of the mixture density network for continuous treatment or a usual classification net for discrete treatment. If None, the default neural network will be used. See ylearn.estimator_model.deepiv.Net for reference.
y_net (ylearn.estimator_model.deepiv.Net, optional, default=None) – Representation of the outcome network. If None, the default neural network will be used.
x_hidden_d (int, optional, default=None) – Dimension of the hidden layer of the default x_net of DeepIV.
y_hidden_d (int, optional, default=None) – Dimension of the hidden layer of the default y_net of DeepIV.
is_discrete_treatment (bool, default=False) –
is_discrete_instrument (bool, default=False) –
is_discrete_outcome (bool, default=False) –
num_gaussian (int, default=5) – Number of Gaussians in the mixture density network; this is ignored when the treatment is discrete.
random_state (int, default=2022) –
categories (str, optional, default='auto') –
- fit(data, outcome, treatment, instrument=None, adjustment=None, approx_grad=True, sample_n=None, y_net_config=None, x_net_config=None, **kwargs)
Train the DeepIV model.
- Parameters:
data (pandas.DataFrame) – Training dataset for training the estimator.
outcome (list of str, optional) – Names of the outcome.
treatment (list of str, optional) – Names of the treatment.
instrument (list of str, optional) – Names of the IV. Must provide for DeepIV.
adjustment (list of str, optional, default=None) – Names of the adjustment set ensuring the unconfoundness, which can also be seen as the covariates in the current version.
approx_grad (bool, default=True) – Whether to use the approximated gradient as in [Hartford].
sample_n (int, optional, default=None) – Number of new samples drawn when using the approx_grad technique.
x_net_config (dict, optional, default=None) – Configuration of the x_net.
y_net_config (dict, optional, default=None) – Configuration of the y_net.
- Returns:
The trained DeepIV model
- Return type:
instance of DeepIV
- estimate(data=None, treat=None, control=None, quantity=None, marginal_effect=False, *args, **kwargs)
Estimate the causal effect with the type of the quantity.
- Parameters:
data (pandas.DataFrame, optional, default=None) – Test data. The model will use the training data if set as None.
quantity (str, optional, default=None) –
Option for returned estimation result. The possible values of quantity include:
’CATE’ : the estimator will evaluate the CATE;
’ATE’ : the estimator will evaluate the ATE;
None : the estimator will evaluate the ITE or CITE.
treat (int, optional, default=None) – Value of the treatment, by default None. If None, then the model will set treat=1.
control (int, optional, default=None) – Value of the control, by default None. If None, then the model will set control=0.
- Returns:
Estimated causal effects
- Return type:
torch.tensor
- effect_nji(data=None)
Calculate causal effects with different treatment values.
- Returns:
Causal effects with different treatment values.
- Return type:
ndarray
- comp_transormer(x, categories='auto')
Transform the discrete treatment into one-hot vectors properly.
- Parameters:
x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.
categories (str or list, optional, default='auto') –
- Returns:
The transformed one-hot vectors.
- Return type:
numpy.ndarray
Scoring Estimated Causal Effects
Estimator models for estimating causal effects cannot be easily evaluated because the true effects are not directly observed. This differs from the usual machine learning tasks, whose results can be easily evaluated with, for example, the value of a loss function.
The authors of [Schuler] proposed a framework, following a schema suggested by [Nie], to evaluate causal effects estimated by different estimator models. Roughly speaking, this framework is a direct application of the double machine learning method. Specifically, for a causal effect model ce_model() (trained on a training set) that is to be evaluated, we
Train a model y_model() to estimate the outcome \(y\) and an x_model() to estimate the treatment \(x\) on a validation set, which is usually different from the training set;
In the validation set \(D_{val}\), let \(\tilde{y}\) and \(\tilde{x}\) denote the differences
\[\begin{split}\tilde{y} & = y - \hat{y}(v), \\ \tilde{x} & = x - \hat{x}(v)\end{split}\]where \(\hat{y}\) and \(\hat{x}\) are the estimated outcome and treatment on covariates \(v\) in \(D_{val}\). Furthermore, let \(\tau(v)\) denote the causal effects estimated by the ce_model() in \(D_{val}\); then the metric of the causal effect for the ce_model is calculated as
\[E_{V}[(\tilde{y} - \tilde{x} \tau(v))^2].\]
Class Structures
- class ylearn.estimator_model.effect_score.RLoss(x_model, y_model, yx_model=None, cf_fold=1, adjustment_transformer=None, covariate_transformer=None, random_state=2022, is_discrete_treatment=False, categories='auto')
- Parameters:
x_model (estimator, optional) – Machine learning model for fitting x. Any such model should implement the fit() and predict() (also predict_proba() if x is discrete) methods.
y_model (estimator, optional) – The machine learning model which is trained to model the outcome. Any valid y_model should implement the fit() and predict() methods.
yx_model (estimator, optional) – Machine learning model for fitting the residual of y on the residual of x. Only linear regression models are supported in the current version.
cf_fold (int, default=1) – The number of folds for performing cross fit in the first stage.
adjustment_transformer (transformer, optional, default=None) – Transformer for adjustment variables, which can be used to generate new features of adjustment variables.
covariate_transformer (transformer, optional, default=None) – Transformer for covariate variables, which can be used to generate new features of covariate variables.
random_state (int, default=2022) –
is_discrete_treatment (bool, default=False) – If the treatment variables are discrete, set this to True.
categories (str, optional, default='auto') –
- fit(data, outcome, treatment, adjustment=None, covariate=None, combined_treatment=True, **kwargs)
Fit the RLoss estimator model. Note that the training of a DML model has two stages, which are implemented in _fit_1st_stage() and _fit_2nd_stage().
- Parameters:
data (pandas.DataFrame) – Training dataset for training the estimator.
outcome (list of str, optional) – Names of the outcome.
treatment (list of str, optional) – Names of the treatment.
adjustment (list of str, optional, default=None) – Names of the adjustment set ensuring the unconfoundedness.
covariate (list of str, optional, default=None) – Names of the covariate.
combined_treatment (bool, default=True) –
When combined_treatment is set to True and there are multiple treatments, the combined_treatment technique is used to convert the multiple discrete classification tasks into a single discrete classification task. For example, if there are two different binary treatments
\[\begin{split}treatment_1 &: x_1 | x_1 \in \{'sleep', 'run'\}, \\ treatment_2 &: x_2 | x_2 \in \{'study', 'work'\},\end{split}\]then we can convert these two binary classification tasks into a single classification task with 4 different classes
\[treatment: x | x \in \{0, 1, 2, 3\},\]where, for example, 1 stands for ('sleep' and 'study').
- Returns:
The fitted RLoss model for evaluating other estimator models in the validation set.
- Return type:
instance of RLoss
- score(test_estimator, treat=None, control=None)
Compute the score of the fitted test_estimator model.
- Parameters:
data (pandas.DataFrame, optional, default=None) – The test data for the estimator to evaluate the causal effect, note that the estimator directly evaluate all quantities in the training data if data is None.
treat (float or numpy.ndarray, optional, default=None) – In the case of single discrete treatment, treat should be an int or str of one of all possible treatment values which indicates the value of the intended treatment; in the case of multiple discrete treatment, treat should be a list or an ndarray where treat[i] indicates the value of the i-th intended treatment, for example, when there are multiple discrete treatments, array([‘run’, ‘read’]) means the treat value of the first treatment is taken as ‘run’ and that of the second treatment is taken as ‘read’; in the case of continuous treatment, treat should be a float or a ndarray.
control (float or numpy.ndarray, optional, default=None) – This is similar to the cases of treat.
- Returns:
The score for the test_estimator
- Return type:
float
- effect_nji(data=None)
Calculate causal effects with different treatment values.
- Parameters:
data (pandas.DataFrame, optional, default=None) – The test data for the estimator to evaluate the causal effect, note that the estimator will use the training data if data is None.
- Returns:
Causal effects with different treatment values.
- Return type:
ndarray
- comp_transormer(x, categories='auto')
Transform the discrete treatment into one-hot vectors properly.
- Parameters:
x (numpy.ndarray, shape (n, x_d)) – An array containing the information of the treatment variables.
categories (str or list, optional, default='auto') –
- Returns:
The transformed one-hot vectors.
- Return type:
numpy.ndarray
The evaluation of \(\mathbb{E}[y|do(x_1)] - \mathbb{E}[y|do(x_0)]\) for the ATE and \(\mathbb{E}[y|do(x_1), v] - \mathbb{E}[y|do(x_0), v]\) for the CATE is the task of the various suitable estimator models in YLearn. The concept EstimatorModel in YLearn is designed for this purpose. A typical EstimatorModel should have the following structure:
class BaseEstModel:
"""
Base class for various estimator model.
Parameters
----------
random_state : int, default=2022
is_discrete_treatment : bool, default=False
Set this to True if the treatment is discrete.
is_discrete_outcome : bool, default=False
Set this to True if the outcome is discrete.
categories : str, optional, default='auto'
"""
def fit(
self,
data,
outcome,
treatment,
**kwargs,
):
"""Fit the estimator model.
Parameters
----------
data : pandas.DataFrame
The dataset used for training the model
outcome : str or list of str, optional
Names of the outcome variables
treatment : str or list of str
Names of the treatment variables
Returns
-------
instance of BaseEstModel
The fitted estimator model.
"""
def estimate(
self,
data=None,
quantity=None,
**kwargs
):
"""Estimate the causal effect.
Parameters
----------
data : pd.DataFrame, optional
The test data for the estimator to evaluate the causal effect, note
that the estimator directly evaluate all quantities in the training
data if data is None, by default None
quantity : str, optional
The possible values of quantity include:
'CATE' : the estimator will evaluate the CATE;
'ATE' : the estimator will evaluate the ATE;
None : the estimator will evaluate the ITE or CITE, by default None
Returns
-------
ndarray
The estimated causal effect with the type of the quantity.
"""
def effect_nji(self, data=None, *args, **kwargs):
"""Return causal effects for all possible values of treatments.
Parameters
----------
data : pd.DataFrame, optional
The test data for the estimator to evaluate the causal effect, note
that the estimator directly evaluate all quantities in the training
data if data is None, by default None
"""
Causal Discovery: Exploring the Causal Structures in Data
A fundamental task in causal learning is to find the underlying causal relationships, the so-called causal structures, and apply them. Traditionally, these relationships might be revealed by designing randomized experiments or imposing interventions. However, such methods can be too expensive or even impossible. Therefore, many techniques, e.g., the PC algorithm (see [Spirtes2001]), have been suggested to analyze the causal structures using observational data directly. These techniques are called causal discovery. The current version of YLearn implements a score-based method for causal discovery [Zheng2018]; more methods will be added in later versions.
No-Tears
The problem of revealing the structure of directed acyclic graphs (DAGs) can be solved by formulating a continuous optimization problem over real matrices with a constraint enforcing the acyclicity condition [Zheng2018]. Specifically, for a given vector \(x \in \mathbb{R}^d\) such that there exists a matrix \(V\) which satisfies \(x = Vx + \eta\) for some noise vector \(\eta \in \mathbb{R}^d\), the optimization problem can be summarized as follows:
\[\min_{W \in \mathbb{R}^{d \times d}} F(W) \quad \text{s.t.} \quad h(W) = 0,\]where \(F(W)\) is a continuous function measuring \(\|x - Wx\|\) and
\[h(W) = \mathrm{tr}\left( e^{W \circ W} \right) - d,\]where \(\circ\) is the Hadamard product. This optimization problem can then be solved with some optimization technique, such as gradient descent.
The YLearn class for the NO-TEARS algorithm is CausalDiscovery.
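A minimal sketch (the synthetic linear-Gaussian data are invented for this example; the hidden_layer_dim, threshold, and return_dict options follow the quick-start usage and may differ across versions):
import numpy as np
import pandas as pd

from ylearn.causal_discovery import CausalDiscovery

# Hypothetical linear-Gaussian data following x2 <- x0 -> x1 -> x3.
n = 1000
rng = np.random.default_rng(2022)
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + 0.1 * rng.normal(size=n)
x2 = 0.5 * x0 + 0.1 * rng.normal(size=n)
x3 = 0.7 * x1 + 0.1 * rng.normal(size=n)
data = pd.DataFrame({'x0': x0, 'x1': x1, 'x2': x2, 'x3': x3})

cd = CausalDiscovery(hidden_layer_dim=[3])
est = cd(data, threshold=0.01, return_dict=True)  # causation dict, as used by CausalGraph
print(est)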
Policy: Selecting the Best Option
In tasks such as policy evaluation, e.g., [Athey2020], besides the causal effects we may also be interested in other questions, such as whether an example should be assigned to a treatment and, if the answer is yes, which option is the best among all possible treatment values. YLearn implements PolicyTree for this purpose. Given a trained estimator model or estimated causal effects, it finds the optimal policy for each example by building a decision tree model which aims to maximize the causal effect of each example.
The criterion for training the tree is
\[S = \sum_i g_{ik} e^k_{i},\]where \(g_{ik} = \phi(v_i)_k\) with \(\phi: \mathbb{R}^D \to \mathbb{R}^K\) being a map from \(v_i \in \mathbb{R}^D\) to a basis vector with only one nonzero element in \(\mathbb{R}^K\), and \(e^k_i\) denotes the causal effect of taking the \(k\)-th value of the treatment for example \(i\).
See also BaseDecisionTree in sklearn.
Note that one can use the PolicyInterpreter to interpret the result of a policy model.
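A minimal sketch of fitting a PolicyTree directly from an effect array (the synthetic effects and column names are invented for this example; the array follows the documented (n, j, i) shape):
import numpy as np
import pandas as pd

from ylearn.policy.policy_model import PolicyTree

# Hypothetical per-example causal effects for two treatment options;
# in practice these could come from est_model.effect_nji().
n = 500
rng = np.random.default_rng(2022)
v0 = rng.normal(size=n)
v1 = rng.normal(size=n)
data = pd.DataFrame({'v0': v0, 'v1': v1})
# Shape (n, j, i): one outcome dimension, two treatment options whose
# effects have opposite signs depending on v0.
effect_array = np.stack([v0, -v0], axis=-1).reshape(n, 1, 2)

pt = PolicyTree(max_depth=3)
pt.fit(data, covariate=['v0', 'v1'], effect_array=effect_array)
print(pt.predict_ind(data)[:10])        # index of the best option per example
print(pt.predict_opt_effect(data)[:5])  # effect achieved by that option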
Class Structures
- class ylearn.policy.policy_model.PolicyTree(*, criterion='policy_reg', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, random_state=2022, max_leaf_nodes=None, max_features=None, min_impurity_decrease=0.0, ccp_alpha=0.0, min_weight_fraction_leaf=0.0)
- Parameters:
criterion ({'policy_reg'}, default='policy_reg') –
The function to measure the quality of a split. The criterion for training the tree is (in the Einstein notation)
\[S = \sum_i g_{ik} e^k_{i},\]where \(g_{ik} = \phi(v_i)_k\) is a map from the covariates \(v_i\) to a basis vector which has only one nonzero element in \(\mathbb{R}^K\). By using this criterion, the aim of the model is to find the index of the treatment which renders the maximal causal effect, i.e., to find the optimal policy.
splitter ({"best", "random"}, default="best") – The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.
max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
min_samples_split (int or float, default=2) – The minimum number of samples required to split an internal node: - If int, then consider min_samples_split as the minimum number. - If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
min_samples_leaf (int or float, default=1) –
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.
min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
max_features (int, float or {"sqrt", "log2"}, default=None) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
random_state (int) – Controls the randomness of the estimator.
max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
min_impurity_decrease (float, default=0.0) – A node will be split if this split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child. N, N_t, N_t_R and N_t_L all refer to the weighted sum if sample_weight is passed.
- fit(data, covariate, *, effect=None, effect_array=None, est_model=None, sample_weight=None)
Fit the PolicyTree model to find the optimal policy for the causal effects estimated by the est_model on data. There are several options for passing the causal effects, which usually form an array of shape (n, j, i), where n is the number of examples, j is the dimension of the outcome, and i is the number of possible treatment values or the dimension of the treatment:
Pass only est_model. Then est_model will be used to generate the causal effects.
Pass effect_array, which will be used directly as the causal effects; effect and est_model will be ignored.
Pass only effect. This is usually a list of names of columns in data whose values will be used as the causal effects for training the model.
- Parameters:
data (pandas.DataFrame) – The input samples for the est_model to estimate the causal effects and for the PolicyTree to fit.
est_model (estimator_model) – est_model should be any valid estimator model of ylearn which was already fitted and can estimate the CATE. If effect=None and effect_array=None, then est_model can not be None and the causal effect will be estimated by the est_model.
covariate (list of str, optional, default=None) – Names of the covariate.
effect (list of str, optional, default=None) – Names of the causal effect in data. If effect_array is not None, then effect will be ignored.
effect_array (numpy.ndarray, default=None) – The causal effect array on which the PolicyTree will be fitted. If this is not provided and est_model is None, then effect can not be None.
- Returns:
Fitted PolicyTree
- Return type:
instance of PolicyTree
- predict_ind(data=None)
Estimate the optimal policy for the causal effects of the treatment on the outcome in the data, i.e., return the index of the optimal treatment.
- Parameters:
data (pandas.DataFrame, optional, default=None) – The test data in the form of a DataFrame. If None, the data used for training will be used.
- Returns:
The index of the optimal treatment dimension.
- Return type:
ndarray or int, optional
- predict_opt_effect(data=None)
Estimate the value of the optimal policy for the causal effects of the treatment on the outcome in the data, i.e., return the value of the causal effects when taking the optimal treatment.
- Parameters:
data (pandas.DataFrame, optional, default=None) – The test data in the form of a DataFrame. If None, the data used for training will be used.
- Returns:
The estimated causal effect with the optimal treatment value.
- Return type:
ndarray or float, optional
- apply(*, v=None, data=None)
Return the index of the leaf that each sample is predicted as.
- Parameters:
v (numpy.ndarray, default=None) – The input samples as an ndarray. If None, then the DataFrame data will be used as the input samples.
data (pandas.DataFrame, default=None) – The input samples. The data must contain columns of the covariates used for training the model. If None, the training data will be passed as input samples.
- Returns:
For each datapoint v_i in v, return the index of the leaf v_i ends up in. Leaves are numbered within [0; self.tree_.node_count), possibly with gaps in the numbering.
- Return type:
v_leaves : array-like of shape (n_samples,)
- decision_path(*, v=None, data=None)
Return the decision path.
- Parameters:
v (numpy.ndarray, default=None) – The input samples as an ndarray. If None, then the DataFrame data will be used as the input samples.
data (pandas.DataFrame, default=None) – The input samples. The data must contain columns of the covariates used for training the model. If None, the training data will be passed as input samples.
- Returns:
Return a node indicator CSR matrix, where nonzero elements indicate that the corresponding samples go through those nodes.
- Return type:
indicator : sparse matrix of shape (n_samples, n_nodes)
- get_depth()
Return the depth of the policy tree. The depth of a tree is the maximum distance between the root and any leaf.
- Returns:
The maximum depth of the tree.
- Return type:
int
- get_n_leaves()
Return the number of leaves of the policy tree.
- Returns:
Number of leaves
- Return type:
int
- property feature_importance
Return the feature importances. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance. Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.
- Returns:
Normalized total reduction of criteria by feature (Gini importance).
- Return type:
ndarray of shape (n_features,)
- property n_features_
- Returns:
number of features
- Return type:
int
- plot(*, feature_names=None, max_depth=None, class_names=None, label='all', filled=False, node_ids=False, proportion=False, rounded=False, precision=3, ax=None, fontsize=None)
Plot the PolicyTree. The sample counts that are shown are weighted with any sample_weights that might be present. The visualization is fit automatically to the size of the axis. Use the figsize or dpi arguments of plt.figure to control the size of the rendering.
- Returns:
List containing the artists for the annotation boxes making up the tree.
- Return type:
annotations : list of artists
Interpreter: Explaining the Causal Effects
To interpret the causal effects estimated by various estimator models, the current version of YLearn implements two tree-based models: CEInterpreter for interpreting causal effects and PolicyInterpreter for interpreting policy evaluations.
CEInterpreter
For the CATE \(\tau(v)\) estimated by an estimator model, e.g., a double machine learning model, CEInterpreter interprets the results
by building a decision tree that models the relationship between \(\tau(v)\) and the covariates \(v\). One can then use the decision rules
of the fitted tree model to analyze \(\tau(v)\).
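For example, assuming dml is an already-fitted YLearn estimator model (say, a double machine learning model) and test_data is a DataFrame containing the covariate columns (both names are assumptions for this sketch), typical usage looks like:

from ylearn.effect_interpreter.ce_interpreter import CEInterpreter

cei = CEInterpreter(max_depth=3)            # limit the depth to keep the rules readable
cei.fit(data=test_data, est_model=dml)      # dml: fitted estimator model, assumed available
result = cei.interpret(data=test_data[:1])  # decision rules for the first example, as a dict
cei.plot()                                  # visualize the fitted tree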
Class Structures
- class ylearn.effect_interpreter.ce_interpreter.CEInterpreter(*, criterion='squared_error', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, random_state=2022, max_leaf_nodes=None, max_features=None, min_impurity_decrease=0.0, min_weight_fraction_leaf=0.0, ccp_alpha=0.0, categories='auto')
- Parameters:
criterion ({"squared_error", "friedman_mse", "absolute_error", "poisson"}, default="squared_error") – The function to measure the quality of a split. Supported criteria are “squared_error” for the mean squared error, which is equal to variance reduction as feature selection criterion and minimizes the L2 loss using the mean of each terminal node, “friedman_mse”, which uses mean squared error with Friedman’s improvement score for potential splits, “absolute_error” for the mean absolute error, which minimizes the L1 loss using the median of each terminal node, and “poisson” which uses reduction in Poisson deviance to find splits.
splitter ({"best", "random"}, default="best") – The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.
max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
min_samples_split (int or float, default=2) – The minimum number of samples required to split an internal node: - If int, then consider min_samples_split as the minimum number. - If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
min_samples_leaf (int or float, default=1) – The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. If int, then consider min_samples_leaf as the minimum number. If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.
min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
max_features (int, float or {"sqrt", "log2"}, default=None) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
random_state (int) – Controls the randomness of the estimator.
max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
min_impurity_decrease (float, default=0.0) – A node will be split if this split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child. N, N_t, N_t_R and N_t_L all refer to the weighted sum if sample_weight is passed.
- fit(data, est_model, **kwargs)
Fit the CEInterpreter model to interpret the causal effect estimated by the est_model on data.
- Parameters:
data (pandas.DataFrame) – The input samples for the est_model to estimate the causal effects and for the CEInterpreter to fit.
est_model (estimator_model) – est_model should be any valid estimator model of ylearn which was already fitted and can estimate the CATE.
- Returns:
Fitted CEInterpreter
- Return type:
instance of CEInterpreter
- interpret(*, v=None, data=None)
Interpret the fitted model in the test data.
- Parameters:
v (numpy.ndarray, optional, default=None) – The test covariates in the form of ndarray. If this is given, then data will be ignored and the model will use this as the test data.
data (pandas.DataFrame, optional, default=None) – The test data in the form of the DataFrame. The model will only use this if v is set as None. In this case, if data is also None, then the data used for training will be used.
- Returns:
The interpreted results for all examples.
- Return type:
dict
- plot(*, feature_names=None, max_depth=None, class_names=None, label='all', filled=False, node_ids=False, proportion=False, rounded=False, precision=3, ax=None, fontsize=None)
Plot the fitted tree model. The sample counts that are shown are weighted with any sample_weights that might be present. The visualization is fit automatically to the size of the axis. Use the figsize or dpi arguments of plt.figure to control the size of the rendering.
- Returns:
List containing the artists for the annotation boxes making up the tree.
- Return type:
annotations : list of artists
PolicyInterpreter
PolicyInterpreter can be used to interpret the policy returned by an instance of PolicyTree. By assigning different strategies to different examples, it aims to maximize the causal effects of a subgroup and separate its members from those with negative causal effects.
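A minimal sketch of typical usage, again assuming est is a fitted YLearn estimator model that can estimate the CATE and test_data contains the covariates (names and covariate list are assumptions for this sketch):

from ylearn.interpreter.policy_interpreter import PolicyInterpreter

pit = PolicyInterpreter(max_depth=2)
pit.fit(data=test_data, est_model=est, covariate=['v1', 'v2'])  # covariate names assumed
result = pit.interpret()  # per-example policy interpretation, as a dict
pit.plot()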
Class Structures
- class ylearn.interpreter.policy_interpreter.PolicyInterpreter(*, criterion='policy_reg', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, random_state=2022, max_leaf_nodes=None, max_features=None, min_impurity_decrease=0.0, ccp_alpha=0.0, min_weight_fraction_leaf=0.0)
- Parameters:
criterion ({'policy_reg'}, default='policy_reg') –
The function to measure the quality of a split. The criterion for training the tree is (in the Einstein notation)
\[S = \sum_i g_{ik} e^k_{i},\]where \(g_{ik} = \phi(v_i)_k\) is a map from the covariates \(v_i\) to a basis vector which has only one nonzero element in \(\mathbb{R}^K\). By using this criterion, the aim of the model is to find the index of the treatment which renders the maximal causal effect, i.e., to find the optimal policy.
splitter ({"best", "random"}, default="best") – The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.
max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
min_samples_split (int or float, default=2) – The minimum number of samples required to split an internal node: - If int, then consider min_samples_split as the minimum number. - If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
min_samples_leaf (int or float, default=1) – The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. If int, then consider min_samples_leaf as the minimum number. If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.
min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
max_features (int, float or {"sqrt", "log2"}, default=None) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
random_state (int) – Controls the randomness of the estimator.
max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
min_impurity_decrease (float, default=0.0) – A node will be split if this split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child. N, N_t, N_t_R and N_t_L all refer to the weighted sum if sample_weight is passed.
- fit(data, est_model, *, covariate=None, effect=None, effect_array=None)
Fit the PolicyInterpreter model to interpret the policy for the causal effect estimated by the est_model on data.
- Parameters:
data (pandas.DataFrame) – The input samples for the est_model to estimate the causal effects and for the PolicyInterpreter to fit.
est_model (estimator_model) – est_model should be any valid estimator model of ylearn which was already fitted and can estimate the CATE.
covariate (list of str, optional, default=None) – Names of the covariate.
effect (list of str, optional, default=None) – Names of the causal effect in data. If effect_array is not None, then effect will be ignored.
effect_array (numpy.ndarray, default=None) – The causal effect array to be interpreted by the PolicyInterpreter. If this is not provided, then effect can not be None.
- Returns:
Fitted PolicyInterpreter
- Return type:
instance of PolicyInterpreter
- interpret(*, data=None)
Interpret the fitted model in the test data.
- Parameters:
data (pandas.DataFrame, optional, default=None) – The test data in the form of a DataFrame. If None, the data used for training will be used.
- Returns:
The interpreted results for all examples.
- Return type:
dict
- plot(*, feature_names=None, max_depth=None, class_names=None, label='all', filled=False, node_ids=False, proportion=False, rounded=False, precision=3, ax=None, fontsize=None)
Plot the tree model. The sample counts that are shown are weighted with any sample_weights that might be present. The visualization is fit automatically to the size of the axis. Use the figsize or dpi arguments of plt.figure to control the size of the rendering.
- Returns:
List containing the artists for the annotation boxes making up the tree.
- Return type:
annotations : list of artists
Why: An All-in-One Causal Learning API
Want to use YLearn in a much easier way? Try the all-in-one API Why!
Why is an API which encapsulates almost everything in YLearn, such as identifying causal effects and scoring a trained estimator model. It provides users with a simple and efficient way to use the package: one can directly pass the data, the only thing needed, into Why and call its various methods, rather than having to learn multiple concepts such as adjustment sets before being able to find interesting information hidden in the data. Why is designed to enable the full pipeline of causal inference: given data, it first tries to discover the causal graph if one is not provided, then it attempts to find possible variables as treatments and identify the causal effects, after which a suitable estimator model is trained to estimate the causal effects, and, finally, the policy is evaluated to suggest the best option for each individual.

Why can help almost every part of the whole pipeline of causal inference.
Example usages
In this chapter, we use the california_housing dataset to show how to use Why. We prepare the dataset with the code below:
from sklearn.datasets import fetch_california_housing

# Load the dataset as a DataFrame; housing.frame already includes the target column
housing = fetch_california_housing(as_frame=True)
data = housing.frame
outcome = housing.target_names[0]  # name of the outcome column, 'MedHouseVal'
data[outcome] = housing.target     # ensure the outcome column holds the target values
The variable data is our prepared dataset.
Fit Why with default settings
The simplest way to use Why is to create a Why instance with default settings and fit it with the training data and the outcome name only.
from ylearn import Why
why = Why()
why.fit(data, outcome)
print('identified treatment:',why.treatment_)
print('identified adjustment:',why.adjustment_)
print('identified covariate:',why.covariate_)
print('identified instrument:',why.instrument_)
print(why.causal_effect())
Outputs:
identified treatment: ['MedInc', 'HouseAge']
identified adjustment: None
identified covariate: ['AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']
identified instrument: None
mean min max std
MedInc 0.411121 -0.198831 1.093134 0.064856
HouseAge -0.000385 -0.039162 0.114263 0.005845
Fit Why with customized treatments
We can fit Why with the argument treatment to specify the desired features as treatments.
from ylearn import Why
why = Why()
why.fit(data, outcome, treatment=['AveBedrms', ])
print('identified treatment:',why.treatment_)
print('identified adjustment:',why.adjustment_)
print('identified covariate:',why.covariate_)
print('identified instrument:',why.instrument_)
print(why.causal_effect())
Outputs:
identified treatment: ['AveBedrms']
identified adjustment: None
identified covariate: ['MedInc', 'HouseAge', 'AveRooms', 'Population', 'AveOccup', 'Latitude', 'Longitude']
identified instrument: None
mean min max std
AveBedrms 0.197422 -0.748971 10.857963 0.169682
Identify treatment without fitting Why
We can call Why’s method identify to identify the treatment, adjustment, covariate and instrument without fitting the instance.
why = Why()
r=why.identify(data, outcome)
print('identified treatment:',r[0])
print('identified adjustment:',r[1])
print('identified covariate:',r[2])
print('identified instrument:',r[3])
Outputs:
identified treatment: ['MedInc', 'HouseAge']
identified adjustment: None
identified covariate: ['AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']
identified instrument: None
Class Structures
- class ylearn._why.Why(discrete_outcome=None, discrete_treatment=None, identifier='auto', identifier_options=None, estimator='auto', estimator_options=None, fn_cost=None, effect_name='effect', random_state=None)
An all-in-one API for causal learning.
- Parameters:
discrete_outcome (bool, default=None) – If True, force the outcome as discrete; If False, force the outcome as continuous; If None, inferred from outcome.
discrete_treatment (bool, default=None) – If True, force the treatment variables as discrete; If False, force the treatment variables as continuous; If None, inferred from the first treatment.
identifier (str or Identifier, default='auto') – If str, the available options are 'auto', 'discovery', 'gcastle' or 'pgm'.
identifier_options (dict, optional, default=None) – Parameters (key-values) to initialize the identifier
estimator (str, optional, default='auto') – Name of a valid EstimatorModel. One can also pass an instance of a valid estimator model.
estimator_options (dict, optional, default=None) – Parameters (key-values) to initialize the estimator model
fn_cost (callable, optional, default=None) – Cost function, used to readjust the causal effect based on cost.
effect_name (str, default='effect') – The column name in the argument DataFrame passed to fn_cost. Effective when fn_cost is not None.
random_state (int, optional, default=None) – Random state seed
- feature_names_in_
list of feature names seen during fit
- outcome_
name of outcome
- treatment_
list of treatment names identified during fit
- adjustment_
list of adjustment names identified during fit
- covariate_
list of covariate names identified during fit
- instrument_
list of instrument names identified during fit
- identifier_
identifier object or None. Used to identify treatment/adjustment/covariate/instrument if they were not specified during fit
- y_encoder_
LabelEncoder object or None. Used to encode outcome if it is discrete.
- preprocessor_
Pipeline object to preprocess data during fit
- estimators_
estimators dict for each treatment where key is the treatment name and value is the EstimatorModel object
- fit(data, outcome, *, treatment=None, adjustment=None, covariate=None, instrument=None, treatment_count_limit=None, copy=True, **kwargs)
Fit the Why object, steps:
encode outcome if its dtype is not numeric
identify treatment and adjustment/covariate/instrument
encode treatment if discrete_treatment is True
preprocess data
fit causal estimators
- Parameters:
data (pandas.DataFrame, required) – Training dataset.
outcome (str, required) – Name of the outcome.
treatment (list of str, optional) – Names of the treatment. If str, it will be split into a list by comma; if None, identified by the identifier.
adjustment (list of str, optional, default=None) – Names of the adjustment. Identified by identifier if adjustment/covariate/instrument are all None.
covariate (list of str, optional, default=None) – Names of the covariate. Identified by identifier if adjustment/covariate/instrument are all None.
instrument (list of str, optional, default=None) – Names of the instrument. Identified by identifier if adjustment/covariate/instrument are all None.
treatment_count_limit (int, optional) – maximum treatment number, default min(5, 10% of total feature number).
copy (bool, default=True) – Set False to perform inplace transforming and avoid a copy of data.
- Returns:
The fitted Why.
- Return type:
instance of Why
- identify(data, outcome, *, treatment=None, adjustment=None, covariate=None, instrument=None, treatment_count_limit=None)
Identify treatment and adjustment/covariate/instrument without fitting Why.
- Parameters:
data (pandas.DataFrame, required) – Training dataset.
outcome (str, required) – Name of the outcome.
treatment (list of str, optional) – Names of the treatment. If str, it will be split into a list by comma; if None, identified by the identifier.
adjustment (list of str, optional, default=None) – Names of the adjustment. Identified by identifier if adjustment/covariate/instrument are all None.
covariate (list of str, optional, default=None) – Names of the covariate. Identified by identifier if adjustment/covariate/instrument are all None.
instrument (list of str, optional, default=None) – Names of the instrument. Identified by identifier if adjustment/covariate/instrument are all None.
treatment_count_limit (int, optional) – maximum treatment number, default min(5, 10% of the number of features).
- Returns:
tuple of identified treatment, adjustment, covariate, instrument
- Return type:
tuple
- causal_graph()
Get identified causal graph.
- Returns:
Identified causal graph
- Return type:
instance of
CausalGraph
- causal_effect(test_data=None, treatment=None, treat=None, control=None, target_outcome=None, quantity='ATE', return_detail=False, **kwargs)
Estimate the causal effect.
- Parameters:
test_data (pandas.DataFrame, optional) – The test data to evaluate the causal effect. If None, the training data is used.
treatment (str or list, optional) – Treatment names; should be a subset of the attribute treatment_. Default is all elements in the attribute treatment_.
treat (treatment value or list or ndarray or pandas.Series, default None) – In the case of single discrete treatment, treat should be an int or str of one of all possible treatment values which indicates the value of the intended treatment; in the case of multiple discrete treatment, treat should be a list where treat[i] indicates the value of the i-th intended treatment, for example, when there are multiple discrete treatments, list([‘run’, ‘read’]) means the treat value of the first treatment is taken as ‘run’ and that of the second treatment is taken as ‘read’; in the case of continuous treatment, treat should be a float or a ndarray or pandas.Series, by default None
control (treatment value or list or ndarray or pandas.Series, default None) – This is similar to the cases of treat, by default None
target_outcome (outcome value, optional) – Only effective when the outcome is discrete. Default the last one in attribute y_encoder_.classes_.
quantity (str, optional, default='ATE') – ‘ATE’ or ‘ITE’.
return_detail (bool, default False) – If True, return effect details in result.
kwargs (dict, optional) – Other options to call estimator.estimate().
- Returns:
- Causal effect of each treatment. When quantity=’ATE’, the result DataFrame columns are:
mean: mean of causal effect,
min: minimum of causal effect,
max: maximum of causal effect,
std: standard deviation of causal effect,
detail (if return_detail is True): causal effect ndarray.
In the case of discrete treatment, the result DataFrame indices are a multiindex of (treatment name, treat_vs_control); in the case of continuous treatment, the indices are treatment names. When quantity=’ITE’, the result DataFrame contains the individual causal effect of each treatment: in the case of discrete treatment, the columns are a multiindex of (treatment name, treat_vs_control); in the case of continuous treatment, the columns are treatment names.
- Return type:
pandas.DataFrame
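Continuing the california_housing example above, a sketch of estimating the average effect of a hypothetical change of MedInc from 3.0 to 5.0 (the treatments there are continuous; the treat and control values are illustrative):

effect = why.causal_effect(treatment='MedInc', treat=5.0, control=3.0)
print(effect)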
- individual_causal_effect(test_data, control=None, target_outcome=None)
Estimate the causal effect for each individual.
- Parameters:
test_data (pandas.DataFrame, required) – The test data to evaluate the causal effect.
control (treatment value or list or ndarray or pandas.Series, default None) – In the case of single discrete treatment, control should be an int or str of one of all possible treatment values which indicates the value of the intended control; in the case of multiple discrete treatments, control should be a list where control[i] indicates the control value of the i-th treatment, for example, list([‘run’, ‘read’]) means the control value of the first treatment is taken as ‘run’ and that of the second treatment is taken as ‘read’; in the case of continuous treatment, control should be a float or an ndarray or pandas.Series, by default None
target_outcome (outcome value, optional) – Only effective when the outcome is discrete. Default the last one in attribute y_encoder_.classes_.
- Returns:
individual causal effect of each treatment. The result DataFrame columns are the treatment names; In the case of discrete treatment, the result DataFrame indices are multiindex of (individual index in test_data, treatment name and treat_vs_control); in the case of continuous treatment, the result DataFrame indices are multiindex of (individual index in test_data, treatment name).
- Return type:
pandas.DataFrame
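For instance, assuming why was fitted with the single continuous treatment ['AveBedrms'] as in the earlier example, per-example effects relative to an illustrative control value can be sketched as:

ite = why.individual_causal_effect(data, control=1.0)  # control=1.0 is an illustrative baseline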
- whatif(test_data, new_value, treatment=None)
Get counterfactual predictions when treatment is changed to new_value from its observational counterpart.
- Parameters:
test_data (pandas.DataFrame, required) – The test data to predict.
new_value (ndarray or pd.Series, required) – It should have the same length as test_data.
treatment (str, default None) – Treatment name. If str, it should be one of the fitted attribute treatment_. If None, the first element in the attribute treatment_ is used.
- Returns:
The counterfactual prediction
- Return type:
pandas.Series
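For example, the counterfactual question “what would the predicted outcome be if every district’s MedInc were one unit higher?” can be sketched as:

new_value = data['MedInc'] + 1.0  # hypothetical intervention on the treatment
cf_pred = why.whatif(data, new_value, treatment='MedInc')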
- score(test_data=None, treat=None, control=None, scorer='auto')
Score the fitted estimator models.
- Parameters:
test_data (pandas.DataFrame, optional, default=None) – The test data to score.
treat (treatment value or list or ndarray or pandas.Series, default None) – In the case of single discrete treatment, treat should be an int or str of one of all possible treatment values which indicates the value of the intended treatment; in the case of multiple discrete treatment, treat should be a list where treat[i] indicates the value of the i-th intended treatment, for example, when there are multiple discrete treatments, list([‘run’, ‘read’]) means the treat value of the first treatment is taken as ‘run’ and that of the second treatment is taken as ‘read’; in the case of continuous treatment, treat should be a float or a ndarray or pandas.Series, by default None
control (treatment value or list or ndarray or pandas.Series) – This is similar to the cases of treat, by default None
scorer (str, default 'auto') – Reserved.
- Returns:
Score of the estimator models
- Return type:
float
- policy_interpreter(test_data, treatment=None, control=None, target_outcome=None, **kwargs)
Get the policy interpreter
- Parameters:
test_data (pandas.DataFrame, required) – The test data to evaluate.
treatment (str or list, optional) – Treatment names; should contain one or two elements. Default is the first two elements in the attribute treatment_.
control (treatment value or list or ndarray or pandas.Series) – In the case of single discrete treatment, control should be an int or str of one of all possible treatment values which indicates the value of the intended treatment; in the case of multiple discrete treatment, control should be a list where control[i] indicates the value of the i-th intended treatment, for example, when there are multiple discrete treatments, list([‘run’, ‘read’]) means the control value of the first treatment is taken as ‘run’ and that of the second treatment is taken as ‘read’; in the case of continuous treatment, control should be a float or a ndarray or pandas.Series, by default None
target_outcome (outcome value, optional) – Only effective when the outcome is discrete. Default the last one in attribute y_encoder_.classes_.
kwargs (dict) – options to initialize the PolicyInterpreter.
- Returns:
The fitted instance of PolicyInterpreter.
- Return type:
instance of PolicyInterpreter
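A short sketch of obtaining and visualizing the policy interpreter, continuing the running example (keyword options such as max_depth are forwarded to PolicyInterpreter):

pit = why.policy_interpreter(data, max_depth=2)  # returns a fitted PolicyInterpreter
pit.plot()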
- uplift_model(test_data, treatment=None, treat=None, control=None, target_outcome=None, name=None, random=None)
Get uplift model over one treatment.
- Parameters:
test_data (pandas.DataFrame, required) – The test data to evaluate.
treatment (str, optional) – Treatment name; should be one of the elements of the fitted attribute treatment_. If None, the first element in the attribute treatment_ is used.
treat (treatment value, optional) – If None, the last element in the treatment encoder’s attribute classes_ is used.
control (treatment value, optional) – If None, the first element in the treatment encoder’s attribute classes_ is used.
target_outcome (outcome value, optional) – Only effective when the outcome is discrete. Default the last one in attribute y_encoder_.classes_.
name (str, optional) – Lift name. If None, the treat value is used.
random (str, optional, default=None) – Lift name for randomly generated data. If None, no random lift is generated.
- Returns:
The fitted instance of UpliftModel.
- Return type:
instance of UpliftModel
- plot_causal_graph()
Plot the causal graph.
- plot_policy_interpreter(test_data, treatment=None, control=None, **kwargs)
Plot the interpreter.
- Returns:
The fitted instance of PolicyInterpreter.
- Return type:
instance of PolicyInterpreter
References
J. Pearl. Causality: Models, Reasoning, and Inference.
I. Shpitser and J. Pearl. Identification of Joint Interventional Distributions in Recursive Semi-Markovian Causal Models.
B. Neal. Introduction to Causal Inference.
M. Funk, et al. Doubly Robust Estimation of Causal Effects.
V. Chernozhukov, et al. Double Machine Learning for Treatment and Causal Parameters. arXiv:1608.00060.
S. Athey and G. Imbens. Recursive Partitioning for Heterogeneous Causal Effects. arXiv:1504.01132.
A. Schuler, et al. A Comparison of Methods for Model Selection When Estimating Individual Treatment Effects. arXiv:1804.05146.
X. Nie, et al. Quasi-Oracle Estimation of Heterogeneous Treatment Effects. arXiv:1712.04912.
J. Hartford, et al. Deep IV: A Flexible Approach for Counterfactual Prediction. ICML 2017.
W. Newey and J. Powell. Instrumental Variable Estimation of Nonparametric Models. Econometrica 71, no. 5 (2003): 1565–78.
S. Künzel, et al. Meta-Learners for Estimating Heterogeneous Treatment Effects Using Machine Learning.
J. Angrist, et al. Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association.
S. Athey and S. Wager. Policy Learning with Observational Data. arXiv:1702.02896.
P. Spirtes, et al. Causation, Prediction, and Search.
X. Zheng, et al. DAGs with NO TEARS: Continuous Optimization for Structure Learning. arXiv:1803.01422.
S. Athey, et al. Generalized Random Forests. arXiv:1610.01271.