Quick Start
In this part, we first show several simple usage examples of YLearn that cover its most common functionalities. We then present a case study with Why to unveil the hidden causal relations in data.
Example usages
We present several essential usage examples of YLearn in this section, covering how to define a causal graph, identify a causal effect, and train an estimator model. Please see the documentation of each component for more details.
Representation of causal graph
Given a set of variables, YLearn represents their causal graph with a Python dict that denotes the causal relations among the variables: each key of the dict is a child of every element in the corresponding value, which should usually be a list of variable names. For instance, in the simplest case, for a given causal graph \(X \leftarrow W \rightarrow Y\), we first define a Python dict for the causal relations, which is then passed to CausalGraph as a parameter:

    from ylearn.causal_model.graph import CausalGraph

    # Each key maps to the list of its parents (direct causes).
    causation = {'X': ['W'], 'W': [], 'Y': ['W']}
    cg = CausalGraph(causation=causation)

cg will be the causal graph encoding the causal relation \(X \leftarrow W \rightarrow Y\) in YLearn. If there exist unobserved confounders in the causal graph, then, aside from the observed variables, we should also define a Python list containing these causal relations. See Causal Graph for more details.
Identification of causal effect
It is crucial to identify the causal effect when we want to estimate it from data. The first step is identifying the causal estimand, which can be easily done in YLearn. For instance, suppose that we are interested in identifying the causal estimand \(P(Y|do(X=x))\) in the causal graph cg. We should first define an instance of CausalModel and then call its identify() method:

    from ylearn.causal_model.model import CausalModel

    cm = CausalModel(causal_graph=cg)
    cm.identify(treatment={'X'}, outcome={'Y'}, identify_method=('backdoor', 'simple'))

Here we use the backdoor-adjustment method. YLearn also supports front-door adjustment, finding instrumental variables, and, most importantly, the general identification method developed in [Pearl], which is able to identify any causal effect if it is identifiable.
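To make the backdoor estimand concrete, the following plain-Python sketch evaluates the adjustment formula \(P(Y=1|do(X=x)) = \sum_w P(Y=1|X=x, W=w) P(W=w)\) on simulated data for the graph \(X \leftarrow W \rightarrow Y\). It does not use YLearn: the simulated probabilities and the helper names simulate, mean_y, and backdoor_adjust are assumptions made for illustration, and the adjustment set is simply taken to be the parents of X in the dict.

```python
import random

# Parent-dict convention from the text: each key lists its direct causes.
causation = {'X': ['W'], 'W': [], 'Y': ['W']}

def simulate(n, seed=0):
    """Draw binary samples from X <- W -> Y (X has no effect on Y)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        x = int(rng.random() < (0.8 if w else 0.2))  # W influences X
        y = int(rng.random() < (0.7 if w else 0.3))  # W influences Y
        rows.append({'W': w, 'X': x, 'Y': y})
    return rows

def mean_y(rows):
    """Empirical P(Y=1) within the given rows."""
    return sum(r['Y'] for r in rows) / len(rows)

def backdoor_adjust(rows, x, adjustment):
    """Estimate P(Y=1 | do(X=x)) = sum_w P(Y=1 | X=x, W=w) * P(W=w)."""
    total = 0.0
    strata = {tuple(r[a] for a in adjustment) for r in rows}
    for s in strata:
        in_stratum = [r for r in rows if tuple(r[a] for a in adjustment) == s]
        matched = [r for r in in_stratum if r['X'] == x]
        # Strata without any X=x rows are skipped (positivity is assumed).
        if matched:
            total += mean_y(matched) * len(in_stratum) / len(rows)
    return total

rows = simulate(20000)
adjustment = causation['X']  # parents of X block every backdoor path here

naive = (mean_y([r for r in rows if r['X'] == 1])
         - mean_y([r for r in rows if r['X'] == 0]))
adjusted = (backdoor_adjust(rows, 1, adjustment)
            - backdoor_adjust(rows, 0, adjustment))
print(f"naive difference:    {naive:.3f}")
print(f"adjusted difference: {adjusted:.3f}")
```

Under these simulated probabilities the naive difference is 0.62 - 0.38 = 0.24 in expectation, purely because of the confounder W, while the adjusted difference should be close to 0 because the graph has no edge from X to Y.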
Estimation of causal effect
The estimation of causal effects in YLearn is also fairly easy. It follows the common approach of deploying a machine learning model, since YLearn focuses on the intersection of machine learning and causal inference in this part. Given a dataset, one can apply any EstimatorModel in YLearn with a procedure composed of 3 steps:
1. Given data in the form of pandas.DataFrame, find the names of the treatment, outcome, adjustment, and covariate variables.
2. Call the fit() method of EstimatorModel to train the model.
3. Call the estimate() method of EstimatorModel to estimate causal effects in test data.
See Estimator Model: Estimating the Causal Effects for more details.
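The three steps above can be sketched with a toy estimator. The class below is hypothetical and is not one of YLearn's estimator models. It only mirrors the fit()/estimate() pattern, uses a list of dict rows in place of a pandas.DataFrame so that the example has no dependencies, handles an adjustment set but no covariates, and estimates effects as a stratified difference in means for a binary treatment.

```python
from collections import defaultdict

class ToyStratifiedEstimator:
    """Hypothetical stand-in that mimics the fit()/estimate() workflow."""

    def fit(self, data, outcome, treatment, adjustment):
        # Step 2: learn the mean outcome for every (stratum, treatment) cell.
        self.adjustment = adjustment
        sums = defaultdict(lambda: [0.0, 0])
        for row in data:
            key = (tuple(row[a] for a in adjustment), row[treatment])
            sums[key][0] += row[outcome]
            sums[key][1] += 1
        self.cell_mean_ = {k: s / n for k, (s, n) in sums.items()}
        return self

    def estimate(self, data):
        # Step 3: per-row effect = mean(Y | X=1, stratum) - mean(Y | X=0, stratum).
        effects = []
        for row in data:
            stratum = tuple(row[a] for a in self.adjustment)
            effects.append(self.cell_mean_[(stratum, 1)]
                           - self.cell_mean_[(stratum, 0)])
        return effects

# Step 1: name the columns playing each role (toy data, made up for illustration).
train = [
    {'W': 0, 'X': 0, 'Y': 1.0}, {'W': 0, 'X': 1, 'Y': 3.0},
    {'W': 1, 'X': 0, 'Y': 2.0}, {'W': 1, 'X': 1, 'Y': 5.0},
    {'W': 1, 'X': 1, 'Y': 7.0},
]
test = [{'W': 0}, {'W': 1}, {'W': 1}]

model = ToyStratifiedEstimator().fit(train, outcome='Y', treatment='X',
                                     adjustment=['W'])
effects = model.estimate(test)
average_effect = sum(effects) / len(effects)
print(effects, average_effect)
```

The split between fit() on training data and estimate() on test data is the point of the sketch: the role names are fixed once when fitting, and the fitted model can then be queried on new rows that only need the adjustment columns.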
Using the all-in-one API: Why
To apply YLearn in a unified and easier manner, YLearn provides the API Why. Why encapsulates almost everything in YLearn, such as identifying causal effects and scoring a trained estimator model. To use Why, one should first create an instance of Why and train it by calling its fit() method, after which other utilities, such as causal_effect(), score(), and whatif(), can be used. This procedure is illustrated in the following code example:

    from sklearn.datasets import fetch_california_housing
    from ylearn import Why

    # Load the California housing data as a pandas DataFrame.
    housing = fetch_california_housing(as_frame=True)
    data = housing.frame
    outcome = housing.target_names[0]
    data[outcome] = housing.target

    # Fit Why with two treatment columns, then print the estimated causal effects.
    why = Why()
    why.fit(data, outcome, treatment=['AveBedrms', 'AveRooms'])
    print(why.causal_effect())