Estimator Model: Estimating the Causal Effects

Once Identification has converted a causal effect expressed with the \(do\)-operator into the corresponding statistical estimand, the task of causal inference becomes estimating that estimand. Before turning to specific estimation methods, we briefly introduce the problem setting for the estimation of causal effects.

Problem Setting

As introduced in Causal Model, every causal structure has a corresponding DAG called the causal graph. Furthermore, each child-parent family in a DAG \(G\) represents a deterministic function

\[X_i = F_i (pa_i, \eta_i), i = 1, \dots, n,\]

where \(pa_i\) are the parents of \(X_i\) in \(G\) and \(\eta_i\) are random disturbances representing exogenous factors not present in the analysis. We call this set of functions the Structural Equation Model of the causal structure. For a set of variables \(W\) that satisfies the back-door criterion (see Identification), the causal effect of \(X\) on \(Y\) is given by the formula

\[P(y|do(x)) = \sum_w P(y| x, w)P(w).\]
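The adjustment formula can be evaluated directly from empirical frequencies when all variables are discrete. The following is a minimal sketch, not YLearn code: the toy table, the column names `w`, `x`, `y`, and the helper `backdoor_adjust` are all hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data: binary adjustment variable w, treatment x, outcome y.
df = pd.DataFrame({
    "w": [0, 0, 0, 0, 1, 1, 1, 1],
    "x": [0, 0, 1, 1, 0, 1, 1, 1],
    "y": [0, 1, 1, 1, 0, 0, 1, 1],
})


def backdoor_adjust(df, x_val, y_val, x="x", y="y", w="w"):
    """Plug-in estimate of P(y | do(x)) = sum_w P(y | x, w) P(w)."""
    total = 0.0
    # Iterate over the observed values of w with their empirical probabilities.
    for w_val, p_w in df[w].value_counts(normalize=True).items():
        stratum = df[(df[x] == x_val) & (df[w] == w_val)]
        if stratum.empty:
            # P(y | x, w) cannot be estimated without data in this stratum.
            raise ValueError(f"no rows with {x}={x_val} and {w}={w_val}")
        total += (stratum[y] == y_val).mean() * p_w
    return total


p1 = backdoor_adjust(df, x_val=1, y_val=1)  # estimate of P(y=1 | do(x=1))
p0 = backdoor_adjust(df, x_val=0, y_val=1)  # estimate of P(y=1 | do(x=0))
print(p1, p0)
```

The sketch raises an error when some stratum of \(W\) has no rows with the requested treatment value, since the conditional probability is then undefined in the sample.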

In this case, \(X\) is also said to be “conditionally ignorable given \(W\)” in the potential outcome framework, and a set of variables \(W\) satisfying this condition is called an adjustment set. In the language of structural equation models, these relations are encoded by

\[\begin{split}X & = F_1 (W, \epsilon),\\ Y & = F_2 (W, X, \eta).\end{split}\]

The estimation problems considered in this section can be expressed in terms of this structural equation model.
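To make the two structural equations concrete, the sketch below simulates them with NumPy and contrasts the observational regime with an intervention. The specific forms of `f1` and `f2`, the coefficients, and the sample size are assumptions made for illustration only; they do not come from YLearn.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000


def f1(w, eps):
    # Hypothetical treatment equation: units with large w are treated more often.
    return (w + eps > 0).astype(float)


def f2(w, x, eta):
    # Hypothetical outcome equation: effect of x is 2.0, w also affects y.
    return 2.0 * x + 3.0 * w + eta


w = rng.normal(size=n)
eps = rng.normal(size=n)
eta = rng.normal(size=n)

# Observational regime: X is produced by its own structural equation.
x_obs = f1(w, eps)
y_obs = f2(w, x_obs, eta)
naive = y_obs[x_obs == 1].mean() - y_obs[x_obs == 0].mean()

# Interventional regime: do(X = x) replaces F_1 by a constant for every unit.
effect = (f2(w, np.ones(n), eta) - f2(w, np.zeros(n), eta)).mean()

print(f"naive difference of means: {naive:.3f}")
print(f"interventional contrast:   {effect:.3f}")
```

Because \(W\) influences both \(X\) and \(Y\) in this simulation, the naive difference of means overstates the interventional contrast, which is why adjusting for \(W\) is needed.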

Estimator Models

YLearn implements several estimator models for the estimation of causal effects.

Evaluating

\[\mathbb{E}[F_2(x_1, W, \eta) - F_2(x_0, W, \eta)]\]

for the ATE and

\[\mathbb{E}[F_2(x_1, W, V, \eta) - F_2(x_0, W, V, \eta)]\]

for the CATE, where \(V\) denotes the covariates on which the effect is conditioned, is the task of the estimator models in YLearn. The EstimatorModel concept in YLearn is designed for this purpose.
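The two expectations can be illustrated by Monte Carlo when the outcome equation is known. In the sketch below, the form of `f2`, the binary covariate `v`, and the treatment values `x1` and `x0` are hypothetical; the CATE is read as the same contrast averaged within each level of \(V\).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000


def f2(x, w, v, eta):
    # Hypothetical outcome equation: the effect of x depends on the covariate v.
    return (1.0 + 0.5 * v) * x + w + eta


w = rng.normal(size=n)
v = rng.integers(0, 2, size=n)  # binary covariate taking values 0 or 1
eta = rng.normal(size=n)
x1, x0 = 1.0, 0.0

# Unit-level contrast F_2(x1, W, V, eta) - F_2(x0, W, V, eta).
contrast = f2(x1, w, v, eta) - f2(x0, w, v, eta)

ate = contrast.mean()                                       # average over all units
cate = {val: contrast[v == val].mean() for val in (0, 1)}   # average within V = val

print(f"ATE: {ate:.3f}")
print(f"CATE: {cate}")
```

In practice \(F_2\) is unknown, so an estimator model has to learn these contrasts from data rather than evaluate them directly as done here.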

A typical EstimatorModel should have the following structure:

class BaseEstModel:
    """
    Base class for estimator models.

    Parameters
    ----------
    random_state : int, default=2022
    is_discrete_treatment : bool, default=False
        Set this to True if the treatment is discrete.
    is_discrete_outcome : bool, default=False
        Set this to True if the outcome is discrete.
    categories : str, optional, default='auto'

    """
    def fit(
        self,
        data,
        outcome,
        treatment,
        **kwargs,
    ):
        """Fit the estimator model.

        Parameters
        ----------
        data : pandas.DataFrame
            The dataset used for training the model

        outcome : str or list of str
            Names of the outcome variables

        treatment : str or list of str
            Names of the treatment variables

        Returns
        -------
        instance of BaseEstModel
            The fitted estimator model.
        """

    def estimate(
        self,
        data=None,
        quantity=None,
        **kwargs
    ):
        """Estimate the causal effect.

        Parameters
        ----------
        data : pandas.DataFrame, optional
            The test data on which the estimator evaluates the causal effect.
            If data is None, the estimator evaluates all quantities on the
            training data. By default None.

        quantity : str, optional
            The possible values of quantity include:
                'CATE' : the estimator will evaluate the CATE;
                'ATE' : the estimator will evaluate the ATE;
                None : the estimator will evaluate the ITE or CITE, by default None

        Returns
        -------
        ndarray
            The estimated causal effect with the type of the quantity.
        """

    def effect_nji(self, data=None, *args, **kwargs):
        """Return causal effects for all possible values of treatments.

        Parameters
        ----------
        data : pandas.DataFrame, optional
            The test data on which the estimator evaluates the causal effect.
            If data is None, the estimator evaluates all quantities on the
            training data. By default None.
        """
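As an illustration of the fit/estimate layout above, here is a minimal toy estimator. It is not part of YLearn and does not subclass any YLearn class: the name `ToyLinearEstModel`, the `adjustment` keyword, and the simulated data are assumptions made for this sketch. It handles a single treatment column, fits a linear outcome model by least squares, and reports the treatment coefficient as the effect.

```python
import numpy as np
import pandas as pd


class ToyLinearEstModel:
    """Hypothetical estimator: y ~ 1 + treatment + adjustment (least squares)."""

    def __init__(self, random_state=2022):
        self.random_state = random_state
        self._is_fitted = False

    def fit(self, data, outcome, treatment, adjustment=None, **kwargs):
        # `adjustment` is an assumed keyword naming the adjustment-set columns.
        cols = [treatment] + list(adjustment or [])
        design = np.column_stack(
            [np.ones(len(data)), data[cols].to_numpy(dtype=float)]
        )
        target = data[outcome].to_numpy(dtype=float)
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        self.effect_ = coef[1]  # coefficient of the treatment column
        self._n_train = len(data)
        self._is_fitted = True
        return self

    def estimate(self, data=None, quantity=None, **kwargs):
        if not self._is_fitted:
            raise RuntimeError("call fit() before estimate()")
        if quantity == "ATE":
            return np.array([self.effect_])
        # Without interaction terms the modeled effect is the same for every
        # unit, so per-unit and conditional effects are constant arrays.
        n = self._n_train if data is None else len(data)
        return np.full(n, self.effect_)


# Simulated data with a known effect of 2.0 and one adjustment variable w.
rng = np.random.default_rng(0)
w = rng.normal(size=5000)
x = (w + rng.normal(size=5000) > 0).astype(float)
y = 2.0 * x + 3.0 * w + rng.normal(size=5000)
data = pd.DataFrame({"w": w, "x": x, "y": y})

model = ToyLinearEstModel().fit(data, outcome="y", treatment="x", adjustment=["w"])
ate = model.estimate(quantity="ATE")
print(ate)
```

Returning `self` from `fit` mirrors the documented return value above and allows the chained call used in the example.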