Exercise Class 10

Author

Jonas Skjold Raaschou-Pedersen

Published

November 13, 2025


Exercise 1 - Causal Inference Warm-Up

  1. Recall the usual Causal Inference setup from the last lecture. We consider a treatment variable \(D\), outcome \(Y\), and possibly a vector of covariates \(X\); all random variables from some distribution \((Y, D, X) \sim P\). As our data we consider a random sample of \(n\) units i.e. \(\{(Y_{i}, D_{i}, X_{i})\}_{i=1}^{n}\).

    We define unit \(i\)’s potential outcomes under two hypothetical interventions as \(Y_{i}(1), Y_{i}(0)\).

    The observed outcome \(Y_{i}\) can be written in terms of the potential outcomes as \[\begin{align} Y_{i} & = \begin{cases} Y_{i}(1), & D_{i} = 1 \\ Y_{i}(0), & D_{i} = 0 \end{cases} \\ & = Y_{i}(0) + D_{i}\{Y_{i}(1) - Y_{i}(0)\}. \end{align} \tag{1}\]

    To inspect the effect of the treatment \(D\) on our outcome \(Y\), we could compare the averages of \(Y\) across \((D = 1)\) and \((D = 0)\) i.e. the naive comparison \(E[Y_{i} \mid D_{i} = 1] - E[Y_{i} \mid D_{i} = 0]\).

    1. Show how we can go from the naive comparison to the following expression from the slides:

      \[\begin{align*} & E[Y_{i} \mid D_{i} = 1] - E[Y_{i} \mid D_{i} = 0] \\ & = \underbrace{ E[Y_{i}(1) - Y_{i}(0) \mid D_{i} = 1] }_{ATT} + \underbrace{ E[Y_{i}(0) \mid D_{i} = 1] - E[Y_{i}(0) \mid D_{i} = 0] }_{\text{Selection bias}}. \end{align*}\]

    2. Assume the potential outcomes \(Y_{i}(1), Y_{i}(0)\) are independent of the treatment \(D_{i}\): \[\begin{align*} Y(1), Y(0) \perp\!\!\!\perp D. \end{align*} \tag{2}\]

      1. Why is the selection bias equal to \(0\) under this assumption?
      2. What does the naive comparison equal under this assumption?
      3. How does this relate to the concept of identification?
      4. What is the naive comparison also equal to because of Equation 2?
    3. Assume instead the potential outcomes \(Y_{i}(1), Y_{i}(0)\) are independent of the treatment \(D_{i}\) conditional on \(X_{i}\):

      \[\begin{align*} Y(1), Y(0) \perp\!\!\!\perp D \mid X. \end{align*} \tag{3}\]

      1. Show how the ATE, \(E[Y_{i}(1) - Y_{i}(0)]\), can be identified as:

        \[\begin{align*} E[Y_{i}(1) - Y_{i}(0)] = E\{ E[Y_{i}\mid D_{i} = 1, X_{i}] - E[Y_{i}\mid D_{i} = 0, X_{i}] \} \end{align*} \tag{4}\]

        using Equation 1 and Equation 3.

  2. Consider the table of potential outcomes

    \(i\) \(Y_i(1)\) \(Y_i(0)\) \(D_i\)
    \(1\) \(Y_1(1)\) \(Y_1(0)\) \(D_1\)
    \(2\) \(Y_2(1)\) \(Y_2(0)\) \(D_2\)
    \(\vdots\) \(\vdots\) \(\vdots\) \(\vdots\)
    \(n\) \(Y_n(1)\) \(Y_n(0)\) \(D_n\)

    represented in Python with a concrete dataset:

    import polars as pl

    df = pl.DataFrame(
        [
            [1, 3, 2, 0],
            [2, 4, 2, 0],
            [3, 5, 2, 0],
            [4, 6, 2, 1],
            [5, 13, 3, 1],
            [6, 10, 6, 1],
        ],
        schema=["id", "Y(1)", "Y(0)", "D"],
        orient="row",
    )

    Let \(\tau_{i} = Y_{i}(1) - Y_{i}(0)\) be the treatment effect for the \(i\)th individual.

    1. Compute the treatment effect for individual \(i = 3\)
    2. Compute the average treatment effect on the treated (\(\text{ATT}\))
    3. Compute the average treatment effect (\(\text{ATE}\))
    4. Is this dataset realistic?
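
As a sanity check on the definitions in this exercise, the following minimal sketch evaluates Equation 1, the naive comparison, the ATT, the selection bias and the ATE on a tiny table of potential outcomes. The four units are hypothetical and are not the dataset above; the helper `mean` is introduced only for this illustration.

```python
# Hypothetical toy data (not the exercise dataset): each row is (Y(1), Y(0), D).
units = [(5, 1, 1), (6, 2, 1), (4, 3, 0), (3, 3, 0)]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Equation 1: the observed outcome is Y(0) + D * (Y(1) - Y(0)).
observed = [y0 + d * (y1 - y0) for y1, y0, d in units]

# Naive comparison: difference in observed means between D = 1 and D = 0.
naive = mean(y for y, (_, _, d) in zip(observed, units) if d == 1) - mean(
    y for y, (_, _, d) in zip(observed, units) if d == 0
)

# ATT: average of Y(1) - Y(0) among the treated.
att = mean(y1 - y0 for y1, y0, d in units if d == 1)

# Selection bias: difference in mean Y(0) between treated and controls.
selection_bias = mean(y0 for _, y0, d in units if d == 1) - mean(
    y0 for _, y0, d in units if d == 0
)

# ATE: average of Y(1) - Y(0) over all units.
ate = mean(y1 - y0 for y1, y0, _ in units)

print(naive, att, selection_bias, ate)
```

In this toy table the naive comparison equals the ATT plus the selection bias, and it differs from the ATE because treatment was not assigned independently of the potential outcomes.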

Exercise 2 - Estimators

The data for this exercise can be downloaded here.

In the following, our estimand of interest is the ATE:

\[\begin{align*} ATE := E[Y_{i}(1) - Y_{i}(0)]. \end{align*} \tag{5}\]

  1. Consider the linear model:

    \[\begin{align*} Y_{i} = \beta_{0} + \beta_{1} D_{i} + \beta_{2}X_{i} + \varepsilon_{i}, \quad i = 1, \ldots, n. \end{align*}\]

    Estimate \(\beta_1\) using OLS. Does it estimate Equation 5?

    You can start from the snippet:

    import statsmodels.formula.api as smf
    
    
    # Assuming loaded data is named `df`
    m = smf.ols("Y ~ D + X", data=df).fit()

    Note: you might have to install statsmodels.

  2. Estimate separate regressions for the treated and control group using:

    md0 = smf.ols("Y ~ D + X", data=df.filter(pl.col("D").eq(0))).fit()
    md1 = smf.ols("Y ~ D + X", data=df.filter(pl.col("D").eq(1))).fit()

    These two models estimate the functions

    \[\begin{align*} m_0(x) = E[Y \mid D = 0, X = x], \qquad m_1(x) = E[Y \mid D = 1, X = x]. \end{align*}\]

  3. Estimate the ATE with the plug-in estimator

    \[\begin{align*} \hat\tau_{\text{plugin}} = \frac{1}{n}\sum_i \left\{ \hat m_1(X_i) - \hat m_0(X_i) \right\} \end{align*}\]

    using the fitted models.

    When predicting using md0 you should set \(D = 0\) for all individuals in the data. Similarly, when predicting using md1 you should set \(D = 1\) for all individuals in the data. The plug-in estimate is the average of the difference between the two predicted vectors.

    Tip: You can predict on the transformed data using the snippet below (for the case of \(D = 0\)):

    md0.predict(df.with_columns(D=pl.lit(0)))
  4. The same estimate can be computed assuming the model

    \[\begin{align*} Y_{i} = \beta_{0} + \beta_{1} D_{i} + \gamma \bar{X}_{i} + \delta D_i \bar{X}_{i} + \varepsilon_{i}, \quad i = 1, \ldots, n, \end{align*}\]

    where \(\bar{X}_{i} = X_{i} - \frac{1}{n}\sum_{j=1}^{n} X_{j}\) is \(X_{i}\) centered at its sample average.

    Estimate \(\beta_{1}\) and compare it to the plug-in estimate from 3.

  5. Finally, estimate the following naive model:

    \[\begin{align*} Y_{i} = \beta_{0} + \beta_{1} D_{i} + \varepsilon_{i}, \quad i = 1, \ldots, n, \end{align*}\]

    using OLS.

  6. Compare all the estimates from the previous exercises. Relate your estimates to the conditional independence assumption, Equation 3.
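
The relation between the plug-in estimator, the interacted regression and the naive comparison can be sketched on simulated data. Everything below is an assumption made for illustration: the data-generating process, the variable names, and the use of NumPy least squares instead of statsmodels. The group-wise regressions here regress \(Y\) on an intercept and \(X\) only, since \(D\) is constant within each group.

```python
import numpy as np

# Simulated data; every number here is an assumption made for illustration.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=n)
# Treatment probability depends on X, so D is only independent of the
# potential outcomes after conditioning on X.
D = (rng.uniform(size=n) < 1 / (1 + np.exp(-X))).astype(float)
# Potential outcomes with heterogeneous effect 2 + X (population ATE is 2).
Y0 = 1 + X + rng.normal(size=n)
Y1 = Y0 + 2 + X
Y = Y0 + D * (Y1 - Y0)


def ols(design, y):
    """Return least-squares coefficients for y regressed on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


ones = np.ones(n)
design = np.column_stack([ones, X])

# Separate regressions of Y on (1, X) within each treatment group.
b0 = ols(design[D == 0], Y[D == 0])
b1 = ols(design[D == 1], Y[D == 1])

# Plug-in estimate: average difference of the fitted functions over all units.
m0_hat = design @ b0
m1_hat = design @ b1
tau_plugin = np.mean(m1_hat - m0_hat)

# Interacted regression with X centered at its sample mean;
# the coefficient on D is the second entry.
Xc = X - X.mean()
beta = ols(np.column_stack([ones, D, Xc, D * Xc]), Y)
tau_interacted = beta[1]

# Naive difference in means, ignoring X.
tau_naive = Y[D == 1].mean() - Y[D == 0].mean()

print(tau_plugin, tau_interacted, tau_naive)
```

Under these assumptions the plug-in and interacted estimates agree up to floating-point error, while the naive difference in means is pulled away from the ATE because \(X\) affects both treatment and outcome.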

Exercise 3 - Lalonde

In this exercise we will use data from Robert LaLonde (1986), “Evaluating the Econometric Evaluations of Training Programs with Experimental Data”, American Economic Review, Vol. 76, pp. 604-620.

The data for this exercise can be downloaded here.

  1. Install the causalinference package using uv.

  2. Estimate the ATE using causalinference by:

    1. OLS, see here

    2. Matching, see here

      1. Set bias_adj=True and do it again.
      2. Set bias_adj=True and try varying the matches parameter. How do the estimates change as a function of it?
  3. Install the R package grf.

    Yes, in R. If you don’t have an R installation, ask your TA.

  4. Estimate:

    1. A causal forest with W.hat = 0.5 (no propensity score estimated).
    2. A causal forest without setting W.hat (propensity score estimated).
  5. What are the variable importances for the estimated heterogeneous treatment effects?

  6. Try computing a best linear projection

    \[ \tau(X) = \beta_0 + \beta A\]

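    using the function best_linear_projection, where \(A\) denotes the covariates the treatment effect is projected onto (the function's A argument).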

    Is there any sign of significant heterogeneous effects?

    See the tutorial here
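
For intuition on what the matches parameter in the matching question controls, here is a simplified nearest-neighbour matching sketch in plain NumPy. It is not the causalinference implementation: it matches on a single covariate, applies no bias adjustment and ignores ties. The function name `matching_ate` and the simulated data are hypothetical.

```python
import numpy as np


def matching_ate(Y, D, X, matches=1):
    """Simple nearest-neighbour matching estimate of the ATE on one covariate.

    For each unit, the missing potential outcome is imputed as the mean outcome
    of its `matches` closest units (in absolute X distance) from the opposite
    treatment group.
    """
    Y, D, X = map(np.asarray, (Y, D, X))
    effects = np.empty(len(Y))
    for i in range(len(Y)):
        opposite = np.flatnonzero(D != D[i])
        order = np.argsort(np.abs(X[opposite] - X[i]))
        nearest = opposite[order[:matches]]
        imputed = Y[nearest].mean()
        # Treated units contribute Y_i - imputed Y_i(0); controls the reverse.
        effects[i] = Y[i] - imputed if D[i] == 1 else imputed - Y[i]
    return effects.mean()


# Simulated data with a constant treatment effect of 2 (an assumption).
rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=n)
D = (rng.uniform(size=n) < 1 / (1 + np.exp(-X))).astype(int)
Y = 1 + X + 2 * D + rng.normal(size=n)

# Estimates for several numbers of matches per unit.
estimates = {m: matching_ate(Y, D, X, matches=m) for m in (1, 2, 4, 8)}
print(estimates)
```

Using more matches averages over more neighbours, which tends to reduce variance but uses matches that are further away in \(X\); bias adjustment is meant to correct for the latter.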