Exercise Class 10
Exercise 1 - Causal Inference Warm-Up
Recall the usual causal inference setup from the last lecture. We consider a treatment variable \(D\), an outcome \(Y\), and possibly a vector of covariates \(X\), all random variables with joint distribution \((Y, D, X) \sim P\). Our data are a random sample of \(n\) units, i.e. \(\{(Y_{i}, D_{i}, X_{i})\}_{i=1}^{n}\).
We define unit \(i\)’s potential outcomes under two hypothetical interventions as \(Y_{i}(1), Y_{i}(0)\).
The observed outcome \(Y_{i}\) can be written in terms of the potential outcomes as \[\begin{align} Y_{i} & = \begin{cases} Y_{i}(1), & D_{i} = 1 \\ Y_{i}(0), & D_{i} = 0 \end{cases} \\ & = Y_{i}(0) + D_{i}\{Y_{i}(1) - Y_{i}(0)\}. \end{align} \tag{1}\]
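As a quick numerical illustration of Equation 1, the sketch below builds the observed outcome both ways and checks that they coincide. The arrays `y1`, `y0`, and `d` are hypothetical values chosen for illustration, not part of the exercise data.

```python
import numpy as np

# Hypothetical potential outcomes and treatment indicators for five units.
y1 = np.array([3.0, 4.0, 5.0, 6.0, 13.0])
y0 = np.array([2.0, 2.0, 2.0, 2.0, 3.0])
d = np.array([0, 1, 0, 1, 1])

# Case form of Equation 1: pick Y(1) for treated units, Y(0) otherwise.
y_cases = np.where(d == 1, y1, y0)

# Switching form of Equation 1: Y = Y(0) + D * (Y(1) - Y(0)).
y_switch = y0 + d * (y1 - y0)

print(y_cases)
print(y_switch)
```

Both forms select exactly one potential outcome per unit; the other one is never observed.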
To assess the effect of the treatment \(D\) on the outcome \(Y\), we could compare the averages of \(Y\) across the treated \((D = 1)\) and the untreated \((D = 0)\), i.e. the naive comparison \(E[Y_{i} \mid D_{i} = 1] - E[Y_{i} \mid D_{i} = 0]\).
Show how we can go from the naive comparison to the following expression from the slides:
\[\begin{align*} & E[Y_{i} \mid D_{i} = 1] - E[Y_{i} \mid D_{i} = 0] \\ & = \underbrace{ E[Y_{i}(1) - Y_{i}(0) \mid D_{i} = 1] }_{ATT} + \underbrace{ E[Y_{i}(0) \mid D_{i} = 1] - E[Y_{i}(0) \mid D_{i} = 0] }_{\text{Selection bias}}. \end{align*}\]
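The decomposition can also be checked numerically, since it holds exactly for sample averages. The sketch below is a minimal simulation under an assumed data-generating process (units with a high \(Y(0)\) are more likely to be treated); the seed, sample size, and coefficients are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical process: units with a high Y(0) select into treatment.
y0 = rng.normal(size=n)
y1 = y0 + 2.0 + rng.normal(size=n)
d = (y0 + rng.normal(size=n) > 0).astype(int)
y = y0 + d * (y1 - y0)  # Equation 1

# Sample versions of the three terms in the decomposition.
naive = y[d == 1].mean() - y[d == 0].mean()
att = (y1 - y0)[d == 1].mean()
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()

print(f"naive = {naive:.3f}")
print(f"ATT + selection bias = {att + selection_bias:.3f}")
```

The two printed numbers agree up to floating-point error, while the selection bias itself is clearly positive in this design.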
Assume the potential outcomes \(Y_{i}(1), Y_{i}(0)\) are independent of the treatment \(D_{i}\): \[\begin{align*} Y(1), Y(0) \perp\!\!\!\perp D. \end{align*} \tag{2}\]
- Why is the selection bias equal to \(0\) under this assumption?
- What does the naive comparison equal under this assumption?
- How does this relate to the concept of identification?
- What is the naive comparison also equal to because of Equation 2?
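A small simulation can make the role of Equation 2 concrete. The sketch below assigns treatment by a fair coin flip, so that \(D\) is independent of the potential outcomes by construction; the data-generating process, seed, and sample size are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical potential outcomes with heterogeneous effects.
y0 = rng.normal(size=n)
y1 = y0 + 2.0 + rng.normal(size=n)

# Treatment assigned by a fair coin flip, independently of (Y(1), Y(0)).
d = rng.integers(0, 2, size=n)
y = y0 + d * (y1 - y0)  # Equation 1

naive = y[d == 1].mean() - y[d == 0].mean()
att = (y1 - y0)[d == 1].mean()
ate = (y1 - y0).mean()
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()

print(f"naive = {naive:.3f}, ATT = {att:.3f}, ATE = {ate:.3f}")
print(f"selection bias = {selection_bias:.3f}")
```

Any remaining gap between the quantities is sampling noise, which shrinks as `n` grows.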
Assume instead the potential outcomes \(Y_{i}(1), Y_{i}(0)\) are independent of the treatment \(D_{i}\) conditional on \(X_{i}\):
\[\begin{align*} Y(1), Y(0) \perp\!\!\!\perp D \mid X. \end{align*} \tag{3}\]
Show how the ATE, \(E[Y_{i}(1) - Y_{i}(0)]\), can be identified as:
\[\begin{align*} E[Y_{i}(1) - Y_{i}(0)] = E\{ E[Y_{i}\mid D_{i} = 1, X_{i}] - E[Y_{i}\mid D_{i} = 0, X_{i}] \} \end{align*} \tag{4}\]
using Equation 1 and Equation 3.
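To see Equation 4 at work, the sketch below simulates a case where Equation 3 holds but Equation 2 does not, using a single binary covariate. All numbers (treatment probabilities, effect sizes, seed) are hypothetical. The sample analogue of the outer expectation is a weighted average of within-cell contrasts.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical binary covariate that drives both treatment uptake and outcomes.
x = rng.integers(0, 2, size=n)
y0 = 1.0 + 2.0 * x + rng.normal(size=n)
y1 = y0 + 1.0 + 1.0 * x         # effect is 1 when x = 0 and 2 when x = 1
p = np.where(x == 1, 0.8, 0.2)  # treatment probability depends only on x
d = rng.binomial(1, p)
y = y0 + d * (y1 - y0)          # Equation 1

ate = (y1 - y0).mean()
naive = y[d == 1].mean() - y[d == 0].mean()

# Sample analogue of Equation 4: average the within-x contrasts over the
# empirical distribution of x.
adjusted = 0.0
for value in (0, 1):
    in_cell = x == value
    contrast = y[in_cell & (d == 1)].mean() - y[in_cell & (d == 0)].mean()
    adjusted += in_cell.mean() * contrast

print(f"ATE = {ate:.3f}, naive = {naive:.3f}, adjusted = {adjusted:.3f}")
```

In this design the naive comparison is far from the ATE, while the adjusted estimate is close to it.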
Consider the table of potential outcomes
\[\begin{array}{cccc} i & Y_i(1) & Y_i(0) & D_i \\ \hline 1 & Y_1(1) & Y_1(0) & D_1 \\ 2 & Y_2(1) & Y_2(0) & D_2 \\ \vdots & \vdots & \vdots & \vdots \\ n & Y_n(1) & Y_n(0) & D_n \end{array}\]

represented in Python with a concrete dataset:

    import polars as pl

    df = pl.DataFrame(
        [
            [1, 3, 2, 0],
            [2, 4, 2, 0],
            [3, 5, 2, 0],
            [4, 6, 2, 1],
            [5, 13, 3, 1],
            [6, 10, 6, 1],
        ],
        schema=["id", "Y(1)", "Y(0)", "D"],
        orient="row",
    )

Let \(\tau_{i} = Y_{i}(1) - Y_{i}(0)\) be the treatment effect for the \(i\)th individual.
- Compute the treatment effect for individual \(i = 3\)
- Compute the average treatment effect on the treated (\(\text{ATT}\))
- Compute the average treatment effect (\(\text{ATE}\))
- Is this dataset realistic?
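For the last question, it may help to look at what an analyst would actually get to see. The sketch below applies Equation 1 to the table values and blanks out the counterfactual entry for each unit. It uses plain NumPy arrays rather than the polars data frame, and marking unobserved entries with NaN is an illustrative convention, not part of the exercise.

```python
import numpy as np

# Columns of the table above.
y1 = np.array([3.0, 4.0, 5.0, 6.0, 13.0, 10.0])
y0 = np.array([2.0, 2.0, 2.0, 2.0, 3.0, 6.0])
d = np.array([0, 0, 0, 1, 1, 1])

# Observed outcome from Equation 1.
y = y0 + d * (y1 - y0)

# In real data the counterfactual outcome of each unit is missing (NaN here).
y1_observed = np.where(d == 1, y1, np.nan)
y0_observed = np.where(d == 0, y0, np.nan)

# The naive comparison only needs the observed entries.
naive = np.nanmean(y1_observed) - np.nanmean(y0_observed)

print(y)
print(y1_observed)
print(y0_observed)
print(naive)
```

Individual effects \(\tau_i\) need both columns for the same unit, which the masked version never provides.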
Exercise 2 - Estimators
The data for this exercise can be downloaded here.
In the following, our estimand of interest is the ATE:
\[\begin{align*} ATE := E[Y_{i}(1) - Y_{i}(0)]. \end{align*} \tag{5}\]
Consider the linear model:
\[\begin{align*} Y_{i} = \beta_{0} + \beta_{1} D_{i} + \beta_{2}X_{i} + \varepsilon_{i}, \quad i = 1, \ldots, n. \end{align*}\]
Estimate \(\beta_1\) using OLS. Does it estimate Equation 5?
You can start from the snippet:
    import statsmodels.formula.api as smf

    # Assuming loaded data is named `df`
    m = smf.ols("Y ~ D + X", data=df).fit()

Note: you might have to install statsmodels.

Estimate separate regressions for the treated and control group using:

    md0 = smf.ols("Y ~ D + X", data=df.filter(pl.col("D").eq(0))).fit()
    md1 = smf.ols("Y ~ D + X", data=df.filter(pl.col("D").eq(1))).fit()

These two models estimate the functions
\[\begin{align*} m_0(x) = E[Y \mid D = 0, X = x], \qquad m_1(x) = E[Y \mid D = 1, X = x]. \end{align*}\]
Estimate the ATE with the plug-in estimator
\[\begin{align*} \hat\tau_{\text{plugin}} = \frac{1}{n}\sum_i \left\{ \hat m_1(X_i) - \hat m_0(X_i) \right\} \end{align*}\]
using the fitted models.
When predicting using md0 you should set \(D = 0\) for all individuals in the data. Similarly, when predicting using md1 you should set \(D = 1\) for all individuals in the data. The plug-in estimate is the average of the difference between the two predicted vectors.

Tip: You can predict on the transformed data using the snippet below (for the case of \(D = 0\)):

    md0.predict(df.with_columns(D=pl.lit(0)))

The same estimate can be computed assuming the model
\[\begin{align*} Y_{i} = \beta_{0} + \beta_{1} D_{i} + \gamma \bar{X}_{i} + \delta D_i \bar{X}_{i} + \varepsilon_{i}, \quad i = 1, \ldots, n, \end{align*}\]
where \(\bar{X}_{i} = X_{i} - \frac{1}{n}\sum_{j=1}^{n} X_{j}\) denotes \(X_{i}\) centered at its sample average.
Estimate \(\beta_{1}\) and compare it to the plug-in estimate \(\hat\tau_{\text{plugin}}\).
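The comparison can be previewed on simulated data. The sketch below is a minimal illustration, not the exercise solution: it uses NumPy least squares instead of statsmodels, generates hypothetical data, drops the constant \(D\) column from the within-group fits, and assumes \(\bar{X}_{i}\) stands for \(X_{i}\) centered at its sample average. The helper `ols` is introduced here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000

# Hypothetical data: confounded treatment and an effect that varies with x.
x = rng.normal(size=n)
d = (x + rng.normal(size=n) > 0).astype(int)
y = 1.0 + 2.0 * d + 0.5 * x + 1.5 * d * x + rng.normal(size=n)


def ols(design, outcome):
    """Return least-squares coefficients for outcome ~ design."""
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


# Plug-in: fit Y ~ 1 + X separately by treatment arm, then predict for everyone.
ones = np.ones(n)
full_design = np.column_stack([ones, x])
coef0 = ols(full_design[d == 0], y[d == 0])
coef1 = ols(full_design[d == 1], y[d == 1])
tau_plugin = np.mean(full_design @ coef1 - full_design @ coef0)

# Interacted regression with the covariate centered at its sample mean
# (assumed reading of the model above).
xc = x - x.mean()
interacted_design = np.column_stack([ones, d, xc, d * xc])
beta1 = ols(interacted_design, y)[1]

print(f"plug-in = {tau_plugin:.4f}, beta_1 = {beta1:.4f}")
```

The agreement depends on the centering: with an uncentered covariate, the coefficient on \(D\) would instead measure the effect at \(X = 0\).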
Finally, estimate the following naive model:
\[\begin{align*} Y_{i} = \beta_{0} + \beta_{1} D_{i} + \varepsilon_{i}, \quad i = 1, \ldots, n. \end{align*}\]
using OLS.
Compare all the estimates from the previous exercises. Relate your estimates to the conditional independence assumption, Equation 3.
Exercise 3 - Lalonde
In this exercise we will use data from “Evaluating the Econometric Evaluations of Training Programs”, American Economic Review, Vol. 76, pp. 604-620 by Robert Lalonde.
The data for this exercise can be downloaded here.
Install the causalinference package using uv.

Estimate the ATE using causalinference by:

Install the R package grf. Yes, in R. If you don’t have an R installation, ask your TA.
Estimate:
- A causal forest with W.hat = 0.5 (no propensity score estimated).
- A causal forest without setting W.hat (propensity score estimated).
What are the variable importances for the estimated heterogeneous treatment effects?
Try to compute a best linear projection
\[ \tau(X) = \beta_0 + \beta A\]
using the function best_linear_projection.

Is there any sign of significant heterogeneous effects?
See the tutorial here