Exercise Class 8

Author

Jonas Skjold Raaschou-Pedersen

Published

October 30, 2025

There are no new coding exercises this week. If you still have exercises remaining from last week, you are encouraged to use part of the class time to work on them.


Exercise - Kleinberg et al. (2018)

This exercise draws on Kleinberg et al. (2018, QJE).

Recall the context of the paper:

  • Data on arrests in New York City
  • Shortly after arrest the defendants appear at a bail hearing
    • These hearings are not intended to determine if the person is guilty
  • A judge decides at the bail hearing to either release or detain the defendant for the pretrial period
    • The judge makes an implicit prediction when deciding this. One could imagine the judge asks herself:
      • What will the defendant do if released?
      • Will they flee or commit a new crime?
    • A judge must trade off these risks against the cost of incarceration
  • For the released defendants we observe:
    1. Whether they fail to appear at a subsequent court hearing prior to adjudication of their case (FTA)
    2. Whether they were rearrested prior to adjudication (and the specific crime committed)
  • For the detained defendants we don’t observe any further outcomes
  • The main outcome used in the paper is FTA, which Kleinberg et al. (2018) refer to as “crime”
  • The paper asks whether an algorithm \(m(\cdot)\) trained only on data available to the judge at the bail hearing can improve upon the predictions made by the judges

With the above context we define:

  • defendant \(i\)
  • judge \(j\)
  • cell \(c\), defined as a combination of borough, year, month, and day of week, with at least five judges
  • \(R_{ijc}\) is a binary indicator for whether judge \(j\) in cell \(c\) released defendant \(i\) at the bail hearing
  • \(\rho^{j}_{ic}\) is judge \(j\)’s release rule
  • \(Y_{ijc} \in \{0, 1\}\) a binary indicator for whether defendant \(i\) (assigned to judge \(j\) in cell \(c\)) failed to appear in court (FTA)
  • \(X_{ijc} \in \mathbb{R}^{p}\) a vector of characteristics of defendant \(i\)’s current case, their prior criminal record, and age (no demographic variables)
  • We will suppress the subscripts \(j\) and \(c\) in the following and just write \(Y_{i}\), \(X_{i}\), \(\rho^{j}_{i}\), while remembering that each defendant and judge is located inside a cell \(c\)
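
To make the notation concrete, the definitions above can be sketched as a small data structure. This is only an illustration: the class `Case`, the helper `release_rates`, and all values in the example rows are hypothetical and do not come from the paper's data. The field `fta` is set to `None` for detained defendants, since no further outcomes are observed for them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    defendant: int          # i
    judge: str              # j
    cell: tuple             # c = (borough, year, month, day of week)
    released: bool          # R_ijc
    fta: Optional[bool]     # Y_ijc; None when the defendant was detained
    features: list          # X_ijc (current case, prior record, age)


def release_rates(cases):
    """Share of defendants released by each judge (a simple leniency measure)."""
    counts = defaultdict(lambda: [0, 0])  # judge -> [number released, total]
    for case in cases:
        counts[case.judge][0] += int(case.released)
        counts[case.judge][1] += 1
    return {judge: rel / total for judge, (rel, total) in counts.items()}


# Made-up example rows, all in the same (hypothetical) cell.
cell = ("Brooklyn", 2010, 3, "Mon")
cases = [
    Case(1, "A", cell, True, False, [0.2, 1.0]),
    Case(2, "A", cell, False, None, [0.9, 4.0]),
    Case(3, "B", cell, True, True, [0.6, 2.0]),
    Case(4, "B", cell, True, False, [0.1, 0.0]),
]
rates = release_rates(cases)
```

In this toy example judge B releases every defendant and judge A releases half of them, so B is the more lenient judge within the cell.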

Some important points from the paper:

  • The paper estimates \(m(X) = P(Y = 1 \mid X)\) (an algorithm) on a training split of the data using a gradient boosting model
  • The predicted probability \(m(X_{i})\) is the crime risk of defendant \(i\)
  • They use the estimated algorithm to predict \(m(X_{i})\) for each defendant on a test split
  • The predictions are then compared to actual judge decisions and defendant outcomes
  • In particular, they simulate counterfactual policies where detention is determined by the algorithm’s risk predictions (e.g. detain if \(m(X_{i}) \geq k\)), and then compare the resulting jail rates and crime rates to those observed under actual judge decisions.
  • The paper leverages a quasi-random assignment of defendants to judges and differences in judge leniency within cells for their main results

Note: the description above skips many of the important details of the paper related to their identification strategy and why the comparisons that they make are meaningful and valid; see the paper and online appendix for the exact details.
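
The threshold policy mentioned above (detain if \(m(X_{i}) \geq k\)) can be illustrated with a minimal simulation. This is a sketch under strong assumptions, not the paper's procedure: the risk scores are drawn at random instead of being produced by a gradient boosting model, the functions `simulate_defendants` and `threshold_policy` are hypothetical names, and the crime rate is computed here as failures to appear among released defendants divided by all defendants. Note also that the simulation knows the outcome of every defendant, which is not the case in the real data, where outcomes are observed only for released defendants.

```python
import random


def simulate_defendants(n, seed=0):
    """Generate hypothetical (risk, outcome) pairs for n defendants."""
    rng = random.Random(seed)
    defendants = []
    for _ in range(n):
        risk = rng.random()        # stands in for the prediction m(X_i)
        fta = rng.random() < risk  # latent Y_i with P(Y_i = 1 | risk) = risk
        defendants.append((risk, fta))
    return defendants


def threshold_policy(defendants, k):
    """Detain if risk >= k; return (detention rate, crime rate)."""
    n = len(defendants)
    detained = sum(1 for risk, _ in defendants if risk >= k)
    # Only released defendants (risk < k) can fail to appear.
    crimes = sum(1 for risk, fta in defendants if risk < k and fta)
    return detained / n, crimes / n


defendants = simulate_defendants(10_000)
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
results = [threshold_policy(defendants, k) for k in thresholds]
for k, (detention_rate, crime_rate) in zip(thresholds, results):
    print(f"k={k:.2f}  detention rate={detention_rate:.3f}  crime rate={crime_rate:.3f}")
```

Varying \(k\) traces out the trade-off between the detention rate and the crime rate: a lower threshold detains more defendants and lowers the crime rate, while a higher threshold does the opposite.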


  1. Why is the problem in Kleinberg et al. (2018) a Prediction Policy Problem in the sense of Kleinberg et al. (2015)?
  2. With the above setup, what do Kleinberg et al. (2018) mean by there being a Selective Labels Problem? Use the variables \(Y_{i}\), \(X_{i}\), \(\rho^{j}_{i}\) and the context and setup above to motivate your answer.
  3. Why is the quasi-random assignment of defendants to judges important for the analysis in Kleinberg et al. (2018)? Explain how judge leniency (judge release rate) varies across judges and how random assignment of defendants within a cell allows the authors to test for the presence of unobservables in judges’ decisions. Why does this matter for the “selective labels” problem and for comparing human and machine predictions?
  4. Explain Figure V and Table III of Kleinberg et al. (2018). In particular, answer the following questions:
    • What does the point 2nd Quintile in the figure indicate?
    • What do the two arrows pointing outwards from the above point to the solid line indicate? And how does this relate to Table III?
    • Relate the figure and table to the release rule given in the paper: \[ \text{Release if and only if } \rho^{1} = 1 \text{ and } m(X) < k. \] Interpret this rule in the setting with five judge quintiles, where judges are grouped by their leniency, i.e. by how frequently they release defendants.
    • What are the economic and social consequences of the figure? Compute the values in the first row of columns 3 and 4 of Table III and relate your answer to this.
    • Why is the solid line sometimes called a contraction curve and how does this relate to the above-mentioned release rule?