Exercise Class 7

Author

Jonas Skjold Raaschou-Pedersen

Published

October 23, 2025

This week we start from this repository.


This week’s exercises are about binary classification and fairness. We will use simulated data corresponding to the empirical example in the paper “Equality of Opportunity in Supervised Learning” by Hardt et al. (2016). The simulated data is stored in the CSV file fico.csv, which can be downloaded here or from Absalon here.


Printing out an overview of the data shows:

import polars as pl


df = pl.read_csv("data/fico.csv")
print(df)
shape: (55_000, 3)
┌────────────┬─────┬──────────┐
│ FICO       ┆ ND  ┆ group    │
│ ---        ┆ --- ┆ ---      │
│ f64        ┆ i64 ┆ str      │
╞════════════╪═════╪══════════╡
│ 300.166667 ┆ 0   ┆ Asian    │
│ 300.5      ┆ 0   ┆ Asian    │
│ 300.833333 ┆ 0   ┆ Asian    │
│ 301.166667 ┆ 0   ┆ Asian    │
│ 301.5      ┆ 0   ┆ Asian    │
│ …          ┆ …   ┆ …        │
│ 845.5      ┆ 0   ┆ Hispanic │
│ 846.5      ┆ 1   ┆ Hispanic │
│ 847.5      ┆ 1   ┆ Hispanic │
│ 848.5      ┆ 1   ┆ Hispanic │
│ 849.5      ┆ 1   ┆ Hispanic │
└────────────┴─────┴──────────┘

The data consists of \(55,000\) rows and \(3\) columns:

  1. FICO: A simulated FICO credit score. These scores range from 300 to 850 and aim to predict credit risk (a higher score is better, i.e. lower credit risk).
  2. ND (Non-Default): Indicates whether the individual avoided default, that is, did not miss payments for 90 days or more on any account during the following 18–24 months.
  3. group: Protected attribute; a group variable indicating the race of the individual, restricted to four values: Asian, White, Black, and Hispanic.

As written in Hardt et al. 2016:

FICO scores are complicated proprietary classifiers based on features, like number of bank accounts kept, that could interact with culture, and hence race, in unfair ways.

Hence, you should think of FICO as a complex proprietary function estimated on features that are unavailable for our analysis.


Theoretically, we think of our data as the tuple \((Y, R, A) \sim P\) where:

  1. \(Y\) is the outcome variable ND
  2. \(R\) is the credit-score variable FICO
  3. \(A\) is the protected race variable group

The data provided is an iid sample of \(n = 55,000\) observations from this distribution i.e. \(\{(Y_{i}, R_{i}, A_{i})\}_{i=1}^{n}\).

We will study the behavior of a lender under various constraints using the given data. The lender sets a threshold \(r\) and decides whether to grant a loan to an individual based on the individual’s credit score \(R_{i}\) and the threshold \(r\): a loan is granted to all individuals whose credit score exceeds the threshold, i.e. \(R_{i} > r\). Setting a threshold corresponds to constructing a classifier based on the score \(R_{i}\):

\[\hat{Y}_{i} := \mathbf{1}\{R_{i} > r\}. \tag{1}\]
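As a quick sketch, the rule in Equation 1 is a one-line vectorized comparison (the scores below are hypothetical; in the exercises \(R\) is the FICO column):

```python
import numpy as np

# Hypothetical credit scores; in the exercises these come from the FICO column
R = np.array([580.0, 610.0, 640.0, 700.0])
r = 620
Y_hat = (R > r).astype(int)  # the indicator 1{R_i > r}
print(Y_hat)  # [0 0 1 1]
```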

Finally, in our lending context, the meanings of true positive, true negative, false positive and false negative are as follows:

  - True positive: the loan is granted (\(\hat{Y}_{i} = 1\)) and the individual does not default (\(Y_{i} = 1\)).
  - False positive: the loan is granted (\(\hat{Y}_{i} = 1\)) but the individual defaults (\(Y_{i} = 0\)).
  - True negative: the loan is denied (\(\hat{Y}_{i} = 0\)) and the individual would have defaulted (\(Y_{i} = 0\)).
  - False negative: the loan is denied (\(\hat{Y}_{i} = 0\)) but the individual would not have defaulted (\(Y_{i} = 1\)).

Exercise 1 - Overview of data

  1. Download the data and place it in a folder in your project directory, e.g. data/. The data can then be loaded using, for instance, polars and the snippet provided above.

    Tip: the data can be downloaded directly to the data/ folder using wget as follows:

    wget -P data/ https://gist.githubusercontent.com/jsr-p/5ca222a5e48b20aead6a65fcf142d315/raw/c3ae62168c9e5f7887bda3d4d840b69d0cf079fc/fico.csv

    which you can copy, paste and run in your shell.

  2. Compute the empirical default and non-default rates for the whole sample and conditional on each subgroup in group.

    You should get results similar to:

    shape: (5, 3)
    ┌──────────┬──────────────┬──────────────────┐
    │ group    ┆ Default rate ┆ Non-default rate │
    │ ---      ┆ ---          ┆ ---              │
    │ str      ┆ f64          ┆ f64              │
    ╞══════════╪══════════════╪══════════════════╡
    │ All      ┆ 0.36         ┆ 0.64             │
    │ Asian    ┆ 0.21         ┆ 0.79             │
    │ Black    ┆ 0.64         ┆ 0.36             │
    │ Hispanic ┆ 0.47         ┆ 0.53             │
    │ White    ┆ 0.25         ┆ 0.75             │
    └──────────┴──────────────┴──────────────────┘
  3. As written in Hardt et al. 2016, a credit score cutoff of \(620\) is commonly used for prime-rate loans. Compute the non-default rate for the whole sample and for each group for those with \(R_{i} > r\) and those with \(R_{i} \leq r\) for \(r = 620\).

    You should get results similar to:

    ┌─────────┬──────────┬──────┐
    │ Above r ┆ group    ┆ ND   │
    │ ---     ┆ ---      ┆ ---  │
    │ i8      ┆ str      ┆ f64  │
    ╞═════════╪══════════╪══════╡
    │ 0       ┆ All      ┆ 0.34 │
    │ 0       ┆ Asian    ┆ 0.5  │
    │ 0       ┆ Black    ┆ 0.2  │
    │ 0       ┆ Hispanic ┆ 0.3  │
    │ 0       ┆ White    ┆ 0.43 │
    │ 1       ┆ All      ┆ 0.89 │
    │ 1       ┆ Asian    ┆ 0.9  │
    │ 1       ┆ Black    ┆ 0.78 │
    │ 1       ┆ Hispanic ┆ 0.84 │
    │ 1       ┆ White    ┆ 0.91 │
    └─────────┴──────────┴──────┘
  4. Replicate Figure 7 of Hardt et al. 2016 with the simulated dataset.

    You should get results similar to:

  5. Replicate the second panel of Figure 8 in Hardt et al. 2016 while ignoring the shading. What is the interpretation of the figure and how does it relate to fairness?

    Hint: use the qcut method from either polars or pandas to compute the within-group percentile categories and the non-default rates within them.

    You should get results similar to:

Exercise 2 - Binary classification metrics and the ROC curve

  1. Set \(r = 620\) and classify each individual using the rule in Equation 1. With this you should now have a vector of predicted outcomes, e.g. denoted Y_hat, and the vector of true outcomes, e.g. denoted as Y. Use Y_hat and Y to compute the True Positive Rate (TPR), False Positive Rate (FPR), True Negative Rate (TNR), and False Negative Rate (FNR).

    You should get results similar to:

    shape: (1, 4)
    ┌──────┬──────┬──────┬──────┐
    │ TPR  ┆ FNR  ┆ FPR  ┆ TNR  │
    │ ---  ┆ ---  ┆ ---  ┆ ---  │
    │ f64  ┆ f64  ┆ f64  ┆ f64  │
    ╞══════╪══════╪══════╪══════╡
    │ 0.77 ┆ 0.23 ┆ 0.18 ┆ 0.82 │
    └──────┴──────┴──────┴──────┘
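A sketch of the confusion-matrix rates with numpy; the toy arrays stand in for the FICO and ND columns:

```python
import numpy as np

# Toy stand-ins for the FICO and ND columns
fico = np.array([400.0, 500.0, 600.0, 700.0, 650.0, 550.0])
y = np.array([0, 0, 1, 1, 1, 0])

r = 620
y_hat = (fico > r).astype(int)  # classify via Equation 1

# Confusion-matrix counts
tp = np.sum((y_hat == 1) & (y == 1))
fn = np.sum((y_hat == 0) & (y == 1))
fp = np.sum((y_hat == 1) & (y == 0))
tn = np.sum((y_hat == 0) & (y == 0))

# Rates: TPR/FNR condition on Y = 1, FPR/TNR condition on Y = 0
tpr, fnr = tp / (tp + fn), fn / (tp + fn)
fpr, tnr = fp / (fp + tn), tn / (fp + tn)
print(f"TPR={tpr:.2f} FNR={fnr:.2f} FPR={fpr:.2f} TNR={tnr:.2f}")
```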
  2. Construct a grid of percentile values \(\mathcal{P} := \{0, 0.01, 0.02, \ldots, 0.98, 0.99, 1\}\) e.g. using np.arange. Use these with np.quantile to compute the quantile corresponding to each \(p \in \mathcal{P}\) of FICO; we will use these as thresholds for the classifier.

    Note: you can also just construct a grid from \(300\) to \(850\) as the thresholds.

    Loop over each of the threshold values and classify each individual using the rule in Equation 1. Save the resulting metrics for each value and construct a dataframe of all the computed values. You should now have a dataframe of FPR and TPR for each threshold \(r\).

  3. Recall that the ROC curve plots the TPR against the FPR of a given classifier for different threshold values \(r\). Use the computed values from the previous exercise to plot the ROC curve corresponding to the FICO score \(R\) for varying thresholds \(r\).

    You should get results similar to:
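Plotting the curve is then a line plot of FPR against TPR; a minimal matplotlib sketch on toy ROC points (in the exercise these come from the threshold loop):

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripting
import matplotlib.pyplot as plt
import numpy as np

# Toy ROC points; in the exercise these come from the threshold loop
fpr = np.array([0.0, 0.2, 0.5, 1.0])
tpr = np.array([0.0, 0.6, 0.9, 1.0])

fig, ax = plt.subplots()
ax.plot(fpr, tpr, label="FICO score")
ax.plot([0, 1], [0, 1], linestyle="--", label="Random classifier")
ax.set_xlabel("FPR")
ax.set_ylabel("TPR")
ax.set_title("ROC curve")
ax.legend()
fig.savefig("roc.png")
```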

  4. Repeat 2. and 3. but now compute the ROC curve for each group based on the protected attribute \(A\).

    You should get results similar to:

  5. Make a zoomed-in version of the figure in 4. shown above.

    You should get results similar to:

  6. Optional: Compute as many as you can of the thresholds corresponding to the five constraints on the lender listed below and explained in Hardt et al. 2016:

    1. Max profit
    2. Race blind
    3. Demographic parity
    4. Equal opportunity
    5. Equalized odds

    Try also to plot the trade-offs in the zoomed-in ROC curve of exercise 5. above.

    Note: For the last three you will need to use the fairlearn package.
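For the max-profit threshold (the first constraint), one hedged sketch: assume a hypothetical payoff of +1 per repaid loan and -4 per default (the gain/loss ratio here is an assumption, not the one in Hardt et al. 2016), then pick the threshold maximizing total profit:

```python
import numpy as np

# Toy stand-ins for the FICO and ND columns
fico = np.array([400.0, 500.0, 600.0, 700.0, 650.0, 550.0])
y = np.array([0, 0, 1, 1, 1, 0])

gain, loss = 1.0, 4.0  # hypothetical payoffs: +1 per repaid loan, -4 per default
thresholds = np.unique(fico)
# Total profit at each candidate threshold: repaid loans minus weighted defaults
profits = np.array([
    gain * np.sum((fico > r) & (y == 1)) - loss * np.sum((fico > r) & (y == 0))
    for r in thresholds
])
best = thresholds[int(np.argmax(profits))]
print(f"max-profit threshold: {best}, profit: {profits.max()}")
```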