On Hedden's proof that machine learning fairness metrics are flawed

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Brian Hedden, in a recent article in Philosophy and Public Affairs [Hedden 2021. “On Statistical Criteria of Algorithmic Fairness.” Philosophy and Public Affairs 49 (2): 209–231. https://doi.org/10.1111/papa.v49.2.], presented a thought experiment designed to probe the validity of the fairness metrics used in machine learning (ML). The thought experiment has attracted considerable attention, including within the machine learning community [Viganò et al. 2022. “People are Not Coins: Morally Distinct Types of Predictions Necessitate Different Fairness Constraints.” In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT '22, 2293–2301. New York, NY: Association for Computing Machinery.]. Hedden describes a particular prediction problem p, involving 40 people divided into two rooms flipping biased coins, and a binary classification model m for predicting the outcomes of these 40 coin flips. He argues that in the thought experiment m is ‘perfectly fair’, yet he also shows that almost all existing fairness metrics would score m as unfair. From this he concludes that almost all existing fairness metrics are flawed. If he is right, this seriously undermines most recent work on fair ML. We present three counter-arguments to Hedden's thought experiment, of which the first is the most important: (a) the prediction problem p is irrelevant for ML because p is not (evaluated as) a learning problem, (b) the model m is not actually fair, and (c) the prediction problem p is irrelevant for fairness metrics, because group assignment in p is random.
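
To make the structure of the thought experiment concrete, the following is a minimal simulation sketch. The per-room chances of heads (0.9 and 0.6) and the group sizes are illustrative assumptions, not necessarily Hedden's exact figures; the model's score for each person is simply the known chance of their coin landing heads. The sketch shows how such a model can be calibrated within groups while violating a standard group fairness criterion (balance for the positive class, in the sense of Kleinberg et al.) across the two rooms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hedden-style two-room scenario (illustrative parameters, not
# necessarily those of Hedden 2021): 40 people in two rooms, each
# flipping a coin whose chance of heads depends on the room.
n_per_room = 20
room = np.repeat([0, 1], n_per_room)
score = np.where(room == 0, 0.9, 0.6)   # model m: predicted chance of heads
heads = rng.random(room.size) < score   # the 40 actual coin flips

# Calibration within groups: among people assigned score s, roughly a
# fraction s land heads. This holds in expectation by construction.
for r in (0, 1):
    m = room == r
    print(f"Room {r}: score={score[m][0]:.1f}, "
          f"observed heads rate={heads[m].mean():.2f}")

# Balance for the positive class: the average score among people whose
# coin actually landed heads should match across groups. Here it is
# 0.9 vs 0.6 by construction, so the criterion is violated even though
# every score equals the true objective chance.
for r in (0, 1):
    m = (room == r) & heads
    print(f"Room {r}: mean score among actual heads = {score[m].mean():.2f}")
```

Running the sketch prints per-room heads rates close to the assigned scores (calibration) alongside mean scores among actual heads of 0.90 versus 0.60, which is the kind of metric disagreement Hedden's argument turns on.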
Original language: English
Journal: Inquiry
Number of pages: 20
ISSN: 0020-174X
DOI
Status: E-pub ahead of print - 2024
