There is No Free Variable Importance: Statistical Issues in Explainable Machine Learning
Giles Hooker (Cornell University)
Tuesday 30th May, 2023 13:00-14:00 Maths 311B/Zoom
Abstract
The field of machine learning, loosely defined as nonparametric statistical modeling, has become enormously successful over the past fifty years, partly by forgoing the parametric models familiar to statisticians. A consequence of this philosophy is that these methods produce algebraically complex models that provide little human-accessible insight into the workings of the model, or into what it might say about the underlying processes generating the data. As these methods have been taken up in high-stakes decision making, demands to “x-ray the black box” have become more prevalent, resulting in a wide variety of approaches for understanding what signal the model is capturing or for explaining individual predictions.
Unfortunately, many of these methods produce results that can lead to mistaken conclusions about the model, the underlying processes, or both. This talk reviews two sources of error: distorting the covariate distribution beyond the range where the model performs well, and estimating structured surrogates using insufficient data. We show that many popular interpretation/explanation methods suffer from these problems, potentially resulting in mistaken conclusions or advice, and review the properties necessary to generate reliable diagnostics.
Email wei.zhang.2@glasgow.ac.uk for Zoom link.