
Knowing When You Don’t Know: AI Engineering in an Uncertain World

Presentation
Presents a method for improving AI system robustness by evaluating machine-learned classifiers and deriving metrics that directly measure calibration performance.
Publisher

Software Engineering Institute


Abstract

When engineering AI systems, it is often challenging to ensure robustness due to the inherent uncertainty in the environments in which those systems are deployed. Motivated by this challenge, this project focuses on quantifying, characterizing, and mitigating uncertainty in machine learning models, which are often at the center of AI systems. This presentation describes work on a particular formalization of quantifying uncertainty known as classifier calibration. More specifically, it focuses on how to evaluate machine-learned classifiers for their ability to express confidence in their predictions according to definitions of calibration that map to application-specific contexts. Empirical results from this work show that standard metrics for evaluating classifier calibration often do not measure the quality of confidence estimates with respect to the events, trade-offs, or use cases that matter in practical usage of the classifier. Motivated by this observation, metrics are derived that directly measure calibration performance in such contexts, and a variety of state-of-the-art classifier calibration methods are compared according to them. Equipped with these metrics, machine learning practitioners can evaluate their classifiers in a way that more closely reflects the setting in which they are used. As a result, they can understand how robust their models are within their mission context.
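To make the distinction concrete, the sketch below contrasts a standard calibration metric with a context-restricted variant. This is not the presenters' code or their proposed metrics; it is a minimal illustration, assuming a classifier that outputs confidence scores in [0, 1], toy synthetic data, and a hypothetical high-confidence operating region (`high_confidence_mask`) standing in for an application-specific context.

```python
# Sketch: expected calibration error (ECE) computed over all predictions
# versus only the predictions that fall in a context of interest.
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin predictions by confidence and take the weighted
    average of |mean confidence - accuracy| across bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of samples in bin
    return ece


# Toy data: confidences and correctness for a systematically overconfident model.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=5000)
correct = rng.uniform(size=5000) < conf * 0.9

# Standard metric aggregated over every prediction.
print("overall ECE:", expected_calibration_error(conf, correct))

# Context-restricted variant: evaluate calibration only where the model's
# confidence would actually drive a decision (hypothetical threshold of 0.9).
high_confidence_mask = conf >= 0.9
print("high-confidence ECE:",
      expected_calibration_error(conf[high_confidence_mask],
                                 correct[high_confidence_mask]))
```

A model can score well on the overall metric while being poorly calibrated in the narrower region that matters operationally, which is the kind of mismatch the presentation's context-specific metrics are meant to surface.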