Measuring Beyond Accuracy

March 21, 2022 • Conference Paper

By

Violet Turri, Rachel Dzombak, Eric Heim, Nathan M. VanHoudnos, Jay Palat, and Anusha Sinha

This paper was presented at the 2022 AAAI Spring Symposium on AI Engineering.

Publisher

Software Engineering Institute

Abstract

Most machine learning (ML) projects focus on “accuracy” for model evaluation. Accuracy is useful for knowing how well a model performs on a test dataset at the time of model development. However, it leaves out other significant factors in assessing the utility and usability of a machine learning model. Key considerations include robustness, resilience, calibration, confidence, alignment with evolving user requirements, and fit for mission and stakeholder needs as part of an integrated system, among others. In this paper, we explore what it means to measure beyond accuracy and define critical considerations for the test and evaluation of machine learning and, more broadly, artificial intelligence (AI) systems. With these key measurement considerations defined, the AI engineering community will be better equipped to develop and implement comprehensive, applied methods for model evaluation, as well as possible metrics for more realistic, real-world model evaluation.