Towards a Unified Framework for Evaluating Explanations
Article Status
Published
Authors/contributors
- Pinto, Juan D. (Author)
- Paquette, Luc (Author)
Title
Towards a Unified Framework for Evaluating Explanations
Abstract
The challenge of creating interpretable models has been taken up by two main research communities: ML researchers primarily focused on lower-level explainability methods that suit the needs of engineers, and HCI researchers who have more heavily emphasized user-centered approaches often based on participatory design methods. This paper reviews how these communities have evaluated interpretability, identifying overlaps and semantic misalignments. We propose moving towards a unified framework of evaluation criteria and lay the groundwork for such a framework by articulating the relationships between existing criteria. We argue that explanations serve as mediators between models and stakeholders, whether for intrinsically interpretable models or opaque black-box models analyzed via post-hoc techniques. We further argue that useful explanations require both faithfulness and intelligibility. Explanation plausibility is a prerequisite for intelligibility, while stability is a prerequisite for explanation faithfulness. We illustrate these criteria, as well as specific evaluation methods, using examples from an ongoing study of an interpretable neural network for predicting a particular learner behavior.
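Illustrative sketch (not from the paper): the abstract names stability as a prerequisite for explanation faithfulness. A minimal way to probe stability, assuming a hypothetical attribution function `explain_fn`, is to perturb an input slightly and compare the resulting feature-attribution vectors:

```python
# Minimal sketch (assumed setup, not the authors' method): estimating
# explanation stability by perturbing an input and comparing the
# resulting feature-attribution vectors. `explain_fn` is a hypothetical
# callable mapping an input vector to per-feature attribution scores.
import numpy as np

def stability_score(explain_fn, x, noise_scale=0.01, n_trials=20, seed=0):
    """Mean cosine similarity between the explanation of x and the
    explanations of slightly perturbed copies of x (1.0 = perfectly stable)."""
    rng = np.random.default_rng(seed)
    base = np.asarray(explain_fn(x), dtype=float)
    sims = []
    for _ in range(n_trials):
        x_pert = x + rng.normal(scale=noise_scale, size=x.shape)
        e = np.asarray(explain_fn(x_pert), dtype=float)
        denom = np.linalg.norm(base) * np.linalg.norm(e)
        sims.append(float(base @ e / denom) if denom > 0 else 0.0)
    return float(np.mean(sims))

if __name__ == "__main__":
    # Toy usage: a stand-in attribution (input-weighted linear coefficients),
    # which should remain highly stable under small input noise.
    w = np.array([0.5, -1.2, 2.0])
    explain = lambda x: w * x
    x0 = np.array([1.0, 0.3, -0.7])
    print(round(stability_score(explain, x0), 3))
```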
Repository
arXiv
Archive ID
arXiv:2405.14016
Date
2024-05-22
Accessed
12/06/2024, 18:48
Library Catalogue
Extra
arXiv:2405.14016 [cs]
Citation
Pinto, J. D., & Paquette, L. (2024). Towards a Unified Framework for Evaluating Explanations (arXiv:2405.14016). arXiv. http://arxiv.org/abs/2405.14016
Technical methods