Methods for interpreting and explaining AI models
Many AI models are inherently difficult to interpret, so methods have been developed to support interpretability and explainability. Interpretability can be defined as being able to understand what caused a decision, or being able to predict or explain a model's result.
These methods can be broadly divided into:
- inherent explainability: using ‘interpretable’ models whose parameters we can look at and understand directly
- post-hoc explainability: developing explanations after the model has been trained
Inherent explainability
Models which are inherently interpretable include:
- linear regression - we can look at coefficients
- decision trees - we can look at the branch points (see the sketch after this list)
- generalised linear models - again, we can look at coefficients
- naive Bayes - we can look at the conditional probabilities
- nearest neighbours - we can look at each of the nearest neighbours
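As a minimal sketch of what ‘looking at the parameters’ means in practice (using scikit-learn and its built-in diabetes dataset purely as an arbitrary example), the coefficients of a linear regression and the branch points of a shallow decision tree can be read off directly from the fitted models:

```python
# Minimal sketch of 'inherent' interpretability: the fitted parameters of a
# linear regression and a shallow decision tree can be inspected directly.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Linear regression: each coefficient is the change in the prediction per
# unit change in that feature, holding the other features fixed.
linear = LinearRegression().fit(X, y)
for name, coef in zip(X.columns, linear.coef_):
    print(f"{name:>6}: {coef:8.2f}")

# Decision tree: the branch points can be printed as explicit if/else rules.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```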
Of note, these do not include deep learning models, which are coming to dominate the field of AI.
Post-hoc explainability
This includes:
- surrogate methods: building a simpler, interpretable model that models the original model. This can be done globally (across the whole spectrum of inputs/outputs) or locally (such as LIME - local interpretable model-agnostic explanations); see the sketch after this list
- investigating the impact of features: looking at how predictions change when a specific feature is changed (partial dependence plots), or at the importance of different features (permutation feature importance; also in the sketch below)
- visualise the features themselves (‘feature visualisation’), such as heatmaps/saliency maps for CNNs or language models
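As a minimal sketch of two of these ideas (a global surrogate and permutation feature importance), again using scikit-learn and the diabetes dataset as an arbitrary example; a local method such as LIME follows a similar logic but fits the surrogate around a single prediction rather than the whole input space:

```python
# Minimal sketch of two post-hoc methods: a global surrogate tree that
# 'models the model', and permutation feature importance.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The 'black box' we want to explain.
black_box = RandomForestRegressor(n_estimators=200, random_state=0)
black_box.fit(X_train, y_train)

# Global surrogate: fit a shallow, interpretable tree to the black box's
# predictions (not the true labels), then read off its rules.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))
print(export_text(surrogate, feature_names=list(X.columns)))

# Permutation feature importance: how much does held-out performance drop
# when each feature is shuffled, breaking its link to the outcome?
result = permutation_importance(black_box, X_test, y_test,
                                n_repeats=10, random_state=0)
for name, mean, std in zip(X.columns, result.importances_mean,
                           result.importances_std):
    print(f"{name:>6}: {mean:.3f} +/- {std:.3f}")
```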
Limitations of current methods
Saliency maps are good at showing where a model is looking, but not what it has seen there. They therefore require a step of human interpretation, which carries a risk of bias.
In healthcare, medication is often effectively a black box: the mechanism of action of paracetamol, for example, is not well understood, yet RCTs have been used to prove that it works. The same approach - validating outcomes rather than mechanisms - could be applied to AI in healthcare.
Related
- The false hope of current approaches to explainable artificial intelligence in health care - critique of interpretability methods, with a focus on healthcare.
- Producing simple text descriptions for AI interpretability - Luke Oakden-Rayner
- Demystifying black-box models with symbolic metamodels - van der Schaar
- Semi-interpretable Probabilistic Models by Brooks Paige
- Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead - Rudin