Methods for interpreting and explaining AI models
Many AI models are inherently difficult to interpret, so methods have been developed to support interpretability and explainability. Interpretability can be defined as being able to understand what caused a decision, or being able to predict or explain a model's result.
These methods can be broadly divided into:
- inherent explainability: using ‘interpretable’ models whose parameters we can look at and understand directly
- post-hoc explainability: developing explanations after the model has been trained
Inherent explainability
Models which are inherently interpretable include:
- linear regression - we can look at coefficients
- decision trees - we can look at the branch points (see the sketch after this list)
- generalised linear models - again, we can look at coefficients
- naive Bayes - we can look at the conditional probabilities
- nearest neighbours - we can look at each of the nearest neighbours
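As a minimal sketch of what ‘looking at the parameters’ means in practice (using scikit-learn and its built-in diabetes dataset purely as an arbitrary example), the coefficients of a linear regression and the branch points of a shallow decision tree can be read off directly from the fitted models:

```python
# Minimal sketch of 'inherent' interpretability: the fitted parameters of a
# linear regression and a shallow decision tree can be inspected directly.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Linear regression: each coefficient is the change in the prediction per
# unit change in that feature, holding the other features fixed.
linear = LinearRegression().fit(X, y)
for name, coef in zip(X.columns, linear.coef_):
    print(f"{name:>6}: {coef:8.2f}")

# Decision tree: the branch points can be printed as explicit if/else rules.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```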
Of note, these do not include deep learning models, which are coming to dominate the field of AI.
Post-hoc explainability
This includes:
- surrogate methods: building a simpler, interpretable model that models the original model. This can be done globally (across the whole spectrum of inputs/outputs) or locally (such as LIME - local interpretable model-agnostic explanations); see the sketch after this list
- investigating the impact of features: looking at how predictions change when a specific feature is changed (partial dependence plots), or at the importance of different features (permutation feature importance; also in the sketch below)
- visualise the features themselves (‘feature visualisation’), such as heatmaps/saliency maps for CNNs or language models
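As a minimal sketch of two of these ideas (a global surrogate and permutation feature importance), again using scikit-learn and the diabetes dataset as an arbitrary example; a local method such as LIME follows a similar logic but fits the surrogate around a single prediction rather than the whole input space:

```python
# Minimal sketch of two post-hoc methods: a global surrogate tree that
# 'models the model', and permutation feature importance.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The 'black box' we want to explain.
black_box = RandomForestRegressor(n_estimators=200, random_state=0)
black_box.fit(X_train, y_train)

# Global surrogate: fit a shallow, interpretable tree to the black box's
# predictions (not the true labels), then read off its rules.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))
print(export_text(surrogate, feature_names=list(X.columns)))

# Permutation feature importance: how much does held-out performance drop
# when each feature is shuffled, breaking its link to the outcome?
result = permutation_importance(black_box, X_test, y_test,
                                n_repeats=10, random_state=0)
for name, mean, std in zip(X.columns, result.importances_mean,
                           result.importances_std):
    print(f"{name:>6}: {mean:.3f} +/- {std:.3f}")
```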
Limitations of current methods
Saliency maps are good at showing where a model is looking, but not what it has seen there. They therefore require a step of human interpretation, which carries a risk of bias.
In healthcare, medication is often effectively a black box: the mechanism of action of paracetamol, for example, is not well understood, yet RCTs have been used to prove that it works. The same approach - validating outcomes rather than mechanisms - could be applied to AI in healthcare.
Related
- The false hope of current approaches to explainable artificial intelligence in health care - critique of interpretability methods, with a focus on healthcare.
- Producing simple text descriptions for AI interpretability - Luke Oakden-Rayner
- Demystifying black-box models with symbolic metamodels - van der Schaar
- Semi-interpretable Probabilistic Models by Brooks Paige
- Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead - Rudin