What are the methods to interpret the output of machine learning models?
Interpretability methods for machine learning models can be classified into the following groups:
Post-hoc interpretability refers to the interpretability of models after they have been trained. Here we list the approaches, grouped by their overarching logic.
Accumulated Local Effects (ALE) aims to explain the impact of features on the outcome on average, by accumulating local prediction differences over a feature's range.
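As a minimal sketch of the ALE idea, the following numpy code bins a feature, averages the local prediction difference within each bin, and accumulates the differences. The model and data are hypothetical toys chosen for illustration, not any particular library's API.

```python
import numpy as np

rng = np.random.default_rng(4)

def model(X):
    # Hypothetical fitted model: quadratic in feature 0, linear in feature 1.
    return X[:, 0] ** 2 + X[:, 1]

X = rng.uniform(-1, 1, size=(1000, 2))

def ale_feature0(X, n_bins=10):
    """ALE curve for feature 0: accumulate mean local prediction differences."""
    edges = np.quantile(X[:, 0], np.linspace(0, 1, n_bins + 1))
    effects = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (X[:, 0] >= lo) & (X[:, 0] <= hi)
        Xlo, Xhi = X[mask].copy(), X[mask].copy()
        Xlo[:, 0], Xhi[:, 0] = lo, hi
        # Local effect: prediction change as feature 0 crosses this bin.
        effects.append(np.mean(model(Xhi) - model(Xlo)))
    ale = np.cumsum(effects)
    return edges, ale - ale.mean()   # center the curve, as is conventional

edges, ale = ale_feature0(X)
```

For this toy model the centered ALE curve approximates x0² minus its mean: low in the middle of the range and high at the extremes, while the additive x1 term cancels out of the local differences entirely.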
Feature Interaction aims to explain how interactions between features impact the model output.
Permutation Feature Importance (PFI) evaluates the impact of a feature on the output by randomly permuting that feature's values and measuring the resulting change in model performance.
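The permutation idea can be sketched in a few lines of numpy. The data-generating process, the closed-form linear model, and the helper names below are all illustrative assumptions, not part of any library:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends strongly on x0, weakly on x1, and not at all on x2.
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Fit a linear model in closed form (ordinary least squares).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def mse(X, y):
    return np.mean((X @ w - y) ** 2)

baseline = mse(X, y)

def permutation_importance(j, n_repeats=10):
    """Mean increase in MSE when column j is randomly permuted."""
    increases = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])  # break the feature-target link
        increases.append(mse(Xp, y) - baseline)
    return float(np.mean(increases))

importances = [permutation_importance(j) for j in range(3)]
```

Permuting x0 should degrade performance far more than permuting x1, and permuting the irrelevant x2 should leave the error nearly unchanged.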
Scoped Rules (Anchors) aims to find rules that anchor a prediction, i.e., rules under which the prediction is unaffected by changes in the remaining features.
Partial Dependence Plot (PDP) visualizes the average dependence between the target and a set of features.
Individual Conditional Expectation (ICE) visualizes the per-sample dependence between the target and a set of features.
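PDP and ICE are closely related: an ICE curve sweeps a feature over a grid for one sample, and the PDP is the average of the ICE curves. A minimal numpy sketch, using a hypothetical model chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def model(X):
    # Hypothetical fitted model: quadratic in x0, linear in x1.
    return X[:, 0] ** 2 + X[:, 1]

X = rng.normal(size=(200, 2))
grid = np.linspace(-2, 2, 21)  # values to sweep for feature 0

# ICE: one curve per sample; PDP: the pointwise average of the ICE curves.
ice = np.empty((len(X), len(grid)))
for k, v in enumerate(grid):
    Xv = X.copy()
    Xv[:, 0] = v           # fix feature 0 at the grid value
    ice[:, k] = model(Xv)  # predict for every sample
pdp = ice.mean(axis=0)
```

For this toy model every ICE curve is a shifted parabola in x0, so the PDP is also a parabola with its minimum at x0 = 0; in practice, plotting ICE curves alongside the PDP reveals heterogeneity that the average alone would hide.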
Shapley Additive Explanations (SHAP) aims to distribute the total outcome across the features as individual, additive contributions.
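For a small number of features, Shapley values can be computed exactly by enumerating all coalitions. The sketch below uses a hypothetical three-feature model with an interaction term and a zero baseline standing in for the expected prediction; these choices are illustrative assumptions:

```python
import numpy as np
from itertools import combinations
from math import factorial

# Hypothetical model with an interaction between features 1 and 2.
def f(x):
    return 2.0 * x[0] + x[1] * x[2]

x = np.array([1.0, 2.0, 3.0])   # instance to explain
baseline = np.zeros(3)          # reference point (stand-in for E[f(X)])
n = 3

def value(S):
    """Model output with features in S taken from x, the rest from baseline."""
    z = baseline.copy()
    for i in S:
        z[i] = x[i]
    return f(z)

def shapley(i):
    """Exact Shapley value: weighted marginal contribution over all coalitions."""
    others = [j for j in range(n) if j != i]
    phi = 0.0
    for size in range(n):
        for S in combinations(others, size):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            phi += weight * (value(S + (i,)) - value(S))
    return phi

phi = [shapley(i) for i in range(n)]
```

The key "efficiency" property holds by construction: the contributions sum to f(x) minus the baseline prediction, and the interaction term x1·x2 is split evenly between features 1 and 2. Exhaustive enumeration costs O(2ⁿ), which is why practical SHAP implementations rely on sampling or model-specific shortcuts.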
Model Simplification and Surrogate Models
Global Surrogate is the approximation of the model globally via a simpler interpretable model.
Local Surrogate (LIME) is the approximation of the model locally via a simpler interpretable model, in order to explain an individual prediction.
Building on certain assumptions, these approaches formulate a mathematical framework to explain model outcomes.
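The local-surrogate idea can be sketched directly: sample perturbations around the instance, weight them by proximity, and fit a weighted linear model to the black-box predictions. The black-box function, kernel width, and sample counts below are illustrative assumptions, not the LIME library's API:

```python
import numpy as np

rng = np.random.default_rng(2)

def black_box(X):
    # Hypothetical opaque model: nonlinear in both features.
    return X[:, 0] ** 2 + np.sin(X[:, 1])

x0 = np.array([1.0, 0.0])                        # instance to explain
Z = x0 + rng.normal(scale=0.3, size=(500, 2))    # perturbed neighbors
y = black_box(Z)

# Proximity weights: closer perturbations count more (Gaussian kernel).
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / (2 * 0.3 ** 2))

# Weighted least squares fit of a linear surrogate with an intercept,
# by rescaling rows with sqrt(weight) and solving ordinary least squares.
A = np.hstack([np.ones((len(Z), 1)), Z])
coef, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * A, np.sqrt(w) * y, rcond=None)
intercept, slopes = coef[0], coef[1:]
```

Near x0 = (1, 0) the true local gradients are roughly 2 (derivative of x² at 1) and 1 (derivative of sin at 0), so the surrogate's slopes serve as an interpretable local explanation of the black box.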
Explanation by Example
Illustrates the workings of the model by studying representative samples and their corresponding outputs.
These methods aim to generate symbols or words that explain the inner workings of the model.
Intrinsic interpretability refers to the interpretability that is built into the model. Two major approaches have been employed in intrinsic interpretability.
A set of favorable properties such as monotonicity or sparsity are imposed on the model through regularization to arrive at more interpretable representations.
The architecture of the model is designed to increase its interpretability.
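The first intrinsic approach, imposing sparsity through regularization, can be sketched with an L1-penalized least-squares fit (the lasso), solved here by iterative soft-thresholding (ISTA). The toy data and penalty strength are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: only 2 of 10 features actually drive the target.
X = rng.normal(size=(200, 10))
y = X @ np.array([3.0, -2.0] + [0.0] * 8) + rng.normal(scale=0.1, size=200)

def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink toward zero, clip at zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam=0.05, steps=1000):
    """Minimize (1/2n)||Xw - y||^2 + lam*||w||_1 by iterative soft-thresholding."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    t = 1.0 / L
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - t * grad, t * lam)
    return w

w = lasso_ista(X, y)
```

The fitted weight vector is sparse: the two informative features keep large (slightly shrunk) coefficients, while the irrelevant ones are driven to zero, so the model's behavior can be read off from a handful of terms.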
Gilpin, Leilani H., et al. “Explaining explanations: An overview of interpretability of machine learning.” 2018 IEEE 5th International Conference on data science and advanced analytics (DSAA). IEEE, 2018.
Chakraborty, Supriyo, et al. “Interpretability of deep learning models: a survey of results.” 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). IEEE, 2017.