What are the methods to interpret the output of machine learning models?
Interpretability methods for machine learning can be classified into the following groups:
Post-hoc Interpretability
Post-hoc interpretability refers to the interpretability of models after they have been trained. Here we list approaches grouped by their overarching logic.
Feature Analysis
Accumulated Local Effects (ALE) aims to explain the average impact of features on the outcome.
Feature Interaction aims to explain how interactions between features affect the model output.
Permutation Feature Importance (PFI) evaluates the impact of a feature on the output by randomly permuting that feature's values.
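As a concrete illustration of the last item, below is a minimal sketch of Permutation Feature Importance; the synthetic data, the random-forest black box, and the R² scoring are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# synthetic data and a black-box model, assumed for illustration
X, y = make_regression(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

baseline = r2_score(y_test, model.predict(X_test))
rng = np.random.default_rng(0)
for j in range(X_test.shape[1]):
    X_perm = X_test.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # break this feature's link to y
    drop = baseline - r2_score(y_test, model.predict(X_perm))
    print(f"feature {j}: score drop after permutation = {drop:.3f}")
```

A large score drop means the model relied heavily on that feature; a drop near zero means the feature was largely ignored.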
Model Inspection
Scoped Rules (Anchors) aims to find rules that "anchor" a prediction, i.e., conditions under which the prediction is unaffected by changes in the remaining features.
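The following toy sketch conveys the core idea behind Anchors: estimate a candidate rule's precision by resampling all features outside the rule and checking that the prediction stays fixed. The stand-in model and the single-condition rule are illustrative assumptions; the actual Anchors method also searches over candidate rules.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(X):
    # stand-in black box: predicts class 1 iff feature 0 exceeds 0.5
    return (X[:, 0] > 0.5).astype(int)

x = np.array([0.8, 0.3, 0.9])   # instance to explain (assumed)
anchor = {0: x[0]}              # candidate rule: hold feature 0 fixed

# perturb every feature not covered by the anchor
samples = rng.uniform(0, 1, size=(1000, x.size))
for idx, value in anchor.items():
    samples[:, idx] = value

precision = np.mean(model(samples) == model(x[None, :])[0])
print(f"anchor precision: {precision:.2f}")  # near 1.0 means the rule anchors the prediction
```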
Saliency Analysis
Partial Dependence Plot (PDP) visualizes the average dependence between the model's prediction and a set of features.
Individual Conditional Expectation (ICE) visualizes the same dependence for each individual sample (see the sketch after this group).
Shapley Additive Explanations (SHAP) aims to distribute the prediction among the features as additive contributions.
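The sketch below computes ICE curves and their pointwise average, the PDP, by sweeping one feature over a grid while holding each sample's remaining features fixed; the data and gradient-boosting model are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=4, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

feature = 0
grid = np.linspace(X[:, feature].min(), X[:, feature].max(), 20)

# ICE: one curve per sample; PDP: their pointwise average
ice = np.empty((X.shape[0], grid.size))
for k, v in enumerate(grid):
    X_mod = X.copy()
    X_mod[:, feature] = v           # set the feature to the grid value
    ice[:, k] = model.predict(X_mod)
pdp = ice.mean(axis=0)

print("PDP values along the grid:", np.round(pdp, 2))
```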
Model Simplification and Surrogate Models
Global Surrogate approximates the model globally with a simpler, interpretable model.
Local Surrogate (LIME) approximates the model locally with a simpler, interpretable model to explain an individual prediction.
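Below is a minimal LIME-style sketch: sample perturbations around one instance, weight them by proximity, and fit a weighted linear surrogate to the black box's predictions there. The Gaussian perturbations, exponential kernel, and ridge surrogate are illustrative choices, not LIME's exact defaults.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
black_box = RandomForestRegressor(random_state=0).fit(X, y)

x0 = X[0]                                            # instance to explain
rng = np.random.default_rng(0)
Z = x0 + rng.normal(scale=0.5, size=(500, x0.size))  # local perturbations
weights = np.exp(-np.sum((Z - x0) ** 2, axis=1))     # proximity kernel

surrogate = Ridge().fit(Z, black_box.predict(Z), sample_weight=weights)
print("local feature effects:", np.round(surrogate.coef_, 2))
```

The surrogate's coefficients approximate how each feature drives the black box's prediction in the neighborhood of x0, which is exactly the explanation a local surrogate provides.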
Mathematical Modeling
Building on a set of assumptions, these approaches formulate a mathematical framework to explain model outcomes.
Explanation by Example
These methods illustrate the workings of the model by studying representative samples and their corresponding outputs.
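A minimal sketch of the example-based idea: retrieve the training sample most similar to the instance being explained and report it. Euclidean distance on the Iris data is an illustrative choice of similarity measure.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
x0 = X[0] + 0.1   # instance to explain (assumed)

# nearest training sample under Euclidean distance
nearest = np.argmin(np.linalg.norm(X - x0, axis=1))
print(f"most similar training sample: index {nearest}, label {y[nearest]}")
```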
Explanation Generation
These methods aim to generate symbols or words explaining the inner workings of the model.
Intrinsic Interpretability
Intrinsic interpretability refers to the interpretability that is built into the model. Two major approaches have been employed in intrinsic interpretability.
Interpretable Representation
Favorable properties such as monotonicity or sparsity are imposed on the model through regularization to arrive at a more interpretable representation.
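As a sketch of sparsity through regularization, the Lasso's L1 penalty drives uninformative coefficients to exactly zero, leaving a representation with few active features; the synthetic data and penalty strength below are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# only 3 of the 10 features carry signal
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
model = Lasso(alpha=1.0).fit(X, y)
print("nonzero coefficients:", np.flatnonzero(model.coef_))
```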
Interpretable Architecture
The architecture of the model is designed to increase its interpretability.
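For instance, a shallow decision tree is interpretable by design: its learned rules can be printed and read directly. The depth limit and the Iris data in the sketch below are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# print the learned decision rules as human-readable text
print(export_text(tree, feature_names=load_iris().feature_names))
```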