Improving Explanations with Model Canonization

Backpropagation and rule-based XAI methods are prominent choices for explaining neural network predictions. This popularity is largely due to their speed and efficiency: computing an explanation requires only a single backward pass through the model.
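To make this concrete, below is a minimal sketch of such a one-backward-pass explanation, here plain gradient saliency in PyTorch. The resnet18 model and random input are illustrative placeholders, not a method prescribed by this work.

```python
import torch
import torchvision.models as models

# Placeholder model and input; any differentiable classifier works.
model = models.resnet18(weights=None).eval()
x = torch.randn(1, 3, 224, 224, requires_grad=True)

logits = model(x)
target = logits.argmax()      # index of the predicted class (batch size 1)
logits[0, target].backward()  # a single backward pass through the model

saliency = x.grad.abs()       # per-pixel relevance from the input gradient
```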

Another important factor in the popularity of backpropagation and rule-based XAI methods is the high quality and faithfulness of their explanations. However, these methods may struggle when applied to modern model architectures with innovative building blocks or high interconnectivity.

This is caused by certain types of neural network layers that have been shown to break implementation invariance, which has been defined as an axiom for XAI methods. Implementation invariance is desirable from a functional perspective: it requires that explanations computed for two different networks implementing the same mathematical function always be identical. In the presence of BatchNorm (BN) layers, for example, implementation invariance is violated.
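Model canonization addresses this by restructuring the network into a canonical form that implements exactly the same function. A common canonization step is to fold a BN layer into the preceding convolution, so that explanation methods see one canonical layer instead of two. The sketch below shows this fold in PyTorch under the assumption that BN is in inference mode with fixed running statistics; the helper name fold_bn_into_conv is illustrative.

```python
import torch
import torch.nn as nn

def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Merge an inference-mode BatchNorm into the preceding convolution.

    The returned convolution computes the same function as conv -> bn,
    restoring a canonical single-layer form.
    """
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      conv.kernel_size, conv.stride, conv.padding,
                      conv.dilation, conv.groups, bias=True)
    with torch.no_grad():
        # BN(z) = gamma * (z - mean) / sqrt(var + eps) + beta
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
        fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused

# Quick check that the fold preserves the implemented function.
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(8)
bn(conv(torch.randn(16, 3, 16, 16)))  # one training-mode pass to populate running stats
bn.eval()                             # folding assumes fixed (inference) statistics

x = torch.randn(1, 3, 16, 16)
assert torch.allclose(bn(conv(x)), fold_bn_into_conv(conv, bn)(x), atol=1e-5)
```

Because the folded network is functionally identical to the original, any difference between the explanations of the two variants is attributable purely to the implementation, which is exactly the discrepancy canonization removes.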