Measurably Stronger Explanation Reliability via Model Canonization

The work "Measurably Stronger Explanation Reliability via Model Canonization", supported by the iToBoS project, has been published.


While rule-based attribution methods have proven useful for providing local explanations for Deep Neural Networks, explaining modern and more varied network architectures poses new challenges in generating trustworthy explanations, since the established rule sets might not be sufficient or applicable to novel network structures. As an elegant solution to this issue, network canonization has recently been introduced. This procedure leverages the implementation-dependency of rule-based attributions and restructures a model into a functionally identical equivalent of alternative design to which established attribution rules can be applied. However, the idea of canonization and its usefulness have so far only been explored qualitatively. In this work, we quantitatively verify the beneficial effects of network canonization on rule-based attributions for VGG-16 and ResNet-18 models with BatchNorm layers and thus extend the current best practices for obtaining reliable neural network explanations.
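
To make the idea of a "functionally identical equivalent" concrete, the sketch below folds an inference-mode BatchNorm layer into the linear layer that precedes it, which is one commonly described form of canonization. It is an illustration only and not necessarily the procedure used in the paper: the helper names (`linear`, `batchnorm_eval`, `fold_batchnorm`) and all parameter values are hypothetical, and plain Python lists stand in for real network weights.

```python
import math

def linear(x, W, b):
    """Affine layer: y_i = sum_j W[i][j] * x[j] + b[i]."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def batchnorm_eval(y, gamma, beta, mean, var, eps=1e-5):
    """BatchNorm in inference mode, using fixed running statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]

def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Merge Linear -> BatchNorm into a single affine layer (W', b')."""
    # Per-output-channel scale applied by BatchNorm.
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    W_folded = [[s * w for w in row] for s, row in zip(scale, W)]
    b_folded = [s * (bi - m) + bt for s, bi, m, bt in zip(scale, b, mean, beta)]
    return W_folded, b_folded

# Hypothetical toy parameters (2 outputs, 3 inputs), chosen for illustration.
W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
b = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.3]
mean, var = [0.4, -0.1], [2.0, 0.5]
x = [1.0, -2.0, 0.5]

# Original design: Linear followed by BatchNorm.
original = batchnorm_eval(linear(x, W, b), gamma, beta, mean, var)

# Canonized design: a single Linear layer with folded parameters.
W_f, b_f = fold_batchnorm(W, b, gamma, beta, mean, var)
canonized = linear(x, W_f, b_f)

# Both designs compute the same function up to floating-point error.
max_diff = max(abs(a - c) for a, c in zip(original, canonized))
print(max_diff)
```

The two designs produce the same outputs, but the canonized one contains only a linear layer, so attribution rules defined for linear layers apply directly without needing a separate rule for BatchNorm.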


This work was supported by the European Union’s Horizon 2020 research and innovation programme (EU Horizon 2020) under grant iToBoS (965221).

Find out more at