Now you see me! A framework for obtaining class-relevant saliency maps

1CISPA Helmholtz Center for Information Security, 2Max Planck Institute for Informatics

Var attributions. Var attributions are object-specific, visually grounding the correct target objects. They are instance-specific, identifying features that are relevant on a per-part basis. They are class-discriminative, yielding features that separate closely related classes, and they reveal concepts shared between closely related classes. In contrast, vanilla attribution methods (here, Guided Backpropagation, GBP) do not show these properties.

Abstract

Neural networks are part of daily-life decision-making, including in high-stakes settings where understanding and transparency are key. Saliency maps have been developed to provide insight into which input features a neural network uses for a specific prediction. Although widely employed, these methods often produce overly general saliency maps that fail to identify the specific information that triggered the classification. In this work, we suggest a framework, called Var, that incorporates attributions across classes to arrive at saliency maps that actually capture the class-relevant information. On established benchmarks for attribution methods, including the grid pointing game and randomization-based sanity checks, we show that our framework substantially boosts the performance of standard saliency map approaches. It is, by design, agnostic to model architectures and attribution methods, and it makes it possible to identify both the distinguishing and the shared features underlying a model prediction.

Var is a plug-and-play framework

We show Var augmenting different attribution methods (columns) and architectures (rows) for detecting zebras (left) and bisons (right) in the original image (middle), using Guided Backpropagation (GBP), Integrated Gradients (IG), and GradCAM, on ResNet50 and ViT. Var seamlessly integrates with various attribution methods and architectures.

... can be described in three steps

Step 1: Initial Attribution

First, compute attribution maps for each class c ∈ {1, ..., K}:

$$A_c = \text{Attribution}(x, c)$$

Where: x is the input, K is the number of classes, and $A_c$ is the attribution map for class c

Computes attribution maps for each class
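
As a minimal sketch of this step, assuming Captum's IntegratedGradients as the base attribution method and a torchvision ResNet50 (any attribution method and model work; the input tensor and class indices below are illustrative placeholders):

import torch
from captum.attr import IntegratedGradients
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).eval()
ig = IntegratedGradients(model)

x = torch.randn(1, 3, 224, 224)  # placeholder; use a preprocessed image in practice
class_indices = [340, 347]       # e.g. ImageNet classes for zebra and bison

# Step 1: one attribution map A_c per candidate class c
attribution_maps = torch.stack(
    [ig.attribute(x, target=c).detach() for c in class_indices], dim=0
)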

Step 2: Pixel-wise Softmax

Compute softmax across classes for each pixel position (i,j):

$$M_c(i, j) = \frac{e^{A_c(i,j)}}{\sum_{k=1}^{K} e^{A_k(i,j)}}$$
Creates importance weighting using softmax
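
Continuing the sketch above, the pixel-wise softmax is a single call over the stacked class dimension. (Note that the reference implementation further below additionally multiplies the attributions by a temperature of 10 before the softmax to sharpen the weighting.)

# Step 2: softmax over the class dimension (dim=0), computed independently per pixel
M = torch.softmax(attribution_maps, dim=0)  # M[c] is the weighting map for class c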

Step 3: Final Attribution

The final attribution for class c is computed as:

$$V_c = A_c \odot M_c \odot \mathbb{1}_{M_c - \frac{1}{K} > 5\times10^{-3}}$$

Where: ⊙ denotes element-wise multiplication, 𝟙 is the indicator function, K is the number of classes, and $5\times10^{-3}$ is the threshold parameter (tau in the implementation below)

Applies threshold and computes final attribution
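
Continuing the sketch once more, the final Var map multiplies the attribution, the softmax mask, and the indicator; tau = $5\times10^{-3}$ matches the threshold in the formula above:

# Step 3: keep only pixels where the target's weight exceeds the uniform level 1/K by tau
K = len(class_indices)
tau = 5e-3                                   # threshold parameter
target_idx = 0                               # position of the target class in class_indices
indicator = (M[target_idx] - 1.0 / K) > tau  # pixel-wise indicator function
V_target = attribution_maps[target_idx] * M[target_idx] * indicator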

... and implemented in a few lines of code

contrastive_attribution.py
def attribute(self, img, target, class_indices):
    """Compute the contrastive Var attribution for the target class."""

    # Step 1: compute one attribution map per candidate class and stack them
    attributions = torch.stack([
        self.base_attribute_fn(img, target=class_idx).detach()
        for class_idx in class_indices
    ], dim=0)

    # Step 2: temperature-scaled softmax (temperature 10) over the class
    # dimension yields the pixel-wise importance weighting
    mask = torch.nn.functional.softmax(10.0 * attributions, dim=0)

    # Position of the target class within class_indices
    target_idx = class_indices.index(target)

    # Step 3: keep only pixels whose weight exceeds the uniform level 1/K
    # by at least tau, then weight the attribution by the mask
    threshold = 1.0 / len(class_indices)
    indicator = (mask[target_idx] - threshold) > self.tau
    return attributions[target_idx] * mask[target_idx] * indicator
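
The method assumes it lives on an object providing base_attribute_fn and tau. A minimal, hypothetical wrapper (the class name, the Captum-based attribution function, and the placeholder input are our assumptions, not part of the released code) could look as follows:

import torch
from captum.attr import IntegratedGradients
from torchvision.models import resnet50, ResNet50_Weights


class VarAttribution:
    """Hypothetical wrapper around the attribute() method above."""

    def __init__(self, base_attribute_fn, tau=5e-3):
        self.base_attribute_fn = base_attribute_fn  # per-class attribution call
        self.tau = tau                              # threshold parameter

    attribute = attribute  # reuse the module-level method defined above


model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).eval()
ig = IntegratedGradients(model)
var = VarAttribution(lambda img, target: ig.attribute(img, target=target))

img = torch.randn(1, 3, 224, 224)  # placeholder; use a preprocessed image in practice
saliency = var.attribute(img, target=340, class_indices=[340, 347])  # e.g. zebra vs. bison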

Var improves on the grid pointing game

We evaluate Var using the grid pointing game on a 2×2 grid of random ImageNet validation images (Quad-ImageNet), consisting of 12,500 evaluation images. Var substantially improves localization performance across all attribution methods. For ResNet50, we observe gains in Region Attribution (RA), which quantifies what portion of the total attribution weight falls within the target region, ranging from +0.16 to +0.54 with an average increase of +0.24. GradCAM with Var shows particularly strong performance, achieving an RA of 0.92, an F1 score of 0.81, and an IoU improvement from 0.41 to 0.71. Precision increases on average by +0.31. Our framework not only improves localization of distinguishing features but also recovers common features of closely related classes. Precision improves from 0.25 to 0.83 for GBP and from 0.39 to 0.84 for Guided GradCAM when enhanced with Var.

Qualitative results are shown for Integrated Gradients, Guided Backprop, and Input × Gradients.
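
The Region Attribution metric itself is straightforward to compute; a minimal sketch, where the binary region mask and the use of attribution magnitude as weight are our assumptions:

import torch

def region_attribution(attr: torch.Tensor, region_mask: torch.Tensor) -> float:
    """Fraction of the total attribution weight that falls inside the target region."""
    attr = attr.abs()  # assumption: attribution magnitude counts as weight
    return ((attr * region_mask).sum() / (attr.sum() + 1e-12)).item()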

Var finds actionable class-specific features

Var attributions, in contrast to the baselines, capture the specific class-relevant features. Our ablation experiments demonstrate how Var can surgically modify images to change model predictions by removing only the most discriminative features. For the porcupine image, removing class-specific features changes the prediction to another class entirely, demonstrating that Var precisely identifies the distinctive features that separate a porcupine from similar animals. For the cougar image, removing just a single distinctive feature, the ear, significantly increases the model's uncertainty, showing that Var correctly identifies this feature as crucial for the model's confident classification. These examples illustrate how Var attributions allow for surgical removal of information from the image, rather than destroying all content. By targeting only the discriminative features identified by Var, we can manipulate the model's output distribution, showing that these features are indeed highly relevant to the model.
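
A minimal sketch of such an ablation, where the fraction of removed pixels and the fill value are our assumptions rather than the paper's exact protocol:

import torch

def ablate_top_features(model, img, saliency, fraction=0.02, fill=0.0):
    """Mask out the most Var-relevant pixels and return the new class probabilities."""
    flat = saliency.abs().flatten()
    k = max(1, int(fraction * flat.numel()))
    cutoff = flat.topk(k).values.min()      # k-th largest attribution magnitude
    mask = saliency.abs() >= cutoff         # pixels identified as most discriminative
    ablated = img.masked_fill(mask, fill)   # surgically remove only those pixels
    with torch.no_grad():
        return torch.softmax(model(ablated), dim=-1)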

BibTeX

@article{walter2025var,
  title   = {Now you see me! A framework for obtaining class-relevant saliency maps},
  author  = {Walter, Nils Philipp and Vreeken, Jilles and Fischer, Jonas},
  journal = {arXiv preprint arXiv:2503.07346},
  year    = {2025}
}