Why Does My Model Fail? Contrastive Local Explanations for Retail Forecasting

Author: Ana Lucic.

As an increasing number of decisions about humans are made by machine learning algorithms, it becomes important to understand how these systems work and what drives a model to a particular prediction. Since machine learning models are learned automatically from data and often involve complex architectures with many parameters, it is not always clear exactly which inputs the model considers important for a given prediction. This is especially problematic when the predictions are incorrect, as these mistakes can significantly impact users’ perceptions of the model.

This is why we propose Monte Carlo Bounds for Reasonable Predictions (MC-BRP): a method for automatically generating explanations about erroneous predictions. 

Explainable AI (XAI) methods generate explanations that are typically either global or local: global explanations are meant to give insight into the model as a whole, while local explanations are specific to individual predictions. In this work, we focus on local explanations, specifically explanations for regression predictions that result in large errors.

The MC-BRP explainer

In our setting, the goal is to predict sales of supermarkets in the Netherlands based on financial, workforce, and physical store features. We define a large error based on Tukey’s fences, a widely used criterion for identifying statistical outliers, though this definition can be adjusted depending on the users’ requirements. Given a large error, MC-BRP generates a set of perturbed versions of the original instance that result in reasonable predictions (i.e., predictions that are not large errors). This is done by performing Monte Carlo simulations on each of the features deemed most important for the original prediction. For each of these features, we determine the bounds needed for a reasonable prediction based on the mean and standard deviation of the perturbed feature values that lead to reasonable predictions. We also determine the relationship between each feature and the target using the Pearson correlation, and present these to the user as the explanation. Below is an example of an MC-BRP explanation for a prediction resulting in a large error. We see that the feature values are all outside of the ‘Reasonable Range’ generated by MC-BRP, and that all five features are positively correlated with the target variable.

Example of an MC-BRP explanation for a prediction resulting in a large error.
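For readers who want to see the mechanics, here is a minimal Python sketch of the procedure described above, assuming a scikit-learn-style regressor and NumPy arrays. Names such as `mc_brp_explain`, the uniform sampling of feature values, and the mean ± one-standard-deviation bounds are illustrative assumptions rather than the exact implementation from the paper.

```python
import numpy as np
from scipy.stats import pearsonr

def tukey_fences(errors, k=1.5):
    """Large-error thresholds via Tukey's fences: errors outside
    [Q1 - k*IQR, Q3 + k*IQR] are treated as outliers (large errors)."""
    q1, q3 = np.percentile(errors, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def mc_brp_explain(model, x, y_true, X_train, y_train, top_features, n_samples=1000):
    """Illustrative MC-BRP-style explanation for one instance with a large error.

    For each important feature, run Monte Carlo perturbations of that feature,
    keep the values that lead to a reasonable prediction (error inside Tukey's
    fences), and report (i) bounds on the feature derived from the mean and
    standard deviation of the kept values and (ii) the feature's trend
    (sign of the Pearson correlation) with the target.
    """
    train_errors = y_train - model.predict(X_train)
    lower, upper = tukey_fences(train_errors)

    explanation = {}
    for f in top_features:
        # Hypothetical sampling scheme: draw candidate values for feature f
        # uniformly from its observed range in the training data.
        candidates = np.random.uniform(X_train[:, f].min(), X_train[:, f].max(), n_samples)
        kept = []
        for value in candidates:
            x_perturbed = x.copy()
            x_perturbed[f] = value
            error = y_true - model.predict(x_perturbed.reshape(1, -1))[0]
            if lower <= error <= upper:  # the perturbed prediction is "reasonable"
                kept.append(value)
        if not kept:
            continue  # no perturbation of this feature alone yields a reasonable prediction
        kept = np.asarray(kept)
        reasonable_range = (kept.mean() - kept.std(), kept.mean() + kept.std())
        trend, _ = pearsonr(X_train[:, f], y_train)
        explanation[f] = {
            "reasonable_range": reasonable_range,
            "trend": "positive" if trend >= 0 else "negative",
        }
    return explanation
```

The returned dictionary mirrors the explanation shown above: per important feature, a range of values that would have produced a reasonable prediction, plus the direction of its relationship with the target.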

Evaluation with Practitioners and Researchers

A central goal of many XAI methods, including MC-BRP, is to explain complex models in terms that are understandable to humans. This is why we evaluate MC-BRP on a real dataset with 75 real users from the University of Amsterdam (Researchers) and Ahold Delhaize (Practitioners). We find that users are able to answer objective questions about the model’s predictions with an overall accuracy of 81.1%. To understand the impact MC-BRP explanations have on users’ attitudes towards the model, we conduct a between-subjects experiment using the following subjective questions:

  • SQ1: I understand why the model makes large errors in predictions.
  • SQ2: I would support using this model as a forecasting tool.
  • SQ3: I trust this model.
  • SQ4: In my opinion, this model produces mostly reasonable outputs.

We find that explanations generated by MC-BRP help users understand why models make large errors in predictions (SQ1), but do not have a significant impact on support for deploying the model (SQ2), trust in the model (SQ3), or perceptions of the model’s performance (SQ4). In the graphs below, participants in the Treatment group were shown MC-BRP explanations, while participants in the Control group were not given any explanation.

Results from the between-subjects study comparing answers between the Treatment (MC-BRP explanation) and Control (no explanation) groups.
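As an aside on how such group comparisons can be checked, the snippet below compares answer distributions between the two groups; the Likert responses and the choice of a Mann-Whitney U test are assumptions for demonstration only, not the data or analysis from the study.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical 1-7 Likert answers to SQ1 (illustrative only, not study data).
treatment_sq1 = np.array([6, 5, 7, 6, 5, 6, 7, 4, 6, 5])
control_sq1 = np.array([3, 4, 2, 5, 3, 4, 3, 4, 2, 3])

# Two-sided Mann-Whitney U test: do the two groups answer SQ1 differently?
statistic, p_value = mannwhitneyu(treatment_sq1, control_sq1, alternative="two-sided")
print(f"U = {statistic:.1f}, p = {p_value:.4f}")
```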

We also compare the distributions of answers to the four subjective questions between Practitioners and Researchers, based on users in the treatment group (i.e., those who saw MC-BRP explanations). We find that a similar proportion of Researchers and Practitioners believe they understand why the model makes large errors in predictions (SQ1), but the results for the other three questions are quite different. Compared to Practitioners, Researchers are more in favor of supporting deployment of the model (SQ2), are more likely to trust the model (SQ3), and are more likely to believe it produces reasonable outputs (SQ4). This suggests that our user study population is fairly heterogeneous and that users from different backgrounds have different criteria for deploying or trusting a model, as well as varying levels of confidence regarding the accuracy of its outcomes.

Results comparing answers between participants who are Practitioners or Researchers (in the treatment group).

MC-BRP was introduced in Why Does My Model Fail? Contrastive Local Explanations for Retail Forecasting, which won the Best Student Paper (CS) award at the 2020 ACM Conference on Fairness, Accountability, and Transparency.

Ana Lucic is a PhD student in ILPS. Her doctoral research focuses on interpretability in machine learning.