Minun and Explainable Entity Matching

Given two collections of entities, such as product listings, the entity matching (EM) problem aims to identify all pairs that refer to the same real-world object, such as a product, a publication, or a business. Recently, deep learning (DL) techniques have been widely applied to EM and have achieved promising results. Unfortunately, this performance gain comes at the cost of transparency: DL-based approaches behave largely as black boxes with limited interpretability. In real applications, it is essential that users can interpret these models and understand why they make a particular positive or negative prediction. It is therefore important to develop a framework that explains the results of DL-based EM models.

The Minun Framework


To meet this need, we developed Minun, a model-agnostic framework that provides local explanations for black-box EM models. Minun uses counterfactual examples of entity pairs as explanations. In a nutshell, a counterfactual explanation describes the minimal modification to the input that changes the model's prediction. Given a pair of entities and the black-box model, we first construct the search space of candidate counterfactual examples. We do this by considering the set of operations that make the two entities more similar (for pairs the model predicts as non-matches) or more dissimilar (for pairs it predicts as matches). In this work, we use token-level edit distance for this purpose: candidates are generated by inserting, deleting, or updating a token in one entity of the pair. To reduce the time needed to explore the search space, we developed two efficient algorithms, one greedy and one based on binary search. The details can be found in our paper.
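To make the idea of a token-level counterfactual search concrete, here is a minimal sketch of a greedy variant. It is not the algorithm from the paper: the toy `jaccard_model`, the 0.5 threshold, and the helper names `candidate_edits` and `greedy_counterfactual` are all assumptions made for illustration. A real setup would plug in the black-box EM model in place of the toy scorer.

```python
from typing import Callable, List, Optional, Tuple

Tokens = List[str]
Model = Callable[[Tokens, Tokens], float]  # returns a match score in [0, 1]


def jaccard_model(left: Tokens, right: Tokens) -> float:
    """Toy stand-in for a black-box EM model (assumption, not Ditto)."""
    a, b = set(left), set(right)
    return len(a & b) / len(a | b) if a | b else 1.0


def candidate_edits(left: Tokens, right: Tokens,
                    make_similar: bool) -> List[Tuple[str, Tokens]]:
    """Enumerate single-token edits of `left`, each with a description."""
    edits: List[Tuple[str, Tokens]] = []
    if make_similar:
        # Insert or substitute tokens that appear only in `right`.
        for tok in [t for t in right if t not in left]:
            edits.append((f"insert {tok!r}", left + [tok]))
            for i, old in enumerate(left):
                if old not in right:
                    edits.append((f"update {old!r} -> {tok!r}",
                                  left[:i] + [tok] + left[i + 1:]))
    else:
        # Delete tokens that the two entities share.
        for i, old in enumerate(left):
            if old in right:
                edits.append((f"delete {old!r}", left[:i] + left[i + 1:]))
    return edits


def greedy_counterfactual(left: Tokens, right: Tokens, model: Model,
                          threshold: float = 0.5, max_edits: int = 10
                          ) -> Optional[Tuple[List[str], Tokens]]:
    """Apply the best single edit repeatedly until the prediction flips."""
    original = model(left, right) >= threshold
    current, applied = list(left), []
    for _ in range(max_edits):
        edits = candidate_edits(current, right, make_similar=not original)
        if not edits:
            return None  # search space exhausted without a flip
        # Raise the score for predicted non-matches, lower it for matches.
        pick = min if original else max
        desc, current = pick(edits, key=lambda e: model(e[1], right))
        applied.append(desc)
        if (model(current, right) >= threshold) != original:
            return applied, current
    return None


if __name__ == "__main__":
    left = ["sony", "camera", "black"]
    right = ["canon", "camera", "kit"]
    print(greedy_counterfactual(left, right, jaccard_model))
```

The list of applied edits is the explanation: it names the tokens whose change was enough to flip the toy model's decision. Each greedy step calls the model once per candidate edit, which is the kind of cost that motivates a more careful search strategy.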

New Evaluation Method

The next issue is how to quantitatively measure the quality of the generated explanations, especially when they come from different explanation methods. Motivated by recent advances in the NLP community, we employed a teacher-student paradigm to evaluate explanation quality, as shown in the figure above. Specifically, we used the generated explanations to construct a training set for a new model that is different from the black-box model being explained. The black-box model can then be regarded as a "teacher" that teaches the "student" model using knowledge from the generated explanations. This enables a comparison between explanation methods: each method yields its own set of explanations and therefore its own training set. We measured the effectiveness of an explanation method as follows. We first trained the student model on its original training set, without explanations. Next, we trained a second model with exactly the same hyper-parameters on the training set augmented with the explanations. The difference in F1 score (delta F1) between the two models on the student model's test set is the main evaluation metric: the more the model trained with explanations improves on the one trained without them, the better the explanations.
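The delta F1 metric itself is simple to compute once both student models have produced predictions on the same test set. The sketch below assumes binary labels (1 for match, 0 for non-match) and uses hypothetical function names; it returns the raw difference on a 0 to 1 scale and does not reproduce any numbers from our experiments.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the positive (match) class for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def delta_f1(y_true: Sequence[int],
             baseline_pred: Sequence[int],
             augmented_pred: Sequence[int]) -> float:
    """F1 of the student trained with explanations minus F1 without them."""
    return f1_score(y_true, augmented_pred) - f1_score(y_true, baseline_pred)


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1]
    without_expl = [1, 0, 0, 1, 0]  # student trained on the original data
    with_expl = [1, 1, 0, 1, 0]     # student trained on the augmented data
    print(round(delta_f1(labels, without_expl, with_expl), 3))
```

A positive value means the explanations carried information that helped the student; a value near zero or below suggests they added little or were misleading.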

Experiment Results

We conducted experiments on five popular EM datasets: Amazon-Google (AG), DBLP-ACM (DA), DBLP-Scholar (DS), Walmart-Amazon (WA), and Abt-Buy (AB). As the target model to be explained, we chose Ditto, which is based on pre-trained transformer models and substantially outperforms earlier state-of-the-art methods on these five datasets. As baselines, we extended LIME and SHAP, two popular explanation methods for black-box ML models, to support EM. We trained a small DistilBERT model as the student model.

The delta F1 scores are shown in the table above. The student model benefits greatly from the explanations generated by the approaches in our Minun framework. Specifically, explanations generated by Minun improve the F1 score on 4 out of 5 datasets, and the average improvement achieved by Minun is as high as 8.9.

If you are interested in learning more about Minun, please check out our paper. We have also released the source code on GitHub; see the repo for details.

Written by: Jin Wang and Megagon Labs
