DEEM – SIGMOD
2022
Jin Wang, Yuliang Li
Entity Matching (EM) is an important problem in data integration and cleaning. More recently, deep learning techniques, especially pre-trained language models, have been integrated into EM applications and achieved promising results. Unfortunately, the significant performance gain comes with a loss of explainability and transparency, which keeps EM from meeting the requirements of responsible data management. To address this issue, recent studies extended explainable AI techniques to explain black-box EM models. However, these solutions have two major drawbacks: (i) their explanations do not capture the unique semantic characteristics of the EM problem; and (ii) they fail to provide an objective method to quantitatively evaluate the provided explanations. In this paper, we propose Minun, a model-agnostic method to generate explanations for EM solutions. We use counterfactual examples generated from an EM-customized search space as the explanations and develop two search algorithms to find such examples efficiently. We also propose a novel evaluation framework based on a student-teacher paradigm. The framework enables the evaluation of explanations of diverse formats by measuring the performance gain of a "student" model at simulating the target "teacher" model when explanations are given as side input. We conduct an extensive set of experiments on explaining state-of-the-art deep EM models on popular EM benchmark datasets. The results demonstrate that Minun significantly outperforms popular explainable AI methods such as LIME and SHAP on both explanation quality and scalability.
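To make the notion of a counterfactual explanation for EM concrete, the following is a minimal sketch and not Minun's actual search algorithm. It assumes a hypothetical black-box matcher (`toy_matcher`, a Jaccard-similarity threshold) and a hypothetical search space in which attribute values of the left record may be replaced by the corresponding values of the right record; both records are assumed to share the same schema. The brute-force search returns the smallest set of attribute edits that flips the model's prediction.

```python
from itertools import combinations


def toy_matcher(left, right, threshold=0.5):
    """Hypothetical stand-in for a black-box EM model.

    Predicts a match when the Jaccard similarity of the two records'
    token sets reaches the threshold.
    """
    a = set(" ".join(left.values()).lower().split())
    b = set(" ".join(right.values()).lower().split())
    union = a | b
    score = len(a & b) / len(union) if union else 0.0
    return score >= threshold


def counterfactual(left, right, model):
    """Find the fewest attribute edits on `left` that flip the prediction.

    Assumed search space: each edit copies one attribute value from
    `right` into `left`. Subsets are tried in order of increasing size,
    so the first flip found uses a minimum number of edits.
    Returns (edited_left, changed_attributes), or None if no flip exists.
    """
    original = model(left, right)
    attrs = sorted(left)
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            edited = dict(left)
            for attr in subset:
                edited[attr] = right[attr]
            if model(edited, right) != original:
                return edited, subset
    return None


# Illustrative records (made up for this sketch).
left = {"title": "apple iphone 12 black", "brand": "apple", "price": "799"}
right = {"title": "samsung galaxy s21 gray", "brand": "samsung", "price": "799"}

result = counterfactual(left, right, toy_matcher)
print(result[1])  # ('title',)
```

In this toy case the pair is predicted as a non-match, and changing only the title is enough to flip the prediction, so the title is the explanation. Copying values from the other record can only make the pair more similar, so this particular search space explains non-match predictions; explaining a match prediction would need a different edit operation, and an exhaustive search of this kind grows exponentially with the number of attributes.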