Nima Shahbazi, Jin Wang, Zhengjie Miao, Nikita Bhutani
Entity matching is a crucial task in many real-world applications. Despite a substantial body of research on improving the effectiveness of entity matching, enhancing its fairness has received scant attention. To fill this gap, this paper introduces a new problem: preparing fairness-aware datasets for entity matching. We formally define the problem, drawing on the principles of group fairness and statistical parity. We devise three highly efficient algorithms that accelerate the search for an unbiased dataset within a vast search space. Our experiments on four real-world datasets show that the proposed algorithms significantly improve fairness while achieving effectiveness comparable to existing fairness-agnostic methods. Furthermore, case studies demonstrate that our techniques can be seamlessly integrated into end-to-end entity matching pipelines to support fairness requirements in real-world applications.
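To make the notion of statistical parity concrete, the following is a minimal Python sketch. It assumes that each candidate record pair has already been tagged with a single group label (for example, a value of a sensitive attribute) and a matcher's binary prediction. The functions `match_rates_by_group` and `statistical_parity_gap` are hypothetical names used only for illustration. The sketch computes a generic measure of how far per-group match rates diverge. It is not the paper's formal problem definition and does not reproduce its algorithms.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical input format (assumption): each item is (group_label, predicted_match).
Pair = Tuple[Hashable, bool]


def match_rates_by_group(pairs: Iterable[Pair]) -> Dict[Hashable, float]:
    """Return, for each group, the fraction of its pairs predicted as matches."""
    totals: Dict[Hashable, int] = defaultdict(int)
    positives: Dict[Hashable, int] = defaultdict(int)
    for group, predicted in pairs:
        totals[group] += 1
        if predicted:
            positives[group] += 1
    return {g: positives[g] / totals[g] for g in totals}


def statistical_parity_gap(pairs: Iterable[Pair]) -> float:
    """Largest difference in predicted-match rate between any two groups.

    A gap of 0.0 means every group receives predicted matches at the same rate,
    i.e. statistical parity holds exactly. Returns 0.0 for empty input.
    """
    rates = match_rates_by_group(pairs)
    if not rates:
        return 0.0
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Illustrative toy predictions only; not data from the paper.
    toy_pairs = [
        ("A", True), ("A", True), ("A", False),
        ("B", True), ("B", False), ("B", False), ("B", False),
    ]
    print(match_rates_by_group(toy_pairs))
    print(statistical_parity_gap(toy_pairs))
```

In this toy input, group "A" is predicted to match in 2 of 3 pairs and group "B" in 1 of 4, so the gap is 2/3 - 1/4 ≈ 0.417. A fairness-aware dataset preparation method would aim to shrink such a gap while preserving matching quality. Using the max-minus-min difference is only one common way to summarize parity across more than two groups.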