Sudowoodo

Entity matching refers to the problem of finding pairs of entity records that refer to the same real-world entity, such as a customer, product, business, or publication. Sudowoodo is a framework for end-to-end entity matching based on contrastive representation learning. Contrastive learning enables Sudowoodo to learn similarity-aware data representations from a large corpus of data items, e.g., entity entries, without using any labels. The learned representations can later either be used directly or serve as the starting point for fine-tuning with only a few labels, drastically reducing the number of labels required. Sudowoodo can also make model engineering more efficient, since the same learned representations can be applied to every stage of a typical entity matching pipeline, such as blocking, labeling, and matching. Beyond entity matching, Sudowoodo supports other use cases, such as data cleaning and semantic type detection, which suggests its versatility.
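To illustrate what "similarity-aware representations learned without labels" can mean in practice, here is a minimal sketch, not Sudowoodo's actual implementation. It assumes a generic InfoNCE-style contrastive objective with in-batch negatives (as in SimCLR): each record's embedding should be closer to an embedding of its own augmented view than to the views of other records in the batch. It also shows how such embeddings could drive blocking by nearest-neighbor search. The encoder is replaced by hand-written toy vectors, and the function names `info_nce_loss` and `block_candidates` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (leave all-zero vectors unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def info_nce_loss(anchors, positives, temperature=0.1):
    """Average InfoNCE loss (hypothetical helper, generic objective).

    anchors[i] and positives[i] are embeddings of two views of the same record;
    every positives[j] with j != i serves as an in-batch negative for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / n


def block_candidates(left, right, k=1):
    """Blocking sketch (hypothetical helper): pair each left embedding with its
    top-k most similar right embeddings, yielding candidates for a matcher."""
    pairs = []
    for i, u in enumerate(left):
        ranked = sorted(range(len(right)), key=lambda j: cosine(u, right[j]), reverse=True)
        pairs.extend((i, j) for j in ranked[:k])
    return pairs


# Toy embeddings standing in for an encoder's output (assumption).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each view is close to its own record
swapped = [[0.1, 0.9], [0.9, 0.1]]  # each view is close to the wrong record

print(info_nce_loss(anchors, aligned) < info_nce_loss(anchors, swapped))  # True
print(block_candidates(anchors, aligned))  # [(0, 0), (1, 1)]
```

The loss is low when matching views line up and high when they are swapped, which is the signal a contrastive trainer would minimize. Because the same embeddings can be reused for candidate generation, a single representation can serve several pipeline stages; a real system would use an approximate nearest-neighbor index instead of the quadratic scan shown here.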