Author: Natalie Nuno

The First Workshop on Matching: Introduction, Scope, and Highlights

In this workshop, we are interested in (but not restricted to) the dimensions of matching (see Figure 1), as well as their combinations. Recent advances in diverse communities, from artificial intelligence and databases to computational linguistics and human-computer interaction, have demonstrated promising results in different matching tasks related to the previously mentioned (and many other) domains.


Highlights of 2022 at Megagon Labs

2022 was a very productive year for our research team. We brought many ideas to fruition as published papers in top conferences and workshops in the fields of natural language processing, machine learning, data management, and human-computer interaction.


Characterizing Human-Centered Information Extraction

In particular, we proposed feature- and system-specific guidelines for designing human-centered data systems. The feature-specific guidelines, inspired by cognitive engineering principles for enhancing human-computer performance, recommend automating the unwanted workload of humans.



Sudowoodo can also improve the efficiency of model engineering, since the learned representation can be applied to all stages of a typical entity matching pipeline, such as blocking, labeling, and matching. Sudowoodo also supports a variety of other use cases, such as data cleaning and semantic type detection, which suggests its versatility.


COCOSum: Contrastive Summary for Two Comparable Entities

We developed a novel decoding algorithm, co-decoding. To generate distinctive opinion summaries, it emphasizes distinctive words by contrasting the token probability distribution of the target entity against that of the counterpart entity. To generate common opinion summaries, it highlights entity-pair-specific words by aggregating token probability distributions.
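
The contrast-and-aggregate idea can be illustrated with a toy sketch. This is not the COCOSum implementation: the ratio-based contrast, the arithmetic-mean aggregation, the function names (`contrast`, `aggregate`), the `alpha` parameter, and the example distributions are all assumptions made for illustration.

```python
def normalize(scores):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores.values())
    return {tok: s / total for tok, s in scores.items()}


def contrast(target, counterpart, alpha=1.0, eps=1e-9):
    """Upweight tokens that are likely for the target entity but
    unlikely for the counterpart (assumed ratio-based contrast)."""
    scores = {
        tok: p / (counterpart.get(tok, 0.0) + eps) ** alpha
        for tok, p in target.items()
    }
    return normalize(scores)


def aggregate(dist_a, dist_b):
    """Average two token distributions so tokens supported by both
    entities stand out (assumed arithmetic-mean aggregation)."""
    vocab = set(dist_a) | set(dist_b)
    return normalize(
        {tok: 0.5 * (dist_a.get(tok, 0.0) + dist_b.get(tok, 0.0)) for tok in vocab}
    )


# Hypothetical next-token distributions for two comparable hotels.
hotel_a = {"clean": 0.4, "pool": 0.4, "staff": 0.2}
hotel_b = {"clean": 0.4, "pool": 0.1, "staff": 0.5}

distinctive = contrast(hotel_a, hotel_b)
common = aggregate(hotel_a, hotel_b)

print(max(distinctive, key=distinctive.get))  # token most distinctive of hotel_a
print(max(common, key=common.get))            # token most shared by both hotels
```

In this toy setting, contrasting favors the token that hotel A's reviews mention far more often than hotel B's, while aggregating favors the token both sets of reviews agree on.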



Entity Matching (EM) refers to the problem of finding pairs of entity records that refer to the same real-world entity such as customers, products, businesses, or publications. As one of the most fundamental problems in data integration, EM has a wide range of applications including data cleaning, data integration, knowledge base construction, and entity similarity search. We present Ditto, a novel entity matching system based on pre-trained Transformer-based language models (LMs) such as BERT.
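
To make the problem statement concrete, here is a minimal sketch of entity matching as pair-finding. It is not Ditto: it swaps in a simple Jaccard token-overlap baseline instead of a language model, and the record fields, the `match` function, and the `threshold` value are hypothetical.

```python
def tokens(record):
    """Lowercased set of whitespace-separated tokens from all field values."""
    return set(" ".join(str(v) for v in record.values()).lower().split())


def jaccard(a, b):
    """Jaccard similarity between the token sets of two records."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def match(left, right, threshold=0.5):
    """Return index pairs (i, j) whose records look like the same entity."""
    return [
        (i, j)
        for i, a in enumerate(left)
        for j, b in enumerate(right)
        if jaccard(a, b) >= threshold
    ]


# Hypothetical product records from two sources.
left = [
    {"name": "Apple iPhone 12", "color": "black"},
    {"name": "Dell XPS 13 laptop", "color": "silver"},
]
right = [
    {"name": "iphone 12 apple", "color": "Black"},
    {"name": "Lenovo ThinkPad X1", "color": "black"},
]

matches = match(left, right)
print(matches)
```

A token-overlap baseline like this breaks down when matching records share few surface tokens, which is the kind of case where a language-model-based matcher is expected to help.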
