Blog

Sudowoodo: Contrastive Self-supervised Learning for Data Integration Applications

We introduce Sudowoodo, an end-to-end framework designed to address limitations common to a variety of data integration applications. Sudowoodo addresses the label requirement by using contrastive learning to learn a data representation model from a large collection of unlabeled data items. Its contrastive objective trains the model to distinguish pairs of similar data items from pairs that are likely to be distinct.


The First Workshop on Matching: Introduction, Scope, and Highlights

In this workshop, we are interested in (but not restricted to) the dimensions of matching (see Figure 1), as well as their resultant combinations. Recent advances in diverse communities, from artificial intelligence and databases to computational linguistics and human-computer interaction, have demonstrated promising results in different matching tasks related to the previously mentioned (and many other) domains.


Highlights of 2022 at Megagon Labs

2022 was a very productive year for our research team, which brought many ideas to fruition as published papers at top conferences and workshops in the fields of natural language processing, machine learning, data management, and human-computer interaction.


Summarizing Community-based Question-Answer Pairs

Megagon Labs researchers proposed a new task, CQA Summarization, which focuses on summarizing QA pairs in Community-based Question Answering (CQA). In addition, we developed a multi-stage annotation framework and created a benchmark, CoQASum, for the task.


Hybrid Active Learning for Low-Resource LM Fine-tuning

We identified two key designs that can improve the effectiveness and efficiency of sample acquisition: using random sampling to reduce the unlabeled pool considered for acquisition, and decoupling the diversity and uncertainty objectives in hybrid acquisition. Based on an investigation of existing methods, we propose a novel active learning method: TYROGUE.
