In this blog post, we present MEGAnno, our flexible, exploratory, efficient, and seamless labeling framework for NLP researchers and practitioners. In short, MEGAnno aims to reduce costs while improving the quality of labeling.
2022 was a very productive year for our research team: we brought many ideas to fruition as papers published at top conferences and workshops in natural language processing, machine learning, data management, and human-computer interaction.
Megagon Labs researchers proposed a new CQA Summarization task focused on summarizing QA pairs in Community-based Question Answering. In addition, we developed a multi-stage annotation framework and created a benchmark, CoQASum, for the CQA Summarization task.
We identified two key designs that can improve the effectiveness and efficiency of sample acquisition: using random sampling to reduce the unlabeled pool considered for acquisition, and decoupling the diversity and uncertainty objectives in hybrid acquisition. Based on an investigation of existing methods, we propose a novel active learning method: TYROGUE.
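The two design ideas named above can be illustrated with a minimal Python sketch. This is not the TYROGUE implementation. It assumes a hypothetical setup in which each unlabeled example has a feature vector and a predicted class distribution, and it swaps in greedy farthest-point selection for the diversity step and predictive entropy for the uncertainty step. The function name `acquire_batch` and all of its parameters are illustrative.

```python
import math
import random


def entropy(probs):
    """Predictive entropy of a class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def acquire_batch(features, probs, batch_size, subsample_size,
                  candidates_factor=3, seed=0):
    """Pick indices to label from an unlabeled pool (illustrative only).

    features: list of numeric tuples, one per unlabeled example (assumed).
    probs:    list of predicted class distributions, same order (assumed).
    """
    rng = random.Random(seed)
    pool = list(range(len(features)))

    # Step 1: random sampling shrinks the pool that later steps must scan.
    subsample = rng.sample(pool, min(subsample_size, len(pool)))
    if not subsample:
        return []

    # Step 2 (diversity only): greedy farthest-point selection keeps a
    # spread-out candidate set, without looking at model uncertainty.
    n_candidates = min(len(subsample), batch_size * candidates_factor)
    chosen = [subsample[0]]
    chosen_set = {subsample[0]}
    min_dist = {i: math.dist(features[i], features[chosen[0]])
                for i in subsample}
    while len(chosen) < n_candidates:
        nxt = max((i for i in subsample if i not in chosen_set),
                  key=lambda i: min_dist[i])
        chosen.append(nxt)
        chosen_set.add(nxt)
        for i in subsample:
            min_dist[i] = min(min_dist[i],
                              math.dist(features[i], features[nxt]))

    # Step 3 (uncertainty only): among the diverse candidates, keep the
    # examples the model is least sure about.
    chosen.sort(key=lambda i: entropy(probs[i]), reverse=True)
    return chosen[:batch_size]
```

Running diversity and uncertainty as separate stages, rather than combining them into one score, is what "decoupling" means in this sketch; the subsample size and the candidate multiplier are tuning knobs chosen here purely for illustration.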
In this blog post, we define the problem of paraphrasing and explain the challenges of document-level paraphrasing, including evaluation, especially in the business domain. We then briefly describe the results of the survey study and identify key ideas.