Highlights of 2022 at Megagon Labs

Looking back at 2022, we are very happy with our Lab's growth and achievements. Here we highlight some of our major accomplishments and milestones.

Conference Accepted Publications

2022 was a very productive year for our research team, which brought many ideas to fruition as papers published in top conferences and workshops in natural language processing, machine learning, data management, and human-computer interaction. We take pride in sponsoring many conferences and enjoy speaking with participants at our booth. It was great to see many of these conferences held in person, with our researchers showcasing their work first-hand. Below is a list of papers from our team:

AIBSD at AAAI – Extracting Salient Facts from Company Reviews with Scarce Labels by Jinfeng Li, Nikita Bhutani, Yoshi Suhara, Alex Whedon

CHI – Characterizing Practices, Limitations, and Opportunities Related to Text Information Extraction Workflows: A Human-in-the-loop Perspective by Sajjadur Rahman, Eser Kandogan

Findings of ACL – Comparative Opinion Summarization via Collaborative Decoding by Hayate Iso, Xiaolan Wang, Stefanos Angelidis, Yoshihiko Suhara

SIGMOD – Annotating Columns with Pre-trained Language Models by Yoshihiko Suhara, Jinfeng Li, Yuliang Li, Dan Zhang, Cagatay Demiralp, Chen Chen, Wang-Chiew Tan

AIDM at SIGMOD – Machop: an End-to-End Generalized Entity Matching Framework by Jin Wang, Yuliang Li, Wataru Hirota, Eser Kandogan

DEEM at SIGMOD – Minun: Evaluating Counterfactual Explanations for Entity Matching by Jin Wang, Yuliang Li

LREC – Self-Contained Utterance Description Corpus for Japanese Dialog by Yuta Hayashibe

Findings of NAACL – Low-resource Entity Set Expansion: A Comprehensive Study on User-generated Text by Yutong Shao, Nikita Bhutani, Sajjadur Rahman, Estevam Hruschka

ACM SIGIR – Beyond Opinion Mining: Summarizing Opinions of Customer Reviews by Reinald Kim Amplayo, Arthur Bražinskas, Yoshihiko Suhara, Xiaolan Wang, Bing Liu

COLING – Can Edge Probing Tasks Reveal Linguistic Knowledge in QA Models? by Sagnik Ray Choudhury, Nikita Bhutani, Isabelle Augenstein

EMNLP – Summarizing Community-based Question-Answer Pairs by Ting-Yao Hsu, Yoshi Suhara, Xiaolan Wang

Findings of EMNLP – Low-resource Interactive Active Labeling for Fine-tuning Language Models by Seiji Maekawa, Dan Zhang, Hannah Kim, Sajjadur Rahman, Estevam Hruschka

DaSH Workshop at EMNLP – MegAnno: Exploratory Labeling for NLP in Computational Notebooks by Dan Zhang, Hannah Kim, Rafael Li Chen, Eser Kandogan, Estevam Hruschka

2nd WIT Workshop @ ACL 2022

In 2022 we co-organized the 2nd workshop on deriving insights from user-generated text (WIT). The goal of this workshop is to advance research that harnesses user-generated text. Recent progress in natural language processing, machine learning, knowledge bases, and database management has demonstrated promising results and far-reaching uses of text. However, there is tremendous untapped potential in applying advanced AI/ML/NLP techniques to user-generated text, which is rich in user insights and experiences. With this workshop we brought together researchers and practitioners in this area to clarify impactful research problems, share findings from adapting existing approaches to user-generated data, and generate new ideas for future research.

We’d like to thank all of the organizers, participants, and guest speakers. We look forward to our MATCHING Workshop, which has already been accepted at ACL 2023.

Engineering Articles

As our engineering team grew in 2022, we published more engineering-oriented blog posts showcasing the team's work. Here are some articles by our engineering team. We hope to bring you more in 2023.

React + D3: A Starter’s Guide by Natalie Nuno and Eser Kandogan

Extending the Jupyter UX with Custom Widgets: Lessons Learned by Rafael Li Chen

Paraphrase Generation for Long Text by Austin King and Eser Kandogan


One of our favorite things to do is welcome interns to our office. Internship projects allow us to work with talented students from all over the world. Often these projects result in publications at some of the most prestigious scientific conferences. Here are some blog articles about our internship experiences: 

We are also proud to list some publications to which our interns contributed this year: 

aiDM at SIGMOD 2022 – Machop: an End-to-End Generalized Entity Matching Framework by Jin Wang, Yuliang Li, Wataru Hirota, Eser Kandogan

Findings of NAACL 2022 – Low-resource Entity Set Expansion: A Comprehensive Study on User-generated Text by Yutong Shao, Nikita Bhutani, Sajjadur Rahman, Estevam Hruschka

EMNLP 2022 – Summarizing Community-based Question-Answer Pairs by Ting-Yao Hsu, Yoshi Suhara, Xiaolan Wang

Findings of EMNLP 2022 – Low-resource Interactive Active Labeling for Fine-tuning Language Models by Seiji Maekawa, Dan Zhang, Hannah Kim, Sajjadur Rahman, Estevam Hruschka

While we offer internships throughout the year, summer is the most popular time for internship applicants. We have already accepted interns for the Summer 2023 program, but we have room for more! If you or anyone you know is interested in interning with us, go ahead and submit your application here.

Advisory Board

We have a great advisory board made up of Tom Mitchell, Mirella Lapata, and, as of May 2022, Renee Miller, who graciously accepted our invitation to join. In 2022, Tom Mitchell also agreed to serve as a research fellow on our team, becoming even more involved in our work.

Looking Ahead

As we start the new year, we continue our effort to advance the state of the art in human-centered AI, AI for data management, knowledge representation and reasoning, and natural language processing (NLP). In doing so, we support venues that promote the discussion and sharing of new research. This year we are again sponsoring conferences. We are especially excited to bring our first Workshop on Matching From Unstructured and Structured Data to ACL 2023. We hope to see you at ACL and many other conferences in 2023!

