Research directions presented at EMNLP 2025 span agentic systems, retrieval, interpretability, multimodality, training, and human–AI interaction, including work contributed by Megagon Labs.
Explore the key takeaways from COLM 2025, including breakthroughs in Reasoning & RL, Multimodal LLMs, and Retrieval & Embedding, as highlighted by Megagon Labs research scientists and engineers.
We share Megagon Labs’ key takeaways from ACL 2025 — highlighting the trends, debates, and breakthroughs shaping the future of NLP, agentic AI, and trustworthy evaluation.
Stream processing is a key ingredient in making “agentic workflows” enterprise-ready. Streams support a wide range of workflows and handle complexity while providing the right abstractions and scope to facilitate accuracy, scalability, and ease of use.
kNNBE is a hybrid model that combines the efficiency of bi-encoders with the precision of k-NN lookups for skill mapping. With it, organizations can accurately associate job descriptions with the appropriate skills. By employing labeled synthetic sentences, kNNBE improves accuracy while maintaining speed, making it well suited to large-scale projects.
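To make the bi-encoder-plus-k-NN idea concrete, here is a minimal sketch of that general pattern; the model checkpoint, example sentences, and skill labels below are illustrative assumptions, not the actual kNNBE implementation or data.

```python
# Sketch of the general bi-encoder + k-NN lookup pattern (not the actual kNNBE code):
# embed labeled synthetic sentences once, then map a new job-description sentence
# to skills via nearest-neighbor lookup over those embeddings.
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

# Placeholder labeled synthetic sentences: (sentence, skill label)
labeled = [
    ("Built REST APIs in Flask and deployed them on AWS", "Backend Engineering"),
    ("Created dashboards to track quarterly sales KPIs", "Data Visualization"),
    ("Ran A/B tests to evaluate new onboarding flows", "Experimentation"),
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any bi-encoder checkpoint works here
index_embeddings = encoder.encode([s for s, _ in labeled], normalize_embeddings=True)

knn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(index_embeddings)

query = "Designed microservices and CI/CD pipelines on the cloud"
query_emb = encoder.encode([query], normalize_embeddings=True)
_, idx = knn.kneighbors(query_emb)

predicted_skills = [labeled[i][1] for i in idx[0]]
print(predicted_skills)  # e.g. ['Backend Engineering', ...]
```

Because the expensive encoding happens once at index-build time, lookups at query time reduce to a fast nearest-neighbor search, which is what makes this style of model attractive at scale.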
How can enterprise systems evolve to support agentic workflows? In this post, we explore the conceptual foundations of Blue—a framework designed to integrate AI agents, data, and services into scalable, observable, and controllable enterprise applications.
We present three new papers that tackle a pressing and underexplored topic in NLP: multi-document reasoning. These works offer rigorous benchmarks, novel methodologies, and empirical insights into how large language models (LLMs) handle complexity across multiple sources of information.
We present Blue v0.9, our open-source framework for building and deploying agentic workflows in enterprise environments. Unlike conventional AI frameworks, Blue is designed with enterprise-scale requirements in mind—scalability, observability, configurability, and seamless integration with existing infrastructure.
With the MCRank benchmark and our EXSIR method, we show that LLMs can significantly improve their performance on challenging ranking tasks when guided by structured reasoning.
In this blog, we argue that an optimization framework for compound AI systems should pursue broader goals: multi-objective optimization (accuracy, cost, latency, etc.), multi-plan optimization, and constraint handling, especially budget constraints. These goals are by no means comprehensive, but they are important for enterprise scenarios.
AmbigNLG tackles ambiguity in Natural Language Generation (NLG) instructions by identifying unclear specifications and refining them for better output quality.
Through this inside peek at our internship program, explore the types of projects we at Megagon Labs formulate for our interns. If you are looking to start an internship soon, take our interns’ advice and apply it to your own.
MEGAnno is a data annotation framework that combines the power of large language models (LLMs) with human expertise to streamline and enhance the data labeling process. Throughout this article, we showcase MEGAnno’s capabilities with detailed code snippets.
Drawing from our experience at the NAACL conference, the Megagon Labs team has crafted this blog post to highlight three major trends: targeted evaluation, reasoning, and fine-tuning/RAG. These trends represent significant advancements in the field of NLP and showcase the innovative approaches researchers are taking to enhance the capabilities of LLMs.
Long-form text matching is a critical problem in Natural Language Processing (NLP) and Information Retrieval (IR). We propose a simple yet effective solution using sequence pair classification with Transformer models and demonstrate its superiority over state-of-the-art Siamese network-based methods.
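For readers unfamiliar with the setup, here is a minimal sketch of sequence pair classification with a Transformer; the checkpoint is a placeholder and long inputs are simply truncated here, whereas a full system would need a dedicated long-input strategy and fine-tuning.

```python
# Minimal sketch of sequence pair classification for text matching
# (assumptions: a base, not-yet-fine-tuned checkpoint; naive truncation of long inputs).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

doc_a = "First long document ..."
doc_b = "Second long document ..."

# Encode the two documents as one sequence pair: [CLS] doc_a [SEP] doc_b [SEP]
inputs = tokenizer(doc_a, doc_b, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

match_prob = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(match) = {match_prob:.3f}")  # a fine-tuned classification head is needed in practice
```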
Explore how the arrangement of answer options affects the performance of Large Language Models (LLMs) on multiple-choice tasks. Through careful analysis, we uncovered substantial sensitivity of LLMs to the order of answer options, with performance fluctuations of up to 75% across different benchmarks.
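A rough sketch of one way to probe this kind of sensitivity (not the paper’s actual protocol): permute the answer options, query the model under each ordering, and measure how consistently it returns the same answer. The `ask_model` mock and the consistency metric below are illustrative assumptions, with prompt formatting omitted.

```python
# Probe option-order sensitivity by asking the same multiple-choice question under
# every permutation of its options. `ask_model` is a mock with an extreme position
# bias (it always picks the first option), standing in for a real LLM call.
from itertools import permutations

def ask_model(question: str, options: list[str]) -> str:
    # Mock LLM: always "chooses" whatever text appears as the first option.
    return options[0]

def order_consistency(question: str, options: list[str]) -> float:
    """Fraction of orderings on which the model returns its most frequent answer text."""
    answers = []
    for perm in permutations(options):
        answers.append(ask_model(question, list(perm)))
    most_common = max(set(answers), key=answers.count)
    return answers.count(most_common) / len(answers)

options = ["Paris", "London", "Berlin", "Madrid"]
print(order_consistency("What is the capital of France?", options))  # 0.25 for the biased mock
```

An order-insensitive model would score 1.0 on this probe; the heavily position-biased mock scores only 0.25 because its answer changes with every reordering.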
Researchers at Megagon Labs have been exploring how we can address the challenges of building compound AI systems for enterprises. In this blog post, we introduce three projects that we have undertaken: (1) developing a suitable architecture for productizing compound AI systems, (2) optimizing agentic workflows with real-world constraints, and (3) benchmarking the performance of agents within a compound AI system, specifically in an enterprise setting.
The article presents the WiTQA dataset, designed to assess the impact of retrieval on the performance of language models in question-answering systems. It details the findings on when retrieval augmentation enhances QA accuracy and when it may introduce errors, providing valuable guidance for optimizing RALMs.
By enabling robust and accurate column annotation, this innovative framework holds the potential to revolutionize data-driven decision-making processes across a multitude of industries. Watchog could empower businesses to extract valuable insights from product catalogs, pricing tables, and customer data repositories, use them to optimize pricing strategies, and deliver personalized recommendations that enhance customer satisfaction and loyalty.
This article offers a glimpse into the dynamic research environment at Megagon Labs, where researchers are pioneering advancements in Natural Language Processing (NLP).
To push the boundaries of text editing with LLMs, we introduce XATU—a new text editing benchmark that incorporates fine-grained instructions and gold-standard edit explanations for explainable text updates.
Instead of completely replacing human annotators with LLMs, we need to leverage the strengths of both sides to obtain accurate and reliable annotations. This article will discuss how to effectively utilize LLMs as collaborators for data annotation.
We introduce our human-LLM collaborative annotation tool, MEGAnno+, addressing the challenges in LLM annotation by integrating human expertise with LLM capabilities.
We discuss how to leverage LLMs as data annotation agents and the practical challenges that may arise. We briefly introduce our LLM annotation tool, MEGAnno+.
In the realm of text generation and summarization, evaluating generated summaries, especially for long documents, has always been challenging. To address these challenges, we propose an approach to evaluating long-document summarization that significantly reduces evaluation costs and aligns more closely with human evaluations.
Aiden is a Research Engineer at Megagon Labs. He is passionate about developing, deploying, and maintaining machine learning models in production with high availability and scalability.
This blog post will peel back the layers of our KG building and learning platform, illuminating its role in enriching machine learning. As we explore our distinctive pipelines and delve into the granularities of data provenance and GNN training, we’ll showcase how our system facilitates the seamless integration of KGs into practical, real-world tasks for production use cases.
As we wrap up the year, we’d like to reflect on our collaborations and accomplishments. We put a lot of effort into our research papers and blog articles, and we’d like to thank you, the readers, for engaging with us. We also thank our guest speakers who joined us, virtually or in person, to share their work and dedication with us.
We introduced new metrics to measure factual knowledge in LLMs, addressing the limitations of existing ranking-based methods. Our metrics outperformed traditional ranking-based approaches, providing more accurate assessments of LLMs’ factual knowledge. We also explored the difference between implicit and explicit knowledge instillation in LLMs, highlighting that explicit knowledge instillation alone is insufficient in cases related to location and language queries.
We measured and analyzed various KG properties and described common/distinct structural patterns we observed in the datasets. Based on our findings, we formulated several recommendations for practitioners for future KG model development, evaluation, and dataset construction.
We invite scholars, researchers, and practitioners to contribute their expertise and insights to the inaugural NLP4HR workshop. We eagerly anticipate an eclectic mix of submissions covering a broad range of HR-related topics.
Our experiments show that ZETT advances the state of the art in extraction accuracy while providing a conceptually simple and stable solution. Going forward, we believe methods like ZETT that leverage self-supervised pre-training will play a key role in adapting information extraction to open-domain settings.
Get to know Chen Shen, Senior Research Engineer at Megagon Labs, as he uncovers up-and-coming engineering and research methods in AI and the dynamics of the Megagon office.
This article is meant to provide an overview of ACL 2023 with a focus on papers highlighting recent exciting breakthroughs such as large language models (LLMs). We briefly capture the papers that stood out to our attending research scientists and research engineers.
A recap of ACL 2023! In this post, we briefly capture the themes of the keynote talks and summarize the panel discussions on NLP in the era of LLMs. We also share our experience and insights from the MATCHING workshop.
Dataset discovery from data lakes is a critical way to utilize open-domain data within the enterprise. To overcome issues stemming from data quality and incomplete metadata in data lakes, it is essential to support table union search: given a query table and a collection of data lake tables, find all tables that are unionable with the query table. In this work, we propose Starmie, an end-to-end framework for this problem.
The ACM SIGMOD conference is the leading forum for the principles, techniques, and applications of database management systems and data management technology. There were 26 sponsors for SIGMOD this year, and Megagon Labs was a Silver sponsor. The conference consisted of the research track, the industry track, the demonstration track, 11 tutorials, and 10 workshops.
At Megagon Labs, we are working on symbiotic models and systems (Figure 1) that take advantage of LLMs as well as structured (knowledge bases [KBs], knowledge graphs [KGs], databases [DBs], etc.) and unstructured (text) information in a continuous and (semi-)automated machine-learning paradigm. In this post, we describe Megagon KnowledgeHub and how our research and development benefits from it.
We shine a spotlight on three cutting-edge AI projects that have been making waves in the industry: ZETT, CoCoSum, and ESE. These groundbreaking initiatives offer a glimpse into the future of AI and the transformative impact it holds across various domains.
We’d like to introduce you to Vishwas Mruthyunjaya, Senior Data Scientist at Megagon Labs. We’ll discuss his growth at Megagon, the advice he’d give to aspiring data scientists and engineers, and his interesting journey from robotics to AI.
At Megagon Labs, we see bringing on interns as more than just hiring a short-term helping hand. As we welcome spring and summer interns, we’d like to share with you how we foster an environment of growth and career development for both mentors and interns.
Dan Zhang, Research Manager and Senior Research Engineer at Megagon Labs, recounts her journey from childhood to and through her career as a research engineer.
To help NLP researchers and practitioners understand and improve their data, we introduce Weedle, an exploratory text analysis tool for data-centric NLP. Here are Weedle’s biggest strengths…
Magneton is a framework for composing interaction history-aware and customizable widgets to enable transparent, reusable, and expressive data science workflows in computational notebooks.
We will introduce feature stores and examine the implications of deep learning on feature stores as well as discuss the role of feature stores as part of the emerging MLOps stack.
We introduce Sudowoodo, an end-to-end framework for a variety of data integration applications that addresses key limitations of existing approaches, most notably the need for labeled data. Sudowoodo leverages contrastive learning to learn a data representation model from a large collection of unlabeled data items; the contrastive objective teaches the model to distinguish pairs of similar data items from dissimilar ones that are likely to be distinct.
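As a point of reference, here is a generic NT-Xent-style contrastive loss over two augmented views of the same batch of items; it illustrates the family of objectives such self-supervised approaches use and is not the actual Sudowoodo implementation.

```python
# Generic NT-Xent-style contrastive loss over two "views" of the same data items
# (a sketch of this family of objectives, not the actual Sudowoodo code).
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of two views of the same batch of items."""
    batch_size = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, dim)
    sim = z @ z.t() / temperature                         # pairwise similarities
    mask = torch.eye(2 * batch_size, dtype=torch.bool)    # exclude self-similarity
    sim.masked_fill_(mask, float("-inf"))
    # Positives: view i of an item matches view i + B of the same item (and vice versa).
    targets = torch.cat([torch.arange(batch_size) + batch_size,
                         torch.arange(batch_size)])
    return F.cross_entropy(sim, targets)

# Example: random embeddings stand in for encoder outputs of two augmented views.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent_loss(z1, z2).item())
```

The loss pulls the two views of each item together while pushing apart everything else in the batch, which is exactly the "similar vs. likely-distinct" distinction described above.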
The goal of this workshop is to bring together research communities from academia and industry working in these areas. These stakeholders are already interested in the development and application of novel approaches, models, and systems to address challenges around different matching tasks. While the workshop is intended to attract contributions on a wide range of topics, we now discuss a few example research problems that might be of interest to the workshop audience.
In this blog post, we present MEGAnno, our flexible, exploratory, efficient, and seamless labeling framework for NLP researchers and practitioners. In short, MEGAnno aims to reduce costs while improving the quality of labeling.
2022 was a very productive year for our research team: we brought many ideas to fruition as papers published at top conferences and workshops in natural language processing, machine learning, data management, and human-computer interaction.
Megagon Labs researchers proposed a new CQA Summarization task focused on summarizing QA pairs in Community-based Question Answering. In addition, we developed a multi-stage annotation framework and created a benchmark CoQASum for the CQA Summarization task.
We identified two key designs that improve the effectiveness and efficiency of sample acquisition: random sampling, which reduces the unlabeled pool considered for acquisition, and decoupling the diversity and uncertainty objectives in hybrid acquisition. Based on an investigation of existing methods, we propose a novel active learning method: TYROGUE.
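To illustrate those two design ideas (and only the ideas, not the TYROGUE algorithm itself), here is a hedged sketch in which the pool is first randomly subsampled and diversity is handled by clustering, separately from an uncertainty-based pick within each cluster; the function and its parameters are hypothetical.

```python
# Illustrative sketch of the two designs: (1) randomly subsample the unlabeled pool,
# (2) decouple diversity (clustering) from uncertainty (per-cluster argmax).
# Not the TYROGUE implementation.
import numpy as np
from sklearn.cluster import KMeans

def hybrid_acquire(embeddings: np.ndarray, uncertainty: np.ndarray,
                   budget: int, subsample: int = 1000, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    # Design 1: random subsampling shrinks the candidate pool.
    pool = rng.choice(len(embeddings), size=min(subsample, len(embeddings)), replace=False)
    # Design 2a: diversity via clustering the subsampled pool.
    clusters = KMeans(n_clusters=budget, n_init=10, random_state=seed).fit_predict(embeddings[pool])
    # Design 2b: uncertainty, applied separately, picks the most uncertain point per cluster.
    picks = []
    for c in range(budget):
        members = pool[clusters == c]
        if len(members) > 0:
            picks.append(members[np.argmax(uncertainty[members])])
    return np.array(picks)

# Toy usage with random data standing in for model embeddings and uncertainty scores.
emb, unc = np.random.rand(5000, 32), np.random.rand(5000)
print(hybrid_acquire(emb, unc, budget=10))
```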
In this blog post, we define the problem of paraphrasing and explain the challenges of document-level paraphrasing, especially in the business domain, including the challenge of evaluation. We then briefly describe the results of our survey study and identify key ideas.
The ACM SIGKDD conference is the premier forum for the advancement, education, and adoption of computer science, specifically for knowledge discovery and data mining. Get an inside view of what happened at this year’s conference.
This article explains how to evaluate huggingface/transformers models using JGLUE, and evaluates the ELECTRA model used in GiNZA* with JGLUE.
NAACL is a key conference for our research; six of our researchers attended, and Megagon Labs was a Platinum-level sponsor. To share what we gathered from the conference, below we summarize the papers, workshops, invited talks, and other events that we found interesting and relevant to ongoing research at Megagon Labs.
The ACM SIGMOD conference is the leading forum for the principles, techniques and applications of database management systems and data management technology.
As we set out to build a set of powerful in-house interactive annotation tools for NLP/ML tasks, we wanted to share our lessons learned with the community on extending Jupyter notebooks with custom widgets.
This year, Dublin, Ireland hosted ACL 2022, a hybrid conference on computational linguistics (CL) and natural language processing (NLP). Our team sponsored and attended the conference. In this blog, we provide an overview of the invited talks and panel discussions. In addition, we discuss our top paper picks on information extraction, language understanding, prompting, language generation, and explainability, which are also relevant to ongoing research at Megagon Labs.
In this work, we investigate the generalizability of existing entity set expansion (ESE) methods to user-generated text, which is widely used in many real-world applications and is known to have more distinctive characteristics than well-written text.
In this blog post, I sum up my experience attending CHI, provide an overview of the keynotes and awards, and summarize several interesting papers on human-AI interaction, mixed-initiative system design, and visualization. Human-centered AI is a key research area at Megagon Labs, where we explore challenges related to scalability, usability, and explainability in diverse projects such as data integration, natural language generation, and knowledge graphs. Therefore, any research at the intersection of NLP, data management, and HCI is of significant interest to us.
We use Generalized Entity Matching (GEM) to satisfy these practical requirements and present an end-to-end pipeline, Machop, as the solution. Machop allows end users to define new matching tasks from scratch and apply them to new domains in a step-by-step manner. Machop casts the GEM problem as sequence pair classification so as to utilize the language understanding capability of Transformers-based language models (LMs) such as BERT.
In this blog post, we take one step beyond the current scope of opinion summarization and propose CoCoSum, a framework that aims to generate contrastive and common summaries by comparing multiple entities. The framework consists of two base summarization models that jointly generate contrastive and common summaries.
Through our interviews and follow-up analysis, we provide a fine-grained characterization of information extraction workflows using a task-based model, identify related challenges faced by the human-in-the-loop, and propose design guidelines for addressing these challenges. We now discuss a few of the implications of the task model and our proposed design considerations for developing IE tools.