LLM & NLP

Breakthroughs in LLMs have shifted NLP from task-specific methods to a generalized, data-driven approach, revolutionizing research and applications. Modern LLMs are increasingly integrated with external tools, such as search engines, APIs, and symbolic reasoning systems, to tackle complex tasks that require specialized knowledge. However, their growing adoption has highlighted challenges in fairness, controllability, transparency, and explainability, qualities that are especially critical in domains like HR, legal, finance, and healthcare.

At Megagon Labs, we strive to harness the potential of LLMs while addressing these limitations. Our research focuses on three key areas: 

  1. Understanding LLM Behavior and Limitations: Investigating how LLMs perform and the challenges they face in real-world production use cases.
  2. Advancing LLM Capabilities: Developing novel systems, hybrid neuro-symbolic approaches, and domain-specific innovations to enhance LLM performance.
  3. Robust Evaluation Methods: Creating effective methods to assess LLMs on complex, real-world tasks, ensuring their reliability and effectiveness in diverse applications.

By leveraging these techniques, we aim to improve the quality, consistency, fairness, and truthfulness of AI solutions tailored for HR and related domains, driving impactful progress in both research and practical applications. Our work encompasses fundamental research, applied projects, and open-source contributions, ensuring that our innovations make a meaningful impact both within and beyond the lab.

Highlighted Projects

We benchmark retrieval-augmented LLMs to understand when retrieval enhances performance and when it hinders it. Our insights contribute to the development of reliable, retrieval-augmented QA systems built on language models.

An investigation into LLMs’ sensitivity to the ordering of answer options in multiple-choice question answering – a task commonly used to study the reasoning and fact-retrieval capabilities of LLMs.
AmbigNLG

Addressing ambiguity in natural language generation (NLG) instructions by identifying unclear specifications and refining them for better output quality.
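As a rough illustration of this idea, the sketch below flags instruction dimensions (length, style, format) that a prompt leaves unspecified and appends default clarifications. The dimension names, cue words, and function names are illustrative assumptions, not AmbigNLG's actual taxonomy or implementation.

```python
# Illustrative ambiguity checks: each dimension maps to cue words
# whose absence suggests the instruction leaves it unspecified.
AMBIGUITY_CHECKS = {
    "length": ("word", "sentence", "paragraph", "short", "long"),
    "style": ("formal", "casual", "tone"),
    "format": ("bullet", "list", "table", "json"),
}

def find_unspecified(instruction):
    """Return the dimensions the instruction does not pin down."""
    text = instruction.lower()
    return [dim for dim, cues in AMBIGUITY_CHECKS.items()
            if not any(cue in text for cue in cues)]

def refine(instruction, defaults):
    """Append a default clarification for each unspecified dimension."""
    additions = [defaults[d] for d in find_unspecified(instruction)
                 if d in defaults]
    return " ".join([instruction] + additions)
```

For example, `refine("Summarize the article.", {"length": "Keep it under 100 words."})` turns an underspecified instruction into one with an explicit length constraint.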

Less Is More

An innovative approach, “Extract then Evaluate,” to evaluate long document summaries using LLMs that not only significantly reduces evaluation costs but also aligns more closely with human evaluations.
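A minimal sketch of the extract-then-evaluate idea: rank document sentences by overlap with the summary, keep only the top few as a compact evaluation context, and score the summary against that extract rather than the full document. The word-overlap extractor and toy judge below stand in for the LLM-based components described in the paper; all names here are illustrative.

```python
import re

def _words(text):
    """Lowercased word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def extract_salient(document_sents, summary, k=3):
    """Rank document sentences by word overlap with the summary and
    keep the top-k (a cheap stand-in for an LLM-based extractor)."""
    sw = _words(summary)
    ranked = sorted(document_sents, key=lambda s: len(sw & _words(s)),
                    reverse=True)
    return ranked[:k]

def overlap_judge(context, summary):
    """Toy judge: fraction of summary tokens supported by the context."""
    ctx = _words(context)
    tokens = re.findall(r"\w+", summary.lower())
    return sum(t in ctx for t in tokens) / len(tokens)

def evaluate_summary(document_sents, summary, judge, k=3):
    """'Extract then Evaluate': the judge sees only the extract,
    never the full document, which cuts evaluation cost."""
    extract = " ".join(extract_salient(document_sents, summary, k))
    return judge(extract, summary)
```

Because the judge's input is bounded by `k` sentences instead of the whole document, the cost of each evaluation call stays constant regardless of document length.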

 

Related Publications

NAACL - Industry
2025
Advances in Natural Language Processing (NLP) have the potential to transform HR processes, from recruitment to employee management. While recent breakthroughs in NLP have generated significant interest in its industrial applications, a comprehensive overview of how NLP can be applied across HR activities is still lacking. This paper identifies opportunities for researchers and practitioners to harness NLP’s transformative potential in this domain. We analyze key fundamental tasks such as information extraction and text classification, and their roles in downstream applications like recommendation and language generation, while also discussing ethical concerns. Additionally, we identify gaps in current research and encourage future work to explore holistic approaches for achieving broader objectives in this field.
NAACL
2024
While large language models (LMs) demonstrate remarkable performance, they encounter challenges in providing accurate responses when queried for information beyond their pre-trained memorization. Although augmenting them with relevant external information can mitigate these issues, failure to consider the necessity of retrieval may adversely affect overall performance. Previous research has primarily focused on examining how entities influence retrieval models and knowledge recall in LMs, leaving other aspects relatively unexplored. In this work, our goal is to offer a more detailed, fact-centric analysis by exploring the effects of combinations of entities and relations. To facilitate this, we construct a new question answering (QA) dataset called WiTQA (Wikipedia Triple Question Answers). This dataset includes questions about entities and relations of various popularity levels, each accompanied by a supporting passage. Our extensive experiments with diverse LMs and retrievers reveal, from the viewpoint of fact-centric popularity, when retrieval does and does not enhance LMs. Confirming earlier findings, we observe that larger LMs excel in recalling popular facts. However, they notably encounter difficulty with infrequent entity-relation pairs compared to retrievers. Interestingly, they can effectively retain popular relations of less common entities. We demonstrate the efficacy of our finer-grained metric and insights through an adaptive retrieval system that selectively employs retrieval and recall based on the frequencies of entities and relations in the question.
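The adaptive retrieval idea described above can be sketched as a simple routing rule: recall from the LM's parametric memory for popular entity-relation pairs, and fall back to retrieval for rare ones. The frequency threshold and function names below are illustrative assumptions, not the paper's actual values or implementation.

```python
def should_retrieve(entity_count, relation_count, threshold=1000):
    """Retrieve when either the entity or the relation is rare in the
    corpus; otherwise trust the LM's parametric memory. The threshold
    is an illustrative assumption, not the paper's value."""
    return min(entity_count, relation_count) < threshold

def answer(question, entity_count, relation_count, lm_recall, retrieve_and_read):
    """Route a question to the retrieval-augmented path or the
    recall-only path based on entity/relation frequency."""
    if should_retrieve(entity_count, relation_count):
        return retrieve_and_read(question)  # rare fact: augment with retrieval
    return lm_recall(question)              # popular fact: parametric recall
```

The design choice is that retrieval is treated as a cost (latency and potential distraction) to be paid only when parametric recall is likely to fail, which is exactly the case the experiments isolate: infrequent entity-relation pairs.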
7 Min Read
November 8, 2024
AmbigNLG tackles ambiguity in Natural Language Generation (NLG) instructions by identifying unclear specifications and refining them for better output quality.
5 Min Read
June 13, 2024
Explore the relationship between option arrangement and performance variations in Large Language Models (LLMs) during multiple-choice tasks. Through meticulous analysis, we uncovered substantial sensitivity of LLMs to the order of answer options, with performance fluctuations of up to 75% across different benchmarks.
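A minimal sketch of how such order sensitivity can be measured: re-ask the same question under every permutation of the answer options and report the accuracy spread. The toy model and function names are illustrative, not the study's actual setup.

```python
from itertools import permutations

def order_sensitivity(question, options, correct, model):
    """Re-ask the same multiple-choice question under every ordering
    of the options and return the accuracy spread (max - min); a
    spread of 0.0 means the model is order-invariant."""
    accs = []
    for perm in permutations(options):
        accs.append(1.0 if model(question, list(perm)) == correct else 0.0)
    return max(accs) - min(accs)

# Toy model that always picks the first option: maximally
# position-biased, so its accuracy spread is 1.0.
first_option_model = lambda q, opts: opts[0]
```

A model whose answer never changes across permutations scores a spread of 0.0, while the position-biased toy model above scores 1.0; real LLMs fall somewhere in between.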
4 Min Read
June 6, 2024
The article presents the WiTQA dataset, designed to assess the impact of retrieval on the performance of language models in question-answering systems. It details the findings on when retrieval augmentation enhances QA accuracy and when it may introduce errors, providing valuable guidance for optimizing RALMs.