Natural Language Processing

The ability to automatically, consistently, and correctly understand (and extract) information from textual sources is a key characteristic of many real-world AI applications. This ability is equally critical in the Human Resources (HR) domain. State-of-the-art large pre-trained language models have demonstrated impressive performance on a wide range of NLP tasks, including natural-language generation, summarization, question answering, reading comprehension, and named entity recognition/resolution. But they have also shown limitations in areas like interpretability, controllability, transparency, and fairness.

At Megagon Labs we focus on how to take advantage of large pre-trained language models and go beyond the current state of the art. We investigate, propose, and deploy new models, systems, and approaches that boost natural language processing capabilities. We do this by defining new architectures, using hybrid neuro-symbolic paradigms, and exploring domain-specific characteristics that positively impact the quality, consistency, fairness, and truthfulness of our solutions in HR and related domains.

Related Projects:

Coop: Convex Aggregation for Opinion Summarization

We developed Coop, a tool that enables us to generate more specific summaries by finding a better summary vector in the latent space.
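
To make the idea of searching the latent space more concrete, here is a minimal sketch, not Coop's actual implementation. It assumes each input review has already been encoded as a latent vector, forms one candidate per subset of reviews by averaging the subset (an equal-weight convex combination), and keeps the candidate that a scoring function rates highest. The names `mean_vector`, `best_convex_aggregate`, and `toy_score`, as well as the toy scoring rule itself, are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def mean_vector(vectors):
    """Equal-weight convex combination (plain average) of same-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def best_convex_aggregate(latents, score):
    """Try the average of every non-empty subset of `latents`; keep the best one.

    `score` is a caller-supplied function mapping a candidate summary vector
    to a number (higher is better). Its definition is an assumption of this
    sketch. The search is exponential in the number of inputs, so it only
    suits small input sets.
    """
    best_vec, best_subset, best_score = None, None, float("-inf")
    indices = range(len(latents))
    for size in range(1, len(latents) + 1):
        for subset in combinations(indices, size):
            candidate = mean_vector([latents[i] for i in subset])
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best_vec, best_subset, best_score = candidate, subset, candidate_score
    return best_vec, best_subset, best_score


# Toy data: three 2-dimensional "review" vectors.
latents = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical scoring rule: negative squared distance to a made-up target.
target = [0.5, 0.5]


def toy_score(vec):
    return -sum((v - t) ** 2 for v, t in zip(vec, target))


summary_vec, chosen, chosen_score = best_convex_aggregate(latents, toy_score)
print(chosen, summary_vec)
```

In a real system the scoring function would presumably decode each candidate vector into text and judge the resulting summary; here it is a stand-in so that the search loop can run on its own.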

CoCoSum: Contrastive Summary for Two Comparable Entities

We developed a novel decoding algorithm, co-decoding. For distinctive opinion summary generation, it emphasizes distinctive words by contrasting the token probability distribution of the target entity against that of the counterpart entity. For common opinion summary generation, it highlights words specific to the entity pair by aggregating token probability distributions.
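
The two modes described above can be illustrated with a single decoding step over a toy vocabulary. This is a sketch under stated assumptions rather than the CoCoSum implementation: the contrast is modeled as a difference of log-probabilities followed by renormalization, and the aggregation as a plain average. Both formulas, the `weight` parameter, and the function names `contrastive_step` and `common_step` are hypothetical.

```python
import math


def normalize(log_scores):
    """Turn unnormalized log-scores into a probability distribution (softmax)."""
    peak = max(log_scores.values())
    exps = {tok: math.exp(s - peak) for tok, s in log_scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def contrastive_step(p_target, p_counterpart, weight=1.0, eps=1e-12):
    """Boost tokens that are likely for the target but unlikely for the counterpart.

    Assumed rule: log p_target - weight * log p_counterpart, then renormalize.
    `eps` avoids taking the log of zero.
    """
    log_scores = {
        tok: math.log(p + eps) - weight * math.log(p_counterpart.get(tok, 0.0) + eps)
        for tok, p in p_target.items()
    }
    return normalize(log_scores)


def common_step(distributions):
    """Aggregate token distributions by averaging them (assumed rule)."""
    vocab = set().union(*distributions)
    n = len(distributions)
    return {tok: sum(d.get(tok, 0.0) for d in distributions) / n for tok in vocab}


# Toy next-token distributions for two comparable entities.
p_a = {"clean": 0.5, "pool": 0.4, "noisy": 0.1}
p_b = {"clean": 0.5, "pool": 0.1, "noisy": 0.4}

distinctive_a = contrastive_step(p_a, p_b)
common_ab = common_step([p_a, p_b])

print(max(distinctive_a, key=distinctive_a.get))  # token favored for entity A only
print(max(common_ab, key=common_ab.get))          # token favored for both entities
```

In this toy setting, the token that both entities rate equally is downweighted in the contrastive distribution and favored in the aggregated one, which matches the intent of the two summary types.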

GiNZA

GiNZA is an open-source Japanese NLP library with features such as a one-step installer, high-speed and high-precision analysis, and international capabilities for sentence structure analysis.