The 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-2022) was held in Seattle in mid-July as a hybrid conference. NAACL-2022 accepted a total of 442 papers out of 2,103 submissions. Our work, “Low-resource Entity Set Expansion: A Comprehensive Study on User-generated Text” by Yutong Shao, Nikita Bhutani, Sajjadur Rahman, and Estevam Hruschka, was among the papers accepted to NAACL Findings.
NAACL is a key conference for our research, so six of our researchers attended the event and Megagon Labs was a Platinum-level sponsor. Yutong Shao and Sajjadur Rahman presented a poster for the accepted paper on entity set expansion. We met great people at our booth, poster presentations, and talks. To share what we gathered from the conference, we summarize below the papers, workshops, invited talks, and other conference events that we found both interesting and relevant to the ongoing research at Megagon Labs.
NAACL Special Theme: Human-Centered NLP
This year’s NAACL special theme was Human-Centered NLP. In keeping with the theme, the first plenary talk was on value sensitive design of technological solutions. Two paper sessions, a poster session, a Birds-of-a-Feather meetup (BoF), and two workshops – Dynamic Adversarial Data Collection (DADC) and Bridging HCI and NLP (HCI+NLP) – were dedicated to the special theme.
Plenary Invited Talk by Batya Friedman
Friedman’s talk, “Shaping Technology with Moral Imagination: Leveraging the Machinery of Value Sensitive Design,” drew on her three decades of research on value sensitive design, which studies human values in the technical design process. Her presentation stressed the need for incorporating value sensitive design into the NLP tools that are increasingly mediating our day-to-day life. Friedman observed that technological solutions are often the result of our moral and technical imagination and may not always capture aspects such as diversity, equity, and inclusivity. For example, Twitter’s character limit represents a technical bias: not all languages enable thought expression in 140 characters. Friedman discussed a tripartite methodology for human-centered technology development where researchers iterate among three steps: conceptualization, empirical evaluation of impact, and solution design. There are direct and indirect stakeholders for any technological solution, and NLP practitioners need to take into consideration how solutions may impact indirect stakeholders as well as direct ones. A similar sentiment was voiced at the Human-Centered NLP BoF, where Dan Liebling from Google stressed the need to move beyond inputs and outputs. Instead, Liebling encouraged attendees to consider how an NLP model is incorporated in a system, including aspects such as interactivity and explainability.
Special Theme Paper Picks
The Why and The How: A Survey on Natural Language Interaction in Visualization
The authors applied a popular typology of abstract visualization tasks to NLP applications, with a specific focus on natural-language-based interaction in visualization. They surveyed 119 relevant papers and classified their goals into high-level visualization tasks (e.g., presentation vs. discovery tasks). Depending on the goal (to present or to discover) and on the data domain, practitioners can decide which NLP techniques to employ. The work also highlighted interesting areas for applying NLP methods in visualization.
An Exploration of Post-Editing Effectiveness in Text Summarization
Focusing on human-AI collaboration in text summarization, this work investigates the impact of humans post-editing AI-generated text, the pros and cons of such a mixed-initiative method, and the user experience in such systems. The authors conducted an experiment with 72 human subjects, comparing post-editing of AI-generated summaries with manual summarization. They observed that post-editing strategies are helpful when humans lack domain knowledge, as they can still edit AI-generated text using common sense. However, when AI-generated text contains factually inaccurate information, post-editing may not be helpful unless humans possess prior knowledge about the domain. The study also provided insights into user experience surrounding editing strategies.
NLP Applications Inspired by Real-World Challenges
We noticed an increased focus on supporting real-world scenarios and problems across various NLP tasks, e.g., handling outdated information in knowledge bases (KBs), scaling information extraction to a large number of entities in KBs, and infusing structured information from KBs into pre-trained language models (LMs).
FRUIT: Faithfully Reflecting Updated Information in Text
This paper proposes a novel generation task, FRUIT, whose goal is to incorporate new information into an existing piece of text. The task is relevant to many real-world applications in which statically stored information can become obsolete over time and needs to be updated. This task presents several new challenges because models (a) cannot obtain good performance by relying solely on the world knowledge acquired during pre-training, (b) have to produce new text that reflects both the original text and new evidence, and (c) have to read and analyze evidence from both textual and tabular sources to determine relevant information. To facilitate research on this task, the paper released a dataset, FRUIT-WIKI, with 170k distantly supervised update-evidence pairs derived from Wikipedia. Experiments with baseline systems demonstrate the quality of the data and affirm the key challenges in the task, including content selection and hallucinations.
GenIE: Generative Information Extraction
This paper proposes a novel autoregressive formulation for the closed information extraction task that can exploit the knowledge already encoded in pre-trained LMs and can capture fine-grained interactions expressed in the text. A key challenge in such a formulation is producing valid triplets. This paper uses a constrained decoding strategy that produces triplets consistent with the predefined schema of the KBs. The proposed approach achieves state-of-the-art performance on closed information extraction tasks, quickly adapts to new schemas, and can scale to a large number (~6M) of KB entities and relations. The paper also outlines how the decoding strategy can be extended to handle entities that are not in the KB, moving toward open information extraction.
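To make the idea of constrained decoding concrete, here is a minimal sketch, not GenIE's actual implementation. It assumes KB names are available as token lists and uses a prefix trie to restrict which token may come next, so the decoder can only ever produce a name that exists in the KB. The `END` marker, the toy entity list, and the `toy_score` function are hypothetical stand-ins for a real tokenizer, KB, and language model.

```python
from typing import Callable, List, Sequence

END = "<end>"  # hypothetical end-of-name marker, not from the paper


def build_trie(names: Sequence[Sequence[str]]) -> dict:
    """Build a prefix trie over tokenized KB names."""
    root: dict = {}
    for tokens in names:
        node = root
        for tok in list(tokens) + [END]:
            node = node.setdefault(tok, {})
    return root


def constrained_greedy_decode(
    trie: dict, score: Callable[[List[str], str], float]
) -> List[str]:
    """Greedy decoding that only considers tokens the trie allows next."""
    if not trie:
        raise ValueError("the trie must contain at least one name")
    node, prefix = trie, []
    while True:
        allowed = list(node.keys())  # valid continuations of the current prefix
        best = max(allowed, key=lambda tok: score(prefix, tok))
        if best == END:
            return prefix
        prefix.append(best)
        node = node[best]


# Toy KB and toy scorer (assumptions for illustration only).
entities = [["New", "York"], ["New", "York", "City"], ["New", "Jersey"]]
trie = build_trie(entities)
toy_scores = {"New": 0.9, "York": 0.8, "City": 0.1, "Jersey": 0.3, "Paris": 5.0, END: 0.5}


def toy_score(prefix: List[str], tok: str) -> float:
    return toy_scores.get(tok, 0.0)


result = constrained_greedy_decode(trie, toy_score)
print(result)
```

Even though the toy scorer rates "Paris" highest, it is never a candidate because no KB name starts with it. A real system would apply the same masking inside beam search over model logits.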
Few-Shot Document-Level Relation Extraction
In contrast with existing approaches and benchmarks for sentence-level relation extraction, this paper proposes a few-shot document-level relation extraction benchmark named FREDo. It claims that moving few-shot relation extraction from the sentence level to the document level will emulate the none-of-the-above (NOTA) distribution more realistically. (NOTA refers to the case in which a candidate pair of entities does not hold any of the relations defined in the schema.) The work presents both in-domain and cross-domain (requiring adaptation) tasks. Baseline solutions are provided, and there is opportunity for exploring new methods on this benchmark.
Machine Learning and Efficient Methods for NLP
One paper in this area proposes a method, KroneckerBERT, for compressing pre-trained LMs by applying a Kronecker matrix decomposition. KroneckerBERT is a compressed version of the BERT_BASE model, obtained by compressing the embedding layer, the linear mappings in the multi-head attention, and the feed-forward network modules in the Transformer layers. The model is trained via an efficient two-stage knowledge distillation scheme. It uses far fewer data samples than state-of-the-art models like MobileBERT and TinyBERT, yet it achieves comparable accuracy using only a fraction (13%) of the model parameters.
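As a rough illustration of why a Kronecker decomposition saves parameters, the sketch below computes a Kronecker product in plain Python and compares the parameter count of a dense weight matrix with that of two Kronecker factors. The factor shapes are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: block (i, j) of the result equals a[i][j] * b."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(cols_b)]
        for i in range(len(a))
        for k in range(rows_b)
    ]


def kron_param_counts(m1: int, n1: int, m2: int, n2: int) -> Tuple[int, int]:
    """Parameters of a dense (m1*m2) x (n1*n2) matrix vs. its two factors."""
    dense = (m1 * m2) * (n1 * n2)
    factored = m1 * n1 + m2 * n2
    return dense, factored


# Small worked example of the product itself.
small = kron([[1, 2], [3, 4]], [[0, 1], [1, 0]])

# Hypothetical factor shapes for a 768 x 768 weight: (2 x 2) and (384 x 384).
dense, factored = kron_param_counts(2, 2, 384, 384)
print(small)
print(dense, factored)
```

With these assumed shapes, the two factors store roughly a quarter of the dense matrix's parameters. Compressing this way restricts the weight to matrices expressible as a Kronecker product, which is a plausible reason the method pairs it with knowledge distillation to recover accuracy.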
Quantifying Adaptability in Pre-trained Language Models with 500 Tasks
This paper studies the nature and limits of pre-trained LM adaptation by constructing a synthetic task benchmark called TASKBENCH500. It considers three axes of adaptability (ability to memorize, ability to generalize, and ability to fit novel distributions) in two adaptation paradigms (fine-tuning and prompt-tuning). The 500 procedurally generated tasks are used to answer broader, structural questions about LM adaptation. The paper reports several interesting findings from the benchmark’s experimental results, and the benchmark should be quite useful for studying multiple factors of LM adaptation moving forward.
Extreme Zero-Shot Learning for Extreme Text Classification
This paper studies a scenario called Extreme Zero-Shot XMC for the eXtreme Multi-label text Classification (XMC) problem. Existing approaches usually generalize poorly and suffer from insufficient labels. This paper develops a pre-training method that leverages raw text with multi-scale adaptive clustering, label regularization, and self-training with pseudo-positive pairs. The experimental findings show leading results in both unsupervised (zero-shot) and semi-supervised (few-shot) learning setups.
Text Generation
Text generation was one of the hot topics at NAACL. Here we present outlines of a tutorial on text editing and two best papers related to text generation.
Text Generation with Text-Editing Models Tutorial
The main applications of text editing include grammatical error correction and text simplification. While this problem has typically been solved by generating the output text from scratch, this tutorial introduces methods that take advantage of a characteristic of text editing tasks: there is a large overlap between the input and the output. One of them is EdiT5, proposed by the speakers, which leverages the pre-training task of the T5 model. It selects the text to be deleted from the input, reorders the remaining text with a pointer network, and then uses a decoder to generate only the text to be inserted. The researchers showed that this improves latency by 14.5 times compared to a conventional Seq2Seq model while maintaining high accuracy.
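The delete, reorder, and insert steps described above can be sketched as a function that applies an edit plan to the source tokens. This is only an illustration of the output representation, under the assumption that a model has already predicted the keep/delete tags, the ordering, and the insertions. The function name and argument layout are hypothetical and do not reflect EdiT5's actual interface.

```python
from typing import Dict, List


def apply_edits(
    source: List[str],
    keep: List[bool],
    order: List[int],
    insertions: Dict[int, List[str]],
) -> List[str]:
    """Apply a text-editing plan to `source`.

    keep[i]       -- whether source token i survives (False means delete)
    order         -- permutation of source indices giving the output order
    insertions[i] -- new tokens to place right after source token i
                     (key -1 means "at the very beginning")
    """
    out = list(insertions.get(-1, []))
    for i in order:
        if not keep[i]:
            continue  # deleted tokens are skipped entirely
        out.append(source[i])
        out.extend(insertions.get(i, []))
    return out


source = ["he", "have", "went", "to", "store", "yesterday"]
keep = [True, False, True, True, True, True]   # delete "have"
order = [5, 0, 2, 3, 4, 1]                     # move "yesterday" to the front
insertions = {3: ["the"]}                      # insert "the" after "to"

edited = apply_edits(source, keep, order, insertions)
generated = sum(len(tokens) for tokens in insertions.values())
print(edited, generated)
```

In this toy example only one of the six output tokens has to be generated. The rest are copied from the input, which is the intuition behind the latency gains of editing models over generating every token from scratch.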
Automatic Correction of Human Translations
This paper proposes a new task of automatically correcting human-written translations, in contrast to previous work on post-editing for machine translation systems, which aims to further improve machine-generated translations. The authors illustrate that human-translation errors can be divided into three main categories: monolingual, bilingual, and preferred editing. Based on this error analysis, they proposed a synthetic training dataset to build a human-error correction model, and experimental results show that the editing model trained with the synthetic dataset performs better than the post-editing model. They also conducted a user survey, which found that the editing model allows human editors to better revise translations with less effort.
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics
One common decoding method for text generation is left-to-right decoding, but it is known to consider only past generations, not future ones. This study proposes a novel algorithm called A*esque Decoding, which draws on classical A* search to incorporate an estimate of future cost into left-to-right decoding. The researchers showed experimentally that the proposed method significantly improves existing models on a variety of tasks such as machine translation and data-to-text generation.
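The toy sketch below illustrates the general idea of a lookahead heuristic. It is not the paper's algorithm: it assumes a hand-written bigram table in place of a neural LM, a single lexical constraint, and a simple bonus (`weight`) that is added whenever a short greedy rollout from a candidate token would satisfy the constraint. All names and numbers are hypothetical.

```python
import math
from typing import Dict, List

Model = Dict[str, Dict[str, float]]
BOS, EOS = "<s>", "</s>"

# Hypothetical bigram "language model": next-token probabilities.
MODEL: Model = {
    BOS: {"we": 0.6, "they": 0.4},
    "we": {"sleep": 0.7, "eat": 0.3},
    "they": {"eat": 0.9, "sleep": 0.1},
    "eat": {"pizza": 0.8, EOS: 0.2},
    "sleep": {EOS: 1.0},
    "pizza": {EOS: 1.0},
}


def greedy_rollout(model: Model, last: str, depth: int) -> List[str]:
    """Cheaply estimate the future by following the most likely tokens."""
    out: List[str] = []
    for _ in range(depth):
        if last == EOS:
            break
        last = max(model[last], key=model[last].get)
        out.append(last)
    return out


def decode(model: Model, constraint: str, depth: int,
           weight: float = 2.0, max_len: int = 10) -> List[str]:
    """Left-to-right decoding; depth > 0 adds a lookahead constraint bonus."""
    prefix: List[str] = []
    last = BOS
    while last != EOS and len(prefix) < max_len:
        def score(tok: str) -> float:
            future = greedy_rollout(model, tok, depth)
            satisfied = constraint in prefix + [tok] + future
            return math.log(model[last][tok]) + (weight if satisfied else 0.0)

        last = max(model[last], key=score)
        if last != EOS:
            prefix.append(last)
    return prefix


no_lookahead = decode(MODEL, "pizza", depth=0)
with_lookahead = decode(MODEL, "pizza", depth=2)
print(no_lookahead, with_lookahead)
```

Without lookahead, the decoder commits to the locally likeliest first token and can no longer reach the required word. With a two-step rollout, it accepts a less likely first token because the estimated future satisfies the constraint.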
NLP Debates: Linguistics and Symbolic Structures
The panel “The Place of Linguistics and Symbolic Structures,” chaired by Dan Roth, was one of two NAACL panels this year. It was motivated by the widespread adoption of neural models in NLP research and the increased impact NLP applications have on our lives. The goal of the panel discussion was to prompt debates on what productive directions in NLP research might look like. The panel highlighted one core aspect of the debate, namely the role linguistics and symbolic structures can play, or not, in shaping research directions.
During the panel, Chitta Baral highlighted the need for the NLP community to move beyond ‘solving the dataset’ and instead pursue ‘solving the task’ with the potential help of neuro-symbolic approaches. Emily Bender stressed that linguistics can help improve NLP as an application area and outlined how established scholarship in subfields such as structural linguistics, linguistic pragmatics, child language acquisition research, linguistic typology, and sociolinguistics can help ground NLP tasks. In a similar vein, Christopher Manning noted that neural models scale better and can capture the world represented by symbols, but linguistics is the right tool for understanding NLP systems in terms of their goals, analyses, and evaluations.
Finally, Dilek Hakkani-Tür commented on the numerous challenges that remain in dialog systems research, despite the effectiveness of LLMs in the area. Notably, these include incorporating context and grounding responses in structured knowledge such as KBs or in unstructured resources (e.g., articles and reviews).
Conclusion
With that, we hope this gives you a good overview of the NAACL conference from the perspective of our research interests at Megagon Labs. Conferences such as NAACL help us see research trends and help us return to work inspired and excited. As the field advances, it is great to see the community grow and become more inquisitive.
To read a summary of our NAACL Findings accepted paper, here’s a blog post: Low-resource Entity Set Expansion on User-generated Text: Insights and Takeaways.
Written by: Nikita Bhutani, Hayate Ito, Sajjadur Rahman, Chen Shen, Nedelina Teneva, and Megagon Labs