Extracting structured knowledge, such as entities and the relations between them, from unstructured text is a fundamental challenge in natural language processing. Triplet extraction is a particularly difficult information extraction task whose goal is to derive (head entity, relation, tail entity) triplets directly from raw text. For example, from the sentence “Hayao Miyazaki was born in Tokyo.” a model should extract the triplet (Hayao Miyazaki, born_in, Tokyo).
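A triplet is simply a (head, relation, tail) record. As a minimal sketch, it can be represented in Python as a small immutable dataclass. The `Triplet` class is our own illustrative naming and is not taken from the ZETT codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """A (head, relation, tail) fact extracted from a sentence (illustrative type)."""
    head: str
    relation: str
    tail: str


# The example from the text: “Hayao Miyazaki was born in Tokyo.”
example = Triplet(head="Hayao Miyazaki", relation="born_in", tail="Tokyo")
print(example)
```

Making the dataclass frozen keeps triplets hashable, so duplicates extracted from different sentences can be collected in a set.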
Most triplet extraction models require training data that covers every target relation they need to extract. In real-world applications, however, we often want to extract new, unseen relations at test time, for which we have no training examples. This setting is known as zero-shot triplet extraction.
Handling unseen relations is crucial for tasks like knowledge-base population, where new entity types and relations continually emerge. However, generating reliable training data for unseen relations is notoriously difficult. Typically, it requires creating synthetic examples with noisy methods such as distant supervision, which rely on heuristics that associate entities based on co-occurrence. This, however, can introduce erroneous examples into the training data.
Recent progress in pre-trained language models (PLMs) such as T5 has shown promise for zero-shot learning. The key idea is to reformulate a task into a format that matches the PLM’s pre-training objective. This lets the model draw on its pre-trained knowledge and generalize with little or no task-specific fine-tuning.
Our new paper builds on this approach with a novel method for zero-shot triplet extraction called ZETT (Zero-shot Triplet Extraction via Template Infilling). Whereas previous work on zero-shot triplet extraction still requires generating synthetic training data for unseen relations, ZETT avoids this step entirely.
Overview of ZETT
The key idea in ZETT is to frame triplet extraction as a template infilling task. For each relation, we use a template containing placeholders for the head and tail entities. For example:
“<X> was born in <Y>.”
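To make the infilling idea concrete, here is a minimal sketch of turning a template into a masked model input. It assumes T5-style sentinel tokens (`<extra_id_0>`, `<extra_id_1>`, ...) numbered in order of appearance, and a simple “template followed by sentence” layout. The helper name `to_infilling_input` and the exact input format are hypothetical and may differ from the released ZETT code.

```python
import re

# Assumption: T5-style sentinels, numbered in the order they appear in the text.
SENTINEL = "<extra_id_{}>"


def to_infilling_input(template: str, context: str) -> tuple[str, list[str]]:
    """Replace the <X>/<Y> placeholders with sentinels and append the context.

    Returns the model input and the placeholder order (e.g. ["<X>", "<Y>"]),
    so generated spans can later be mapped back to head and tail.
    The "template + context" layout is an illustrative assumption.
    """
    order = re.findall(r"<[XY]>", template)
    if sorted(order) != ["<X>", "<Y>"]:
        raise ValueError("template must contain <X> and <Y> exactly once each")
    masked = template
    for i, placeholder in enumerate(order):
        masked = masked.replace(placeholder, SENTINEL.format(i))
    return f"{masked} {context}", order


# Example usage with the born_in template from the text.
model_input, order = to_infilling_input(
    "<X> was born in <Y>.", "Hayao Miyazaki was born in Tokyo."
)
print(model_input)
print(order)
```

Numbering the sentinels by position rather than by role keeps templates such as “<Y> is the birthplace of <X>.” consistent with how T5 numbers masked spans left to right during pre-training; the returned `order` records which slot holds the head.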
During training, we fine-tune the pre-trained language model on seen relations: the entity placeholders in each relation’s template are masked, and the model learns to generate the masked head and tail spans given the input sentence. At test time, we provide the input sentence together with templates for the unseen relations, and the model fills in the entities:
“[Hayao Miyazaki] was born in [Tokyo].”
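The training targets and the test-time decoding step can be sketched in the same hypothetical T5-style format. `make_target` builds the sequence the model learns to generate for a seen relation, and `parse_output` maps generated spans back to a head and tail for any template, including one written for an unseen relation. The function names, the sentinel format, and the assumption that the decoded string still contains its sentinel tokens are all illustrative, not taken from the ZETT implementation.

```python
import re

# Assumption: T5-style sentinels (<extra_id_0>, <extra_id_1>, ...) numbered
# in the order the placeholders appear in the template.
SENTINEL_RE = re.compile(r"<extra_id_\d+>")


def placeholder_order(template: str) -> list[str]:
    """Return the placeholders in textual order, e.g. ["<X>", "<Y>"]."""
    order = re.findall(r"<[XY]>", template)
    if sorted(order) != ["<X>", "<Y>"]:
        raise ValueError("template must contain <X> and <Y> exactly once each")
    return order


def make_target(template: str, head: str, tail: str) -> str:
    """Training target for a seen relation: each sentinel followed by its gold span."""
    gold = {"<X>": head, "<Y>": tail}
    order = placeholder_order(template)
    parts = [f"<extra_id_{i}> {gold[p]}" for i, p in enumerate(order)]
    # A final sentinel closes the last span, as in T5 span-corruption targets.
    return " ".join(parts) + f" <extra_id_{len(order)}>"


def parse_output(template: str, generated: str) -> tuple[str, str]:
    """Map a generated infilling sequence back to (head, tail)."""
    # Drop padding / end-of-sequence tokens that T5 decoding typically leaves in.
    generated = generated.replace("<pad>", "").replace("</s>", "")
    order = placeholder_order(template)
    # The text after sentinel i is the span for the i-th placeholder.
    pieces = SENTINEL_RE.split(generated)[1 : len(order) + 1]
    if len(pieces) < len(order):
        raise ValueError(f"expected {len(order)} spans in {generated!r}")
    filled = {p: span.strip() for p, span in zip(order, pieces)}
    return filled["<X>"], filled["<Y>"]


# Example usage with the born_in template from the text.
print(make_target("<X> was born in <Y>.", "Hayao Miyazaki", "Tokyo"))
print(parse_output("<X> was born in <Y>.",
                   "<pad> <extra_id_0> Hayao Miyazaki <extra_id_1> Tokyo <extra_id_2></s>"))
```

Because spans are read back through the template’s placeholder order, the same parsing works for templates in which the tail entity comes first.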
By converting extraction into template infilling, ZETT aligns the task with the pre-training objective of generative LMs such as T5. This enables zero-shot generalization to unseen relations without specialized components or synthetic data.
In our experiments on the FewRel and Wiki-ZSL datasets, ZETT outperforms prior state-of-the-art methods by 5-6% in accuracy. It also performs more stably than methods that rely on noisy synthetic data.
Key Benefits of ZETT
Some of the major advantages of ZETT include:
- No extra training data needed for unseen relations: Because the task is aligned with the model’s pre-training objective, an unseen relation needs only a template, not additional labeled or synthetic data.
- Leverages knowledge encoded in the templates: The templates provide useful inductive biases about entity types and their ordering.
- Avoids noise from synthetic data: Bypassing synthetic data generation improves stability over prior work.
- Easy to deploy: The approach simply fine-tunes a standard pre-trained LM without complex components.
Because the task is phrased in the same form as the pre-training objective, the prompting formulation makes it straightforward to inject useful biases through the templates while reusing what the model learned during pre-training. This improves sample efficiency and generalization compared to prior work.
Overall, ZETT provides a simple yet powerful new approach for zero-shot extraction. The method also has promising implications for handling emerging entities and relations in knowledge base construction.
In this blog post, we introduced ZETT, a prompting-based approach to zero-shot triplet extraction. By reformulating extraction as template infilling, the method generalizes to unseen relations without synthetic training data. Our experiments show that ZETT advances state-of-the-art extraction accuracy while remaining conceptually simple and stable. Going forward, we believe that methods like ZETT, which leverage self-supervised pre-training, will play a key role in adapting information extraction to open-domain settings.
We have released the code to reproduce this study. You can find it here: https://github.com/megagonlabs/zett