AmbigNLG: A Tutorial

AmbigNLG is a means of tackling ambiguity in natural language generation (NLG) instructions by identifying unclear specifications and refining them for better output quality. In this tutorial, we will walk you through using AmbigNLG to:

  • Identify ambiguous aspects of an instruction.
  • Refine ambiguous instructions for more effective text generation.
  • Generate and compare output texts with and without instruction refinement.
  • Deploy interactive ambiguity mitigation to improve downstream tasks.

Instead of exactly replicating the experiments from the AmbigNLG paper, we’ve adopted current best practices for working with large language models (LLMs) to implement ambiguity mitigation interactively, so you can apply AmbigNLG’s concepts in your own applications.

Introduction

AmbigNLG is designed to address task ambiguity in NLG instructions. This tutorial shows how to use AmbigNLG to enhance the clarity of instructions and optimize the performance of LLMs in complex NLG tasks.

Ambiguous instructions can significantly affect output quality, leading LLMs to generate diverse responses that may not align with the user’s intent. AmbigNLG provides a structured approach to detect and resolve these ambiguities, ultimately reducing output variability.

Let's Get Started

Setup

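To begin, set up the environment by installing the necessary dependencies. We’ll use the OpenAI API for generation, requests and BeautifulSoup to fetch the sample article, and py-rouge for evaluation.

# Install dependencies (pydantic is installed as a dependency of openai)
!pip install openai requests beautifulsoup4 py-rouge jinja2 > /dev/null

import os

import openai

# Read the API key from the environment, or paste it in directly:
# api_key = "your-openai-api-key"
api_key = os.getenv("OPENAI_API_KEY")

# Initialize the OpenAI client
client = openai.OpenAI(api_key=api_key)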

Generating Text with Ambiguous Instructions

Let’s explore a common NLG task: summarizing an article. Although it seems straightforward, this task can produce widely varying results when the instruction is ambiguous.

Task Goal: Generate a summary of an article about sushi that captures its key information.

Initial Instruction: “Summarize this article!”

This simple instruction, while clear at first glance, contains several ambiguities:

  • How long should the summary be?
  • Should it focus on history, preparation methods, or cultural significance?
  • What style should it use (academic, casual, technical)?

Let’s see how this ambiguity affects the output:

import requests
from bs4 import BeautifulSoup


def generate_text(instruction: str, input_text: str, model: str = "gpt-4o-mini-2024-07-18"):
    prompt = f"""
Below is an input text that provides further context, paired with an instruction that describes a task. Provide a direct response that appropriately completes the request without additional explanations or details.
# Input text:
{input_text}

# Instruction:
{instruction}

# Response:
"""

    # Sample 10 responses with nucleus sampling (top_p=0.9) so that we can
    # later measure how much the outputs vary for the same instruction.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        n=10,
        top_p=0.9
    )
    return [choice.message.content for choice in response.choices]


sample_url = "https://en.wikipedia.org/wiki/Sushi"
sample_instruction = "Summarize this article!"

# Fetch the article and extract its visible text. An explicit parser avoids
# BeautifulSoup's "no parser specified" warning, and separator=" " keeps words
# from adjacent HTML elements from running together.
page = requests.get(sample_url, timeout=30)
page.raise_for_status()
sample_input = BeautifulSoup(page.text, "html.parser").get_text(separator=" ", strip=True)
sample_outputs = generate_text(sample_instruction, sample_input)

Sample Output 1: Sushi is a traditional Japanese dish that consists of vinegared rice combined with various ingredients, primarily seafood, which may be raw or cooked. Its origins can be traced back to several ancient forms, including narezushi, a fermented fish dish, and has evolved through various styles, notably nigirizushi, created in the Edo period. Sushi can be served in numerous forms, including chirashizushi (scattered sushi), inarizushi (tofu pouches filled with rice), and makizushi (rolled sushi). The dish has gained global popularity, leading to Western adaptations such as the California roll and various fusion styles. Sushi is typically accompanied by condiments like soy sauce, wasabi, and pickled ginger. It is generally considered nutritious but can pose health risks related to raw fish consumption, including parasites and mercury contamination. The presentation of sushi varies, often served on minimalist plates, and is a central feature in Japanese dining culture.

Sample Output 2: Sushi is a traditional Japanese dish consisting of vinegared rice combined with various ingredients, most commonly seafood, vegetables, and sometimes meat. Its history dates back to ancient preservation techniques involving fermented fish and rice, evolving through different styles such as narezushi, namanare, and hayazushi. The modern form, nigirizushi, was popularized in the Edo period by chef Hanaya Yohei. Sushi can be categorized into various types, including chirashizushi (scattered sushi), inarizushi (tofu pouches), makizushi (rolled sushi), and nigirizushi (hand-pressed sushi). Sushi is often served with condiments like wasabi and soy sauce and is enjoyed globally, with variations like the California roll emerging in Western cuisine. Nutritionally, sushi is low in fat and high in protein, though there are health risks associated with consuming raw fish, such as parasites and mercury contamination. Presentation and etiquette in sushi dining are also significant aspects of the culinary experience.

Sample output 1 emphasizes sushi’s global adaptations and health aspects, while sample output 2 dives deeper into specific historical development and cultural practices.

Outputs often vary due to ambiguities in the instruction, such as insufficient detail on the desired summary’s length, focus, or style.

Understanding the Ambiguity Taxonomy

Through a careful examination of real-world textual instructions, we introduced an ambiguity taxonomy for NLG instructions with six main categories:

  • Context: Choose this category if the instruction lacks the required contextual information, such as background or external knowledge crucial for task completion. Resolving this ambiguity will provide the crucial context for the task.
  • Keywords: Select this category if the instruction does not mention specific keywords to be used in the output text. Resolving this ambiguity will ensure that the necessary keywords are incorporated in the output.
  • Length: Opt for this category if the instruction does not specify the desired length of the output, whether in words or sentences. Resolving this ambiguity will yield output of a more precise length.
  • Planning: Select this category if the instruction does not provide guidance on content planning for the output document. Resolving this ambiguity will result in the desired structured output.
  • Style: Choose this category if the instruction does not specify the style of the output text. Resolving this ambiguity will ensure that the output aligns with the desired style.
  • Theme: Choose this category if the instruction does not clearly define the specific theme to be discussed in the output text. Resolving this ambiguity will provide a clear direction for the output.
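Each category is resolved by appending a short templated instruction; the templates appear in generate_additional_instructions later in this tutorial. As a minimal sketch, assuming you already know the value to fill in (a word count, a style, a theme), the mapping can be written as a plain dictionary. REFINEMENT_TEMPLATES, fill_template, and refine are hypothetical names, not part of the AmbigNLG code, and the Planning template takes the whole outline as a single string here.

```python
# Hypothetical mapping from each ambiguity category to an infill template,
# mirroring the templates used by generate_additional_instructions.
REFINEMENT_TEMPLATES = {
    "Context": "Additional context: {value}",
    "Keywords": "Include {value} in your response.",
    "Length": "Answer with {value} words.",
    "Planning": "Please generate the output based on the following outline: {value}",
    "Style": "Write in a {value} style.",
    "Theme": "Primarily discuss the following theme: {value}",
}


def fill_template(category: str, value: str) -> str:
    """Return the additional instruction for `category` with `value` filled in."""
    try:
        template = REFINEMENT_TEMPLATES[category]
    except KeyError:
        raise ValueError(f"No template for category: {category!r}") from None
    return template.format(value=value)


def refine(instruction: str, resolutions: dict[str, str]) -> str:
    """Append one templated additional instruction per resolved category."""
    extras = [fill_template(cat, val) for cat, val in resolutions.items()]
    return "\n".join([instruction, *extras])


print(refine("Summarize this article!", {"Length": "100", "Style": "formal"}))
# Summarize this article!
# Answer with 100 words.
# Write in a formal style.
```

Keeping the templates fixed per category makes every refinement easy to read and to strip out again, at the cost of less natural phrasing than free-form rewriting.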

Using this taxonomy, we can first identify which types of ambiguity an instruction contains and then mitigate each of them.

Identifying Ambiguity

We can use an LLM to identify the ambiguity categories present in an instruction.

				
from enum import Enum

from pydantic import BaseModel


# The six AmbigNLG categories, plus "None" for instructions with no ambiguity.
class AmbiguityCategory(str, Enum):
    context = "Context"
    keywords = "Keywords"
    length = "Length"
    planning = "Planning"
    style = "Style"
    theme = "Theme"
    none = "None"


# Schema for the structured response: a list of ambiguity categories.
class IdentifiedAmbiguities(BaseModel):
    ambiguity_categories: list[AmbiguityCategory]


def identify_ambiguity(instruction: str, input_text: str, model: str = "gpt-4o-mini-2024-07-18"):
    prompt = f"""Your task involves identifying the category of ambiguity in the given instruction to generate output text from the given input text. Ambiguity in instruction means that there are several possible output texts from the single input text. On the other hand, when the ambiguity is clarified, the task becomes straightforward, leading to a nearly single output.
Here are the available categories: Context, Keywords, Length, Planning, Style, Theme.
* Context: Choose this category if the instruction lacks the required context information, such as background or external knowledge crucial for task completion. Resolving this ambiguity will provide the crucial context for the task.
* Keywords: Select this category if the instruction does not mention specific keywords to be used in the output text. Resolving this ambiguity will ensure that the necessary keywords are incorporated in the output.
* Length: Opt for this category if the instruction does not provide specifics about the desired length of the output, whether in terms of words or sentences. Clearing this ambiguity will lead to a more precise length output.
* Planning: Select this category if the instructions don't provide guidance on content planning for the output document. Resolving this ambiguity will result in the desired structured output.
* Style: Choose this category if the instruction does not specify the style of the output text. Clearing this ambiguity will ensure that the output aligns with the desired style.
* Theme: Choose this category if the instruction does not clearly define the specific theme to be discussed in the output text. Clearing this ambiguity will provide a clear direction for the output.
* None: Choose this category if none of the above apply.

# Input Text:
{input_text}

# Instruction:
{instruction}

# Response:"""
    # Structured outputs: the SDK parses the model's reply into IdentifiedAmbiguities.
    # temperature=0 reduces sampling randomness for this classification step.
    response = client.beta.chat.completions.parse(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.,
        response_format=IdentifiedAmbiguities
    )

    return [cat.value for cat in response.choices[0].message.parsed.ambiguity_categories]


ambiguities = identify_ambiguity(sample_instruction, sample_input)
print("Identified Ambiguities:", ambiguities)

Identified Ambiguities: ['Length', 'Context', 'Theme']

The LLM identified ambiguities in the instruction related to “Length,” “Context,” and “Theme.” This tells us which elements are missing and need to be specified to make the instruction more precise.
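Because the call above uses a Pydantic model as its response format, the model’s JSON reply is validated against the taxonomy before it reaches your code. The sketch below reproduces roughly that validation step locally, assuming Pydantic v2 (for model_validate_json); the JSON strings are hand-written stand-ins for a real API response, not actual model output.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


class AmbiguityCategory(str, Enum):
    context = "Context"
    keywords = "Keywords"
    length = "Length"
    planning = "Planning"
    style = "Style"
    theme = "Theme"
    none = "None"


class IdentifiedAmbiguities(BaseModel):
    ambiguity_categories: list[AmbiguityCategory]


# Hand-written JSON standing in for the model's structured reply.
raw = '{"ambiguity_categories": ["Length", "Context", "Theme"]}'
parsed = IdentifiedAmbiguities.model_validate_json(raw)
categories = [cat.value for cat in parsed.ambiguity_categories]
print(categories)  # ['Length', 'Context', 'Theme']

# A label outside the taxonomy fails validation instead of passing through.
try:
    IdentifiedAmbiguities.model_validate_json('{"ambiguity_categories": ["Tone"]}')
except ValidationError as err:
    print("rejected with", err.error_count(), "validation error(s)")
```

Constraining the output to an Enum means downstream steps such as template selection only ever see the six categories (or "None").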

Instruction Refinement

Based on the identified ambiguities, we generate candidate additional instructions that resolve each one, using a fill-in template per category. For instance, specifying the desired output length (e.g., “Answer with 100 words.”), supplying missing context (e.g., background on sushi’s place in Japanese culture), or narrowing the theme (e.g., “Primarily discuss the following theme: the cultural significance of sushi”) yields a more precise instruction. Users can then pick one of the generated candidates, skip the category, or type their own additional instruction.

				
from jinja2 import Template


# Schema for the structured response: candidate additional instructions for one category.
class AdditionalInstructions(BaseModel):
    category: AmbiguityCategory
    additional_instructions: list[str]


def generate_additional_instructions(instruction: str, input_text: str, ambiguity_category: str, model: str = "gpt-4o-mini-2024-07-18"):
    template = Template("""To resolve the specified ambiguity in the instruction, provide multiple additional instructions as the infilled templates. Each additional instruction strictly adheres to the template format.
Ensure this added information aligns with the primary objective of the task, supports understanding of complex concepts, or aids in narrowing down the scope to generate more precise responses.

# Input Text:
{{ input_text }}

# Instruction:
{{ instruction }}

# Ambiguity to Resolve:
{{ ambiguity_category }}

# Template to Infill:
{% if ambiguity_category == 'Context' %}Additional context: <paragraph>
{% elif ambiguity_category == 'Keywords' %}Include <keywords> in your response.
{% elif ambiguity_category == 'Length' %}Answer with <number> words.
{% elif ambiguity_category == 'Planning' %}Please generate the output based on the following outline: 1. <topic1> 2. <topic2> ...
{% elif ambiguity_category == 'Style' %}Write in a <style> style.
{% elif ambiguity_category == 'Theme' %}Primarily discuss the following theme: <theme>
{% else %}No template found for this category.
{% endif %}""")

    prompt = template.render(
        input_text=input_text,
        instruction=instruction,
        ambiguity_category=ambiguity_category
    )
    response = client.beta.chat.completions.parse(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        response_format=AdditionalInstructions,
    )

    return response.choices[0].message.parsed.additional_instructions


def get_user_selected_instruction(candidates: list[str]):
    """Show candidates and return the chosen one, a custom instruction, or None to skip."""
    print("Generated candidate instructions:")
    for idx, candidate in enumerate(candidates, start=1):
        print(f"{idx}: {candidate}")

    user_input = input(f"\nSelect an instruction (1-{len(candidates)}), or type 'skip' to skip, or provide your own: ").strip()

    if user_input.isdigit():
        selection = int(user_input)
        if 1 <= selection <= len(candidates):
            print(f"Selected instruction: {candidates[selection - 1]}")
            return candidates[selection - 1]
        print("Selection out of range; skipping.")
        return None
    if not user_input or user_input.lower() == 'skip':
        print("Skipping instruction selection.")
        return None
    print(f"Custom instruction provided: {user_input}")
    return user_input


def refine_instructions(instruction: str, input_text: str, ambiguities: list[str]):
    additional_instructions = []

    for category in ambiguities:
        category = category.strip()
        # "None" means no ambiguity was found, so there is nothing to resolve.
        if category == "None":
            continue
        print(f"\nSuggested additional instructions for {category}:")

        candidates = generate_additional_instructions(instruction, input_text, category)
        selected_instruction = get_user_selected_instruction(candidates)

        if selected_instruction:
            additional_instructions.append(selected_instruction)

    # Append the selected clarifications to the original instruction, one per line.
    refined_instruction = "\n".join([instruction] + additional_instructions)

    return refined_instruction


refined_instruction = refine_instructions(sample_instruction, sample_input, ambiguities)
print("\nFinal Refined Instruction:\n")
print(f"```{refined_instruction}```")

Suggested additional instructions for Length:
Generated candidate instructions:
1: Answer with 100 words.
2: Answer with 200 words.
3: Answer with 300 words.
Selected instruction: Answer with 100 words.

Suggested additional instructions for Context:
Generated candidate instructions:
1: Provide a brief overview of the history of sushi, highlighting its evolution from narezushi to modern forms like nigirizushi and conveyor belt sushi.
2: Summarize the different types of sushi mentioned in the article, including chirashizushi, inarizushi, makizushi, and their regional variations.
3: Explain the significance of sushi in Japanese culture, including its traditional preparation methods and etiquette associated with eating sushi.
Selected instruction: Explain the significance of sushi in Japanese culture, including its traditional preparation methods and etiquette associated with eating sushi.

Suggested additional instructions for Theme:
Generated candidate instructions:
1: Primarily discuss the following theme: The historical evolution of sushi from its origins to modern variations.
2: Primarily discuss the following theme: The different types of sushi and their unique characteristics.
3: Primarily discuss the following theme: The cultural significance of sushi in Japanese cuisine and its global influence.
4: Primarily discuss the following theme: The ingredients used in sushi and their preparation methods.
Skipping instruction selection.

Final Refined Instruction:

```Summarize this article!
Answer with 100 words.
Explain the significance of sushi in Japanese culture, including its traditional preparation methods and etiquette associated with eating sushi.```
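The interactive loop above relies on input(), which blocks in scripts and batch jobs. A minimal sketch, assuming you have already collected candidates per category (for example, from generate_additional_instructions), replaces the prompt with a chooser function. refine_with_chooser, Chooser, and first_except_theme are hypothetical names, and the candidate strings below are illustrative, not real model output.

```python
from typing import Callable, Optional

# Hypothetical policy type: given a category and its candidates, return the
# chosen additional instruction, or None to leave that category unresolved.
Chooser = Callable[[str, list[str]], Optional[str]]


def refine_with_chooser(
    instruction: str,
    candidates_by_category: dict[str, list[str]],
    choose: Chooser,
) -> str:
    """Non-interactive counterpart to refine_instructions (a sketch)."""
    selected = []
    for category, candidates in candidates_by_category.items():
        # "None" signals no ambiguity; empty candidate lists have nothing to pick.
        if category == "None" or not candidates:
            continue
        choice = choose(category, candidates)
        if choice:  # None or "" means "skip this category"
            selected.append(choice)
    return "\n".join([instruction, *selected])


def first_except_theme(category: str, candidates: list[str]) -> Optional[str]:
    """Example policy: take the first candidate, but leave Theme unresolved."""
    return None if category == "Theme" else candidates[0]


refined = refine_with_chooser(
    "Summarize this article!",
    {
        "Length": ["Answer with 100 words.", "Answer with 200 words."],
        "Theme": ["Primarily discuss the following theme: the history of sushi."],
    },
    first_except_theme,
)
print(refined)
# Summarize this article!
# Answer with 100 words.
```

Passing the policy in as a function lets the same pipeline serve interactive use, scripted experiments, or an LLM acting as the chooser.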
				
			

Evaluating the Effect of Refinement

Once we have a refined instruction, we can use it to generate a more focused output.

				
# Sample 10 outputs for the refined instruction with the same settings as before
refined_outputs = generate_text(refined_instruction, sample_input)

print("Refined Output:", refined_outputs[0])

Refined Output: Sushi is a traditional Japanese dish featuring vinegared rice combined with various ingredients, predominantly seafood, which can be raw or cooked. Its evolution includes styles like nigiri-zushi and makizushi, highlighting regional variations and historical influences. Sushi preparation emphasizes the quality of rice and fish, with techniques refined over centuries. Etiquette dictates that sushi is typically eaten by hand, with a focus on the harmony of flavors. Its cultural significance lies in its representation of Japanese culinary artistry and seasonal ingredients, making sushi not just a meal, but an expression of tradition and craftsmanship.

The refined output is more focused and more closely aligned with the user’s expectations. The refined instruction fixes the length at 100 words and supplies context about sushi’s cultural significance, traditional preparation methods, and etiquette. Accordingly, instead of surveying a range of topics such as sushi’s global adaptations and health risks, the refined output concentrates on preparation, etiquette, and cultural significance, as instructed, and stays just under the 100-word target. Resolving these ambiguities narrows the space of acceptable responses, producing a more consistent outcome that better matches what the user had in mind.
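Adherence to the 100-word target is easy to check across all sampled outputs rather than just the first one. In this minimal sketch, word_count and length_report are hypothetical helpers, whitespace tokenization is an assumption (it counts “nigiri-zushi” as one word), and the 10% tolerance is arbitrary.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated tokens as a rough proxy for words."""
    return len(text.split())


def length_report(outputs: list[str], target: int, tolerance: float = 0.1) -> dict:
    """Summarize how closely a set of outputs tracks a target word count.

    An output counts as on target if it is within target * tolerance words
    of the target; the 10% default is an arbitrary assumption.
    """
    if not outputs:
        raise ValueError("outputs must not be empty")
    counts = [word_count(o) for o in outputs]
    slack = target * tolerance
    on_target = sum(abs(c - target) <= slack for c in counts)
    return {
        "counts": counts,
        "min": min(counts),
        "max": max(counts),
        "on_target_ratio": on_target / len(counts),
    }


# Toy inputs with 95, 100, and 130 words. With the tutorial's data you would
# call length_report(refined_outputs, target=100) instead.
report = length_report(["word " * 95, "word " * 100, "word " * 130], target=100)
print(report)
```

Comparing length_report(sample_outputs, 100) with length_report(refined_outputs, 100) gives a direct view of how much the Length refinement alone tightens the outputs.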

To quantify the impact of refining instructions, we compare the diversity of the outputs generated from sample_instruction and from refined_instruction. For each instruction, we score every pair of its ten sampled outputs with ROUGE-L F1 and average the results. If the refinement reduces output diversity, it indicates that the instruction has become clearer and more specific, narrowing the output space and producing more focused results.

				
from itertools import combinations

import rouge

# ROUGE-L F1 with stemming; apply_avg=True averages the scores over all pairs
evaluator = rouge.Rouge(metrics=["rouge-l"], limit_length=False, apply_avg=True, stemming=True)

# combinations(outputs, 2) yields every unordered pair of outputs; zip(*...)
# splits the pairs into aligned hypothesis and reference sequences.
sample_score = evaluator.get_scores(*zip(*combinations(sample_outputs, 2)))["rouge-l"]["f"]
refined_score = evaluator.get_scores(*zip(*combinations(refined_outputs, 2)))["rouge-l"]["f"]
print("Diversity Score (Original):", sample_score)
print("Diversity Score (Refined):", refined_score)

Diversity Score (Original): 0.43312789543767966
Diversity Score (Refined): 0.38057679080136764

The diversity score of the original outputs (sample_score, about 0.433) is higher than that of the refined outputs (refined_score, about 0.381). A lower diversity score among the refined outputs suggests that ambiguity has been reduced, leading to more consistent responses.
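To make the score concrete without installing py-rouge, here is a minimal pure-Python sketch of the same quantity. ROUGE-L F1 between two texts is computed from the longest common subsequence (LCS) of their tokens: precision is the LCS length divided by the candidate length, recall divides by the reference length, and F1 is their harmonic mean. Identical texts score 1.0 and texts with no shared tokens score 0.0. The helper names are hypothetical, and tokenization here is plain lowercased whitespace splitting with no stemming, so values will differ somewhat from py-rouge’s.

```python
from itertools import combinations


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 on lowercased whitespace tokens (no stemming)."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_rouge_l_sketch(outputs: list[str]) -> float:
    """Average ROUGE-L F1 over all unordered pairs of outputs."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        raise ValueError("need at least two outputs")
    return sum(rouge_l_f1(a, b) for a, b in pairs) / len(pairs)


print(mean_pairwise_rouge_l_sketch(["sushi is rice", "sushi is fish", "sushi is rice"]))  # roughly 0.778
```

ROUGE-L is symmetric in its F1 form, so scoring each unordered pair once (45 pairs for 10 outputs) is enough.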

Conclusion

In this tutorial, we demonstrated how to use AmbigNLG to identify ambiguities in instructions and refine them to achieve better alignment in LLM-generated outputs. By applying these methods, you can enhance the specificity of your instructions, thereby improving the quality and consistency of generated content for your downstream tasks.

For more details, see the AmbigNLG paper, accepted at EMNLP 2024, and the accompanying GitHub repository.

Written By: Hayate Iso and Megagon Labs
