Supporting Humans in the Information Extraction Loop: An In-depth Study of the Practices, Limitations, and Opportunities

Information extraction (IE) is often the crucial first step in text-analysis tasks such as entity matching, knowledge-base population, and text summarization. Data science workflows such as information extraction are typically characterized as a sequence of phases (see Figure 1), often with significant human involvement throughout. However, to improve the experience of the humans in the information extraction loop, we need to thoroughly investigate these workflows and examine the finer-grained tasks within each phase.


Fig. 1: Phases of a typical information extraction workflow.

In this blog post, we summarize our findings from an interview study that aimed to characterize tasks and actions across all phases of information extraction, from data preparation to model deployment. We examined 10 internal projects with a diverse set of tasks, all involving information extraction, including entity extraction, knowledge-base population, and natural language generation. Our ultimate goal was to identify challenges and gather feedback for improving the experience of using existing tools. More detailed findings can be found in our paper, recently published at the premier ACM Conference on Human Factors in Computing Systems (CHI ’22).

A Deeper Dive into Existing Practices

So what did we learn about practitioners’ IE workflows? Somewhat surprisingly, we observed that in each of these phases, users repeatedly performed five classes of tasks: view, assess, hypothesize, pursue, and verify. Let’s look at an example of how this task model applies to the data preparation phase.

Let us consider a scenario where a user is extracting aspects and opinions from reviews (see animated Figure 2). First, the user views samples of reviews in spreadsheets (view). Then, to understand data quality issues, they might cluster the review sentences to see if any patterns emerge (assess). For example, some reviews may be written in a different language or may contain typos or HTML tags. Based on these observations, the user formulates a hypothesis for cleaning the reviews and defines rules such as removing HTML tags and discarding non-English text (hypothesize). The user then runs the cleaning scripts (pursue) and finally evaluates the data quality to see whether the hypothesis was correct (verify).

Fig. 2: User tasks performed in the data preparation phase.
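To make the hypothesize and pursue steps above concrete, here is a minimal sketch of one possible cleaning pass: stripping HTML tags and discarding non-English reviews. It assumes pandas and the langdetect package are available; the toy reviews and the "text" column name are purely illustrative, not taken from the projects we studied.

```python
import re

import pandas as pd
from langdetect import detect

# Hypothetical input: one review sentence per row in a "text" column.
reviews = pd.DataFrame({"text": [
    "The rooms were <b>spotless</b> and the staff was friendly.",
    "La ubicación es excelente pero el desayuno es caro.",
    "Great value for money, would stay here again!",
]})

def clean(text: str) -> str:
    """Hypothesize -> pursue: strip HTML tags and normalize whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)      # drop HTML tags
    return re.sub(r"\s+", " ", text).strip()  # collapse extra whitespace

def is_english(text: str) -> bool:
    """Keep English reviews only; langdetect can fail on very short strings."""
    try:
        return detect(text) == "en"
    except Exception:
        return False

reviews["text"] = reviews["text"].map(clean)
cleaned = reviews[reviews["text"].map(is_english)]

# Verify: spot-check a sample of the cleaned reviews.
print(cleaned.sample(min(2, len(cleaned)), random_state=0))
```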

Regardless of the workflow, we observed that this task model emerges across all phases! Let’s analyze the task model further.

The Many Loops of IE

Interestingly, as we analyzed user tasks and actions across various phases, we realized that the task model can be iterative (see Figure 3). Users would often iterate several times, going back to cycle through view, assess, hypothesize, pursue, and verify. For example, going back to our example of aspect and opinion extraction from reviews: in the model building phase, users first explore the data to formulate extraction rules (hypothesize), and then confirm their hypothesis via a verification task. If the extraction performance is poor, the users may revise their hypothesis by further exploring the data and repeating the confirmation step.

Fig. 3: The exploration-confirmation loop.

The exploration-confirmation process is repeated until users are satisfied with the outcome. When we factor in the iteration among phases, the information extraction workflow becomes highly cyclical: users continuously context-switch not only between various tasks but also between different phases, as shown in Figure 4.


Fig. 4: The many loops of IE.

To formally analyze the task model, we used Grounded Theory [1], a systematic methodology that involves constructing hypotheses and theories through the collection and analysis of data. Based on the analysis, we identified 17 unique user actions (e.g., sample data, get overview, get details on demand, compare views, document observations). We then categorized the low-level operations that users performed with existing IE tools into these user actions. For example, operations such as clustering and computing feature distributions correspond to the get overview action and are typically used during the data preparation and model evaluation phases. We refer readers to our paper for more details regarding the categorization of tasks, user actions, and operations. We now discuss some of the challenges in performing various operations within the task model related to exploration, confirmation, and iteration.
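As a small illustration of the get overview and sample data actions, the sketch below computes simple feature distributions over a toy corpus with pandas; the data and the "text" column name are hypothetical.

```python
import pandas as pd

# Hypothetical corpus: review sentences in a "text" column.
corpus = pd.DataFrame({"text": [
    "The pool area was clean and quiet.",
    "Terrible wifi, great breakfast.",
    "Check-in took forever and the lobby was crowded.",
]})

# "Get overview": simple feature distributions over the corpus.
overview = pd.DataFrame({
    "n_chars": corpus["text"].str.len(),
    "n_tokens": corpus["text"].str.split().str.len(),
})
print(overview.describe())

# "Sample data": spot-check a few raw documents on demand.
print(corpus.sample(2, random_state=0))
```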

Challenges for the Human in the IE Loop

During the retrospective interviews, we asked participants about various pain points within their workflows. We grouped their responses into three challenge categories: a) exploration or foraging, b) confirmation or sensemaking, and c) iteration. We focus on a few of the pain points corresponding to these challenges next.

Foraging Challenges

Perceptual scalability. While exploring a text corpus, a key challenge is scale: massive datasets are difficult for users to comprehend. The onus is on users to tediously sample data or scroll through documents to understand them.

“…It was difficult for us to explore the fraction of the data…”

Lack of semantic search capabilities. Information extraction over text requires advanced search capabilities that existing systems lack. For example, there is no default support for searching documents by part-of-speech tags or by synonyms of a word.

“… So for example we want to label everything with salary into benefits. There are many synonyms of salary. Especially I need to enumerate the synonyms myself… because Google Sheets doesn’t provide that functionality….”
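One way such a capability could look is sketched below: expanding a keyword query with WordNet synonyms via NLTK before matching documents. This is only an illustration of the missing feature, not the participants’ tooling; it assumes NLTK is installed and that the WordNet data can be downloaded, and the documents are made up.

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # one-time download of the WordNet data

def expand_query(word: str) -> set[str]:
    """Collect WordNet synonyms for a word (e.g., 'salary' -> 'wage', 'pay', ...)."""
    synonyms = {word.lower()}
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name().replace("_", " ").lower())
    return synonyms

# Hypothetical documents to label.
documents = [
    "Employees praised the generous pay and remuneration package.",
    "The office location is convenient but parking is limited.",
]

terms = expand_query("salary")
hits = [doc for doc in documents if any(term in doc.lower() for term in terms)]
print(sorted(terms))
print(hits)
```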

Lack of context. While exploring the data, users often fail to draw insights due to a lack of context. For example, in the review data preparation example presented earlier, users clustered the review sentences to identify data quality issues. However, there is no interactive feature to show example sentences from each cluster. Doing so could have helped users understand the general pattern within each cluster.

“… show the results and let you explore those results interactively. I think even that is a really useful feature.”
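A minimal sketch of the kind of context participants asked for: cluster the sentences with scikit-learn, then surface the sentences closest to each centroid as representative examples. The toy sentences and cluster count are arbitrary; this is not the tooling described in the interviews.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical review sentences to cluster during data preparation.
sentences = [
    "The room was dirty and smelled of smoke.",
    "Housekeeping never cleaned our room.",
    "Breakfast had a great selection of pastries.",
    "The breakfast buffet was delicious.",
]

X = TfidfVectorizer().fit_transform(sentences)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# For each cluster, surface the sentences closest to its centroid so the
# user sees what the cluster is "about", not just its size.
distances = kmeans.transform(X)  # distance of each sentence to each centroid
for c in range(kmeans.n_clusters):
    members = [i for i, label in enumerate(kmeans.labels_) if label == c]
    members.sort(key=lambda i: distances[i, c])
    print(f"Cluster {c}:")
    for i in members[:2]:
        print("   ", sentences[i])
```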

Sensemaking Challenges

Difficulty in qualitative validation. During hypothesis verification, qualitative validation can be challenging, as existing tools do not automatically retrieve the source document of an extraction. An example of this would be showing the review sentences from which aspects and opinions were extracted.

“… what we have to do, which is actually very tedious, is to get an extraction, go back and see where it came from.”
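A lightweight way to avoid that round trip is to carry provenance with every extraction. The sketch below is a hypothetical illustration, not the tooling used in the projects we studied: each extraction stores its source document id and character span so the original sentence can be pulled up during verification.

```python
from dataclasses import dataclass

@dataclass
class Extraction:
    aspect: str
    opinion: str
    doc_id: str
    span: tuple[int, int]  # character offsets into the source review

# Hypothetical corpus and extractor output.
reviews = {"r1": "The breakfast was excellent but the wifi kept dropping."}
extractions = [Extraction("breakfast", "excellent", "r1", (4, 27))]

def show_source(ex: Extraction, corpus: dict[str, str], window: int = 15) -> str:
    """Return the extracted span plus surrounding context for quick validation."""
    text = corpus[ex.doc_id]
    start, end = ex.span
    return text[max(0, start - window): end + window]

for ex in extractions:
    print(f"({ex.aspect}, {ex.opinion}) <- ...{show_source(ex, reviews)}...")
```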

Lack of comparison features. While evaluating models or extraction rules, there is no built-in support for comparing their outcomes. For example, mean average precision is a common metric for measuring model performance. Currently, the onus is on users to vary the number of extractions, k, output by different candidate models and then compare their mean average precision.

“… say we have 10 different models. And then I’m pretty sure I like to do analysis in an interactive manner, so that I can pick up the best model for the purpose… a plot where you can select parameters and their values and see how the metrics change.”
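The sketch below illustrates the kind of comparison participants wanted, using precision@k as a simple stand-in for the mean-average-precision analysis described above. The model names, ranked outputs, and gold set are all hypothetical.

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k extractions that appear in the gold set."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

# Hypothetical gold extractions and ranked outputs from two candidate models.
gold = {"clean room", "friendly staff", "good breakfast", "fast wifi"}
model_outputs = {
    "rule-based": ["clean room", "cheap parking", "friendly staff", "nice view"],
    "fine-tuned": ["friendly staff", "good breakfast", "clean room", "fast wifi"],
}

# Vary k and compare the candidates side by side.
for k in (1, 2, 4):
    scores = {name: round(precision_at_k(ranked, gold, k), 2)
              for name, ranked in model_outputs.items()}
    print(f"k={k}: {scores}")
```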

Iteration Challenges

Bespoke provenance management. Existing tools do not instrument provenance and metadata management as built-in features. Instead, they put the responsibility on the user to develop solutions for validation, documentation, and comparison across iterations. Current practices for tracking quantitative (e.g., evaluation results, model parameters) and qualitative (e.g., user comments, documentation) metadata of an IE project are tedious and prone to errors. 
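A typical bespoke workaround looks something like the following: an ad hoc helper that appends each iteration's parameters, metrics, and free-form notes to a JSON-lines file. Everything here (file name, fields, values) is hypothetical; the point is that users end up building and maintaining this plumbing themselves.

```python
import json
import time
from pathlib import Path

LOG = Path("ie_runs.jsonl")  # hypothetical per-project run log

def log_run(params: dict, metrics: dict, notes: str = "") -> None:
    """Append one iteration's quantitative and qualitative metadata."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "params": params,    # e.g., model parameters, extraction rules
        "metrics": metrics,  # e.g., evaluation results
        "notes": notes,      # e.g., user comments and observations
    }
    with LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")

log_run({"extractor": "rule-v3", "k": 10},
        {"precision_at_10": 0.62},
        notes="Rules miss implicit aspects; revisit clustering output.")
```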

Lack of a holistic system. Another challenge is the lack of an end-to-end solution that supports all five classes of tasks. For example, spreadsheets are used for viewing data (view) and labeling patterns (hypothesize), while computational notebooks may be used across all the tasks. Since both the phases and the tasks are iterative, users are often forced to move back and forth between multiple tools as they carry out their IE workflows. Toggling between tools can be cumbersome for users.

So how can we address these challenges? In the next part of our discussion, we outline several design guidelines for developing human-in-the-loop IE tools based on principles of cognitive engineering.

Cognitive Engineering Principles to the Rescue!

Widely used to evaluate human-in-the-loop interfaces in various domains, cognitive engineering principles [2] leverage empirical findings from the cognitive sciences to inform the design of an interface. In our work, we focus on the following principles: automating unwanted workload (CP1), reducing uncertainty of information (CP2), fusing data to provide high-level abstraction (CP3), using known metaphors for ease of interpretation (CP4), displaying information in a logical manner (CP6), providing visual aids during information seeking (CP7), maintaining context of current focus (CP8), and presenting information at multiple levels of detail (CP9). We observed that many of the challenges discussed earlier occur because existing tools do not adhere to these principles.

Table 1. Design guidelines for IE tools inspired by cognitive engineering principles.

To address these challenges, we identified several design guidelines for information extraction tools that were inspired by these principles. Table 1 captures these guidelines while showing how they relate to various cognitive engineering principles. These guidelines can be grouped into two themes: feature-level (D1-D6) and system-level guidelines (D7, D8). Most of the feature-level guidelines are related to automating users’ unwanted workloads using an intelligent agent that may perform semantic searches, generate automated summaries, or provide interactive feedback. The system-level guidelines recommend developing end-to-end solutions for ease-of-use (D7) and propose integration of provenance and metadata management mechanisms to ensure reproducibility (D8).

Implications and Future Work

Through our interviews and follow-up analysis, we provide a fine-grained characterization of information extraction workflows using a task-based model, identify related challenges faced by the human-in-the-loop, and propose design guidelines for addressing these challenges. We now discuss a few of the implications of the task model and our proposed design considerations for developing IE tools.

  • Reproducibility: We believe reproducibility should be a first-class requirement in IE tools and recommend the use of MLOps tools and practices. However, MLOps practices are designed for reliably and efficiently deploying and maintaining machine learning models in production. Exploring ways to integrate such practices into research environments, which can be highly experimental and more iterative than production environments, is an exciting and challenging problem (see the tracking sketch after this list).
  • Human agency vs. automation: We believe the tension between human agency and automated intelligent agents must be considered when incorporating our proposed design guidelines into practical systems. An in-depth investigation of approaches to reconciling automated reasoning and human agency is required as the design considerations are incorporated into IE systems. To this end, mixed-initiative systems that explore roles of humans and automated agents may inform the design of human-in-the-loop IE tools.
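As a point of reference for the reproducibility point above, experiment-tracking tools from the MLOps ecosystem can record per-iteration metadata with little ceremony. The sketch below uses MLflow purely as an example; the run name, parameters, and metric values are hypothetical.

```python
import mlflow

# Track one exploration-confirmation iteration as an MLflow run.
with mlflow.start_run(run_name="aspect-extraction-iter-7"):
    mlflow.log_param("extractor", "rule-v3")
    mlflow.log_param("k", 10)
    mlflow.log_metric("precision_at_10", 0.62)
    mlflow.set_tag("notes", "Rules miss implicit aspects; revisit clustering output.")
    # Rule files, labeled samples, or model artifacts can be attached with
    # mlflow.log_artifact(...) so each iteration stays reproducible.
```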

With the increasing appreciation of human-centered design in AI systems, we look forward to a future in which deep system support for our proposed design principles will form the basis for human-in-the-loop data science tools.

References:

[1] Patricia Yancey Martin and Barry A. Turner. 1986. Grounded theory and organizational research. The Journal of Applied Behavioral Science 22, 2 (1986), 141.

[2] Jill Gerhardt-Powals. 1996. Cognitive engineering principles for enhancing human-computer performance. International Journal of Human-Computer Interaction 8, 2 (1996), 189–211.

Written by: Sajjadur Rahman and Megagon Labs

Follow us on LinkedIn and Twitter to stay up to date with new research and projects.
