Data labeling is an essential step in the machine learning life cycle because the quality and quantity of training data directly affect model performance. Unfortunately, existing annotation tools tend to treat data labeling in isolation from the broader ML life cycle, ignoring the iterative workflow of researchers and practitioners.

We present MegAnno, a novel exploratory annotation framework designed for NLP researchers and practitioners. Unlike existing tools that focus only on data labeling, our framework aims to support a broader, iterative ML workflow that includes data exploration and model development. With MegAnno’s API, users can programmatically explore the data through sophisticated search and automated suggestion functions, and incrementally update the labeling schema as their projects evolve. Through a widget embedded in the notebook, users can also interactively sort, filter, and assign labels to multiple items at once, in the same notebook where the rest of the NLP project resides.
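
To make the workflow above concrete, the following is a minimal, self-contained sketch of the search, bulk-label, and schema-update loop. It is an illustrative toy and does not use or reproduce MegAnno's actual API: the class `ToyProject` and its methods `search`, `add_label_option`, and `bulk_label` are hypothetical names introduced here, and the example texts are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProject:
    """In-memory stand-in for an annotation project (hypothetical, not MegAnno's API)."""

    texts: list[str]
    schema: set[str] = field(default_factory=set)
    labels: dict[int, str] = field(default_factory=dict)

    def search(self, keyword: str) -> list[int]:
        """Return indices of items whose text contains the keyword (case-insensitive)."""
        needle = keyword.lower()
        return [i for i, text in enumerate(self.texts) if needle in text.lower()]

    def add_label_option(self, option: str) -> None:
        """Extend the labeling schema without discarding existing labels."""
        self.schema.add(option)

    def bulk_label(self, indices: list[int], label: str) -> None:
        """Assign one label to several items at once, checking it against the schema."""
        if label not in self.schema:
            raise ValueError(f"{label!r} is not in the current schema")
        for i in indices:
            self.labels[i] = label


# Invented example data, for illustration only.
project = ToyProject(
    texts=[
        "The battery died after a day.",
        "Battery stopped charging within a week.",
        "Shipping took three weeks.",
    ],
    schema={"positive", "negative"},
)

# Explore by keyword, then label every matching item in one step.
project.bulk_label(project.search("battery"), "negative")

# The schema evolves with the project: add a new option, then use it.
project.add_label_option("shipping")
project.bulk_label(project.search("shipping"), "shipping")
```

In this sketch the schema check inside `bulk_label` is what keeps incremental schema updates safe: a label can only be applied once it has been added to the schema, and earlier labels are left untouched when new options appear.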