Existing annotation tools tend to treat data labeling in isolation from the broader machine learning (ML) lifecycle, ignoring the iterative workflows and needs of researchers and practitioners.

We present MEGAnno, an open-source data annotation tool that puts data scientists first, enabling them to bootstrap annotation tasks and manage the continual evolution of annotations throughout the machine learning lifecycle.

Our framework aims to support a broader, iterative ML workflow, including data exploration and model development, and integrates seamlessly into Jupyter notebooks. Users can:

  • programmatically explore data through sophisticated search,
  • incrementally update the labeling schema as their projects evolve,
  • rely on a back-end service that acts as a single source of truth, storing and managing the evolving annotation workflow,
  • manage, sort, and review labels, keeping everything organized and reducing the time cost of data annotation.
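To make the workflow above concrete, here is a minimal, self-contained sketch of the pattern: a store acting as a single source of truth that supports programmatic search, incremental schema updates, and label review. The class and method names are invented for illustration and are not MEGAnno's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class AnnotationStore:
    """Toy stand-in for a back-end that is the single source of truth.
    (Hypothetical sketch; not MEGAnno's real client API.)"""
    records: list = field(default_factory=list)
    schema: list = field(default_factory=list)
    labels: dict = field(default_factory=dict)  # record index -> label

    def search(self, keyword):
        # Programmatic exploration: indices of records matching a query.
        return [i for i, r in enumerate(self.records) if keyword in r]

    def update_schema(self, new_labels):
        # Incrementally extend the label schema as the project evolves.
        self.schema = sorted(set(self.schema) | set(new_labels))

    def set_label(self, idx, label):
        # Labels are validated against the current schema.
        assert label in self.schema, f"unknown label: {label}"
        self.labels[idx] = label

store = AnnotationStore(records=["great movie", "terrible plot", "great acting"])
store.update_schema(["positive", "negative"])
for i in store.search("great"):          # explore, then label the subset
    store.set_label(i, "positive")
store.update_schema(["neutral"])         # schema evolves mid-project
print(store.schema)                      # ['negative', 'neutral', 'positive']
print(store.labels)                      # {0: 'positive', 2: 'positive'}
```

In a notebook, each of these steps would be a cell, so exploration, schema changes, and labeling interleave naturally with model development.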

MEGAnno also harnesses the power of LLMs, allowing users to seamlessly combine human- and LLM-generated labels through verification workflows: an LLM labels the data first, and humans then verify a subset of potentially incorrect labels in a selective, exploratory manner.
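The label-then-verify loop can be sketched as follows. This is a hypothetical illustration of the pattern, not MEGAnno's actual API: a stand-in "LLM" labels every record with a confidence score, and only the least-confident labels are routed to a human for verification.

```python
def llm_label(record):
    """Stand-in for an LLM call: returns (label, confidence).
    A toy heuristic replaces the real model here."""
    if "great" in record:
        return "positive", 0.95
    if "terrible" in record:
        return "negative", 0.90
    return "positive", 0.40  # low-confidence guess

records = ["great movie", "terrible plot", "confusing but fine"]
labeled = {r: llm_label(r) for r in records}

# Selective verification: only low-confidence labels reach a human.
needs_review = [r for r, (_, conf) in labeled.items() if conf < 0.5]

def human_verify(record, proposed):
    # Placeholder for an interactive review widget; here the
    # "human" corrects the one uncertain label.
    return "neutral" if "confusing" in record else proposed

final = {r: (human_verify(r, lab) if r in needs_review else lab)
         for r, (lab, conf) in labeled.items()}
print(needs_review)  # ['confusing but fine']
print(final)
```

The design choice to verify only a low-confidence subset is what keeps human effort proportional to model uncertainty rather than to dataset size.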

To learn more about the MEGAnno tool, visit the MEGAnno documentation page.