Rotom: A multi-purposed data augmentation framework for training high-quality machine learning models

Deep learning is revolutionizing almost all fields of computer science, including computer vision, natural language processing, and data management. However, the success of deep neural networks depends heavily on the availability of large, high-quality labeled training datasets. To reduce this dependence, data augmentation (DA) has become a common practice in machine learning: it generates additional training examples from existing ones via data transformations.
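To make "generating additional training examples via data transformations" concrete, here is a minimal sketch of two simple token-level DA operators (token deletion and token swap) applied to a labeled text example. These generic operators are chosen only for illustration and are not Rotom's Seq2Seq-based augmentation. The function names, the deletion probability of 0.1, and the assumption that both operators preserve the label are all hypothetical.

```python
import random
from typing import List, Tuple


def token_delete(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each token independently with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def token_swap(tokens: List[str], rng: random.Random) -> List[str]:
    """Swap two randomly chosen positions (no-op for sequences shorter than 2)."""
    out = list(tokens)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def augment(text: str, label: str, n: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Generate n augmented (text, label) pairs.

    Assumption: both operators are label-preserving, so the label is copied as-is.
    """
    rng = random.Random(seed)
    tokens = text.split()
    ops = [
        lambda t: token_delete(t, 0.1, rng),  # hypothetical deletion rate
        lambda t: token_swap(t, rng),
    ]
    # Pick one operator uniformly at random for each new example.
    return [(" ".join(rng.choice(ops)(tokens)), label) for _ in range(n)]


if __name__ == "__main__":
    for aug_text, aug_label in augment("the battery life of this laptop is great", "positive", 3):
        print(aug_label, "|", aug_text)
```

Simple operators like these can produce unnatural or even label-changing sequences (swapping words can alter meaning), and picking an operator uniformly at random ignores which transformations actually help. These are the kinds of gaps that Rotom's Seq2Seq generation and learned combination policy, described below, are aimed at.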

We propose Rotom, a multi-purpose data augmentation framework for training high-quality machine learning models with only a small number (e.g., 200) of labeled examples. Rotom formulates every task as sequence classification, so it covers a wide range of data management and NLP tasks, including entity matching, error detection in data cleaning, and text classification. Rotom leverages (1) pre-trained Seq2Seq models to generate diverse yet natural augmented sequences and (2) meta-learning to train effective policy models that combine the sequences generated by multiple DA operators.
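The sequence-classification formulation is what lets one framework serve several tasks. The sketch below shows one way such tasks could be cast as (sequence, label) pairs. The page does not specify Rotom's actual input encoding, so the `[COL]`/`[VAL]`/`[SEP]` markers and all function names here are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical special markers; Rotom's real serialization may differ.
COL, VAL, SEP = "[COL]", "[VAL]", "[SEP]"


def serialize_record(record: Dict[str, str]) -> str:
    """Flatten a record into '[COL] attr [VAL] value ...' (dict insertion order)."""
    return " ".join(f"{COL} {k} {VAL} {v}" for k, v in record.items())


def entity_matching_example(
    left: Dict[str, str], right: Dict[str, str], is_match: bool
) -> Tuple[str, int]:
    """Entity matching: classify a serialized record pair as match (1) or not (0)."""
    return f"{serialize_record(left)} {SEP} {serialize_record(right)}", int(is_match)


def error_detection_example(
    record: Dict[str, str], attr: str, is_error: bool
) -> Tuple[str, int]:
    """Error detection: classify one cell, given its row as context, as erroneous (1) or clean (0)."""
    cell = f"{COL} {attr} {VAL} {record[attr]}"
    return f"{serialize_record(record)} {SEP} {cell}", int(is_error)


def text_classification_example(text: str, label: int) -> Tuple[str, int]:
    """Text classification: the raw text already is the sequence."""
    return text, label


if __name__ == "__main__":
    print(entity_matching_example({"title": "iPhone 12 64GB"}, {"title": "Apple iPhone 12 (64 GB)"}, True))
    print(error_detection_example({"city": "Tokoy", "zip": "100-0001"}, "city", True))
```

Because every task reduces to the same (sequence, label) shape, the same DA operators and the same classifier can, under this assumption, be reused across tasks without task-specific code.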

Demo

Please try out the Rotom demo.

Snippext: An Opinion Mining Pipeline that Uses Less Training Data

Snippext is a state-of-the-art (SOTA) opinion mining pipeline that extracts aspects, opinions, and sentiments from user-generated content such as online reviews. It can reduce the amount of labeled training data usually required by 50% or more.

ExtremeReader: An Interactive Explorer for Customizable and Explainable Review Summarization

ExtremeReader generates both structured and abstractive summaries that are easier to interpret. It also lets users explore these summaries and see explanations for them by drilling down or up to the desired level of granularity, all the way to the sentence from which an opinion feature was extracted.

HappyDB: a happiness database of 100,000 happy moments

We built HappyDB, a crowdsourced collection of 100,000 happy moments that we have made publicly available. Our goal is to build NLP technology that understands how people express happiness in text and to gain insights, at scale, into the events and scenarios that lead to happiness.

OpineDB and Voyageur: How Subjective Databases and Experiential Search Can Improve Customer Experiences

We developed OpineDB, a subjective database system that answers queries containing subjective predicates by interpreting those predicates against a database schema through a combination of natural language processing (NLP) and information retrieval (IR) techniques.
