Open Source

We believe in the power of collaborative innovation. Our open-source initiatives underscore our commitment to advancing AI for the benefit of all.
Blue

An open-source framework for building and deploying agentic workflows in enterprise environments. Unlike conventional AI frameworks, Blue is designed with enterprise-scale requirements in mind—scalability, observability, configurability, and seamless integration with existing infrastructure. In designing Blue, we followed a “systems perspective,” where we envisioned an architecture for an AI system—a compound AI architecture, not just a library—composed of a mix of LLM-based and “deterministic” agents and other key architectural components, such as data and agent registries and planners.

MEGAnno

MEGAnno is a data annotation framework that combines the power of large language models (LLMs) with human expertise to streamline and improve the data labeling process. Its capabilities include performing LLM annotations, conducting confidence-aided human verification, iteratively selecting models and refining prompts, and comparing and aggregating results from different LLM agents. MEGAnno includes a back-end service that acts as a single source of truth, storing and managing how annotation information evolves throughout the project lifecycle.
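As a rough illustration of what confidence-aided human verification can look like, the sketch below splits LLM-produced labels into an auto-accepted set and a human review queue. It is not MEGAnno's API: the `LLMAnnotation` class, the `split_for_verification` function, and the 0.8 threshold are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class LLMAnnotation:
    """Hypothetical record for one LLM-produced label (not a MEGAnno type)."""
    item_id: str
    label: str
    confidence: float  # assumed to lie in [0, 1]


def split_for_verification(annotations, threshold=0.8):
    """Split annotations into (auto_accepted, needs_review).

    Items below the assumed confidence threshold go to human review,
    ordered least-confident first so reviewers see the riskiest labels early.
    """
    accepted = [a for a in annotations if a.confidence >= threshold]
    review = sorted(
        (a for a in annotations if a.confidence < threshold),
        key=lambda a: a.confidence,
    )
    return accepted, review


# Illustrative data only.
annotations = [
    LLMAnnotation("doc-1", "positive", 0.95),
    LLMAnnotation("doc-2", "negative", 0.55),
    LLMAnnotation("doc-3", "positive", 0.72),
]
accepted, review = split_for_verification(annotations)
```

In a setup like this, lowering the threshold trades human effort for a higher risk of accepting wrong labels, so the value would normally be tuned per task.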

Leam

Leam combines the strengths of spreadsheets, computational notebooks, and interactive visualizations to facilitate integrated text analytics. Leam implements a visual text algebra to facilitate extensible and expressive analysis, supporting diverse tasks ranging from data cleaning to visualization.

Magneton

The Magneton framework bridges gaps in existing computational notebook widgets with respect to transparency, reusability, and customizability. It does so by introducing a built-in interaction history tracker, a state manager that maintains widget state history, and an action wrapper that enables on-demand customization of operations defined by widget developers.

Watchog

Watchog employs contrastive learning, an important technique in the self-supervised learning paradigm, to automatically learn table representations from a large corpus of unlabeled tables, with no human annotation required. Watchog outperforms previous approaches by a significant margin when labeled training instances are scarce, which shows that a contrastive learning-based approach can contribute useful additional signal without relying on human labels. Published at SIGMOD 2024.
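To make the idea of contrastive learning concrete, here is a minimal sketch of a generic InfoNCE-style loss in plain Python. It is an assumption-laden illustration, not Watchog's actual objective or code: the function names, the toy two-dimensional embeddings, and the temperature of 0.1 are all hypothetical. Each anchor embedding is pulled toward its paired "positive" view and pushed away from the other items in the batch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    anchors[i] and positives[i] are assumed to be two views of the same
    item; every other positives[j] acts as an in-batch negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(anchors)


# Toy embeddings (illustrative only).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive matches its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # positives paired with the wrong anchor

loss_aligned = info_nce(anchors, aligned)
loss_swapped = info_nce(anchors, swapped)
```

With these toy vectors the aligned pairing yields a much smaller loss than the swapped one, which is the signal an encoder would be trained to exploit.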

 

Starmie

The source code for the VLDB 2023 paper that proposed self-supervised learning techniques to train a column encoder for table union search and other data discovery tasks.

February 5, 2025
With the MCRank benchmark and our EXSIR method, we’ve shown that LLMs can significantly improve their performance on these challenging tasks when guided by structured reasoning.
December 16, 2024
As we emphasize throughout this blog post, an optimization framework for compound AI systems should pursue broader goals, such as multi-objective optimization (accuracy, cost, latency, etc.), multi-plan optimization, and constraint handling, especially for budget. These goals are far from comprehensive, but they are important for enterprise scenarios.
June 3, 2024
By enabling robust and accurate column annotation, this framework has the potential to improve data-driven decision-making across many industries. Watchog could help businesses extract valuable insights from product catalogs, pricing tables, and customer data repositories, and use those insights to optimize pricing strategies and deliver personalized recommendations that enhance customer satisfaction and loyalty.