Data AI Symbiosis

Advances in large language models (LLMs), particularly their deep language understanding capabilities, offer new opportunities to tackle classic data-management problems such as data integration, entity matching, and table discovery. Our recent work in the AI-for-data-management area has focused on exploiting language models and state-of-the-art machine learning approaches. We use large language models in novel settings: learning table representations to discover datasets in data lakes, designing data augmentation techniques for data-management tasks, and developing declarative explanation approaches for data integration tasks.

Conversely, as LLMs are adopted more widely, applying them within enterprise systems, where accuracy, privacy, trust, governance, and explainability are of utmost importance, requires improvements in knowledge retrieval across heterogeneous data sources, optimized retrieval (query processing), robust fact generation and verification, and flexible domain adaptation. For example, the HR domain raises new problems around bias, factuality, and explainability that require careful consideration. Our work in the data-management-for-AI area focuses on knowledge grounding and contextualization for knowledge-guided generation, fact-checking and verification, data lake usability, and benchmarking multi-agent systems for enterprise applications, among other topics.

Recent Publications:

CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems

Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks

Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation

A Blueprint Architecture of Compound AI Systems for Enterprise

Fairness-aware Data Preparation for Entity Matching

Related Projects:

Sudowoodo

Ditto

Starmie