Starmie: Table Discovery in Data Lakes – Exploring State-of-the-Art Success and Future Directions

Contrastive self-supervised training, from the data lake (offline) to the query table (online).

Dataset discovery from data lakes is a critical way to utilize open-domain data within the enterprise. To cope with the poor data quality and incomplete metadata common in data lakes, it is essential to support table union search: given a query table and a collection of data lake tables, find all tables in the lake that are unionable with the query table. In this work, we propose an end-to-end framework for this problem, named Starmie.
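To make the problem statement concrete, the following is a minimal sketch of table union search as defined above. It is not Starmie's method: the column similarity here is plain Jaccard overlap of cell values, used only as a stand-in for a learned column encoder. The names `column_similarity`, `unionability`, `table_union_search`, the scoring rule (average of best column matches), and the `threshold` parameter are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# A table is modeled as: column name -> list of cell values (assumption).
Table = Dict[str, List[str]]


def column_similarity(a: List[str], b: List[str]) -> float:
    """Jaccard overlap of cell values; a stand-in for a learned column encoder."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def unionability(query: Table, candidate: Table) -> float:
    """Average, over query columns, of the best-matching candidate column."""
    if not query or not candidate:
        return 0.0
    best = [
        max(column_similarity(q_col, c_col) for c_col in candidate.values())
        for q_col in query.values()
    ]
    return sum(best) / len(best)


def table_union_search(
    query: Table, lake: Dict[str, Table], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (table name, score) for lake tables scoring at least `threshold`,
    sorted from most to least unionable."""
    scored = [(name, unionability(query, table)) for name, table in lake.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy data: column names differ, so matching must rely on the cell values.
query = {
    "city": ["Paris", "Berlin", "Rome"],
    "country": ["France", "Germany", "Italy"],
}
lake = {
    "eu_capitals": {
        "capital": ["Paris", "Berlin", "Madrid"],
        "nation": ["France", "Germany", "Spain"],
    },
    "products": {"sku": ["A1", "B2"], "price": ["9.99", "4.50"]},
}

results = table_union_search(query, lake)
print(results)
```

On this toy data only `eu_capitals` is returned, even though none of its column names match the query. Value overlap is a crude signal, though; it breaks down when unionable columns share few exact values, which is the kind of gap a learned column representation is meant to close.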