Matching is a crucial task with wide-ranging applications, including search, recommendation, and data integration. With the proliferation of social media and e-commerce platforms, matching entries from structured and unstructured sources has become an increasingly important task. At its core, matching aims to find all pairs of entries in two collections that share common properties. For example, HR platforms match resumes to job descriptions, and online booking services match customer preferences to businesses (such as hotels, restaurants, and real-estate entities). Beyond these entity-matching examples, matching techniques are frequently employed in other use cases, such as matching excerpts from customer reviews about a product to customer queries, snippets of a web document to search queries, and user responses on Q/A platforms to new questions. Therefore, as shown in Figure 1, matching tasks can be formulated in different ways based on the type of input source (structured vs. unstructured), the downstream application (such as search, conversation, or recommendation), and ethical considerations (such as bias and transparency).
Figure 1. Example design space of models for matching tasks. Dimensions may include efficiency, continuity, controllability, generalizability, usability, and transparency, among others.
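To make the core formulation above concrete, here is a minimal, purely illustrative sketch of pairwise matching between two collections. The similarity function and threshold are hypothetical placeholders rather than a recommended method; real matchers rely on much richer signals and typically use blocking or indexing to avoid comparing every pair.

```python
# Minimal, purely illustrative sketch of the core matching formulation:
# given two collections of entries, return every pair whose similarity
# exceeds a threshold. The similarity function and threshold here are
# placeholders; real systems use far richer models and signals.

def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two text entries."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def match(collection_a, collection_b, similarity=token_jaccard, threshold=0.5):
    """Return all cross-collection pairs judged similar enough to match."""
    return [
        (a, b)
        for a in collection_a
        for b in collection_b
        if similarity(a, b) >= threshold
    ]

# Toy example: matching resumes to job descriptions.
resumes = ["python developer with nlp experience", "sales manager, retail"]
jobs = ["nlp engineer, python required", "retail store manager"]
print(match(resumes, jobs, threshold=0.2))
```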
Topics of Interest
In this workshop, we are interested in (but not restricted to) the dimensions of matching shown in Figure 1, as well as combinations of those dimensions. Recent advances across diverse communities, from artificial intelligence and databases to computational linguistics and human-computer interaction, have demonstrated promising results on matching tasks in the previously mentioned (and many other) domains. We believe there is a tremendous opportunity in bringing these communities together to discuss advances in fields such as natural language processing, language generation, deep learning, conversational AI, information extraction, data integration, knowledge graphs, and human-centered computing.
The goal of this workshop is therefore to bring together research communities from academia and industry working in these areas. These stakeholders are already interested in developing and applying novel approaches, models, and systems to address challenges around different matching tasks. While the workshop is intended to attract contributions on a wide range of topics, we now discuss a few example research problems that might be of interest to the workshop audience.
Design Space of Matching Models. While potential workshop submissions can explore the design space of ANY matching model, we next discuss one representative class: large language models. Large language models (LLMs) have garnered significant attention in recent years, so a natural question is how we can leverage them for various matching tasks. In fact, recent work has shown how pre-trained transformer-based language models can be employed for entity matching (a minimal code sketch of this setup appears after the list below). However, gaps remain in terms of efficiency, controllability, generalizability, usability, and transparency. Potential submissions to the workshop may explore any of these dimensions, in addition to proposing novel matching methods. While LLMs exhibit advanced language understanding capabilities, employing them for matching is not always straightforward, and many different aspects need to be considered, for example:
- Transparency: How can we ensure that these approaches are not biased? Figure 2 highlights how ChatGPT, a conversational agent built on top of OpenAI’s GPT-3 family of large language models, may exhibit bias. Employing a matching approach powered by such a model without accounting for these biases may have catastrophic consequences, and we have already seen real-world examples of such occurrences.
Figure 2: A job candidate matching system built using such a model may exhibit bias.
- Controllability: How can we provide domain-specific (or world) knowledge, in addition to language knowledge, to matching models and approaches? Existing work, such as Roberts et al. and Shuyang et al., shows that current LLMs lack domain-specific knowledge. Different approaches for injecting knowledge into LLMs have been proposed, but no best approach has been established yet. New approaches, investigations of current approaches, and research on integrating domain knowledge into matching approaches are all highly relevant topics for the workshop.
- Continuity: How do we keep matching approaches (LLM-based or not) correct and current so that they continually reflect the most accurate and recent knowledge? Most current approaches for updating the knowledge stored in LLMs run into difficulties with scalability, catastrophic forgetting, capacity saturation, and other roadblocks. Investigations, analyses, and proposals of matching approaches that offer better scalability, continual updating, and improved performance are much needed.
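To illustrate the transformer-based entity matching setup mentioned earlier, the sketch below frames matching as sequence-pair classification with a pre-trained language model from the Hugging Face transformers library. This is only a sketch under stated assumptions: the model name, the record serialization, and the helper functions are illustrative choices rather than any paper's reference implementation, and the classification head here is untrained, so its scores are meaningful only after fine-tuning on labeled match/non-match pairs.

```python
# Hedged sketch: entity matching as sequence-pair classification with a
# pre-trained transformer. Model name and serialization are illustrative
# assumptions; a real system would fine-tune on labeled pairs first.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "distilbert-base-uncased"  # placeholder; fine-tuning is assumed
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

def serialize(record: dict) -> str:
    """Flatten a structured record into text, e.g. 'title: iPhone 12 price: 799'."""
    return " ".join(f"{k}: {v}" for k, v in record.items())

def match_probability(left: dict, right: dict) -> float:
    """Estimated probability that two records refer to the same entity."""
    inputs = tokenizer(serialize(left), serialize(right),
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

left = {"title": "Apple iPhone 12 64GB", "price": "799"}
right = {"title": "iPhone 12 (64 GB) by Apple", "price": "799.00"}
print(match_probability(left, right))  # meaningful only after fine-tuning
```

The same pair-classification framing is where the concerns above surface in practice: the pre-trained weights carry whatever biases their training data contains (transparency), the serialization step is the only place domain knowledge enters (controllability), and updating the model as records change requires re-training or editing (continuity).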
What’s In Store at the Workshop
Besides research papers, the matching workshop will feature invited talks and panel discussions where participants will engage with leading researchers from both academia and industry.
Keynote Speakers. We have already confirmed three invited speakers: William W. Cohen (Principal Scientist at Google), Ndapa Nakashole (Assistant Professor in Computer Science at the University of California, San Diego), and Alan Ritter (Associate Professor in the School of Interactive Computing at Georgia Tech). Keep an eye on the workshop’s social media account for more updates about the talks.
Panel Discussion. In recent years, large language models have become one of the most prominent forces driving both research and development. However, their impact on downstream tasks is not yet well understood. As the example of employing ChatGPT to find suitable data scientist candidates in Figure 2 demonstrates, the bias embedded within these models can lead to catastrophic consequences. We therefore seek to foster a healthy discussion on the topic by inviting researchers from academia and related industries. The topic of the panel discussion will be “Matching in the era of large language models: sorting out the good, the bad, and the ugly.”
We are very excited to organize the first edition of the workshop, and we look forward to exciting submissions on a wide range of topics related to the matching task. You can find more information on our matching workshop website. Please reach out to the organizers at matching-workshop@megagon.ai with any questions or concerns.
Written by: Estevam Hruschka, Sajjadur Rahman, and Megagon Labs
Follow us on LinkedIn and Twitter to stay up to date.