This article provides an overview of ACL 2023 with a focus on large language models (LLMs). In this post, we briefly capture the themes of the keynote talks, summarize the panel discussions on NLP in the era of LLMs, and share our experience and insights from the MATCHING workshop, which Estevam Hruschka, Sajjadur Rahman, and Tom Mitchell of Megagon Labs (along with other co-organizers) organized to discuss research on matching structured and unstructured data using language technologies.
Keynotes
ACL 2023 featured keynote talks by renowned researchers Geoffrey Hinton (University of Toronto/Vector Institute) and Alison Gopnik (UC Berkeley).
Geoffrey Hinton opened his talk with the ongoing debate over whether LMs truly understand language or merely reproduce their training data. While the latter perspective is well established in the community, recent findings show that LMs, trained simply to predict the next token in a sequence, can perform complex reasoning. Hinton, a pioneer in neural network research, argued that predicting the next word actually necessitates semantic comprehension, and that errors made by LMs do not necessarily indicate a lack of understanding (humans make mistakes, too). He also discussed two paradigms of computing (digital vs. biological) and raised concerns about AI risks.
The remarkable capabilities exhibited by LLMs have led to a growing tendency to attribute agency to them, as in "LLMs know something" or "LLMs think of something." Presenting an alternative viewpoint, Alison Gopnik introduced the concept of cultural technologies: systems that provide efficient and effective access to the wealth of information humans have accumulated over time. She shared her group's analytical findings on LLMs, including a very recent study comparing data-driven models to children, to demonstrate how developmental psychology and cognitive science can yield insights into LLMs, and vice versa.
Panel Discussions
ACL 2023 featured several LLM-focused panels throughout the week. The main conference hosted a plenary LLM panel with ACL president Iryna Gurevych (Technical University of Darmstadt), Dan Klein (UC Berkeley), Margaret Mitchell (Hugging Face), Roy Schwartz (The Hebrew University of Jerusalem), and Diyi Yang (Stanford University). The student research workshop also organized a panel on how to conduct NLP research in the era of LLMs, offering guidance to young researchers, with panelists from industry (Sara Hooker of Cohere AI and Swaroop Mishra of Google DeepMind) and academia (Danqi Chen, Princeton University). Finally, the MATCHING workshop organized a panel on the pros and cons of employing LLMs for tasks involving various forms of alignment between structured and unstructured data, with AnHai Doan, Estevam Hruschka, Lei Li, Renée Miller, Barbara Plank, and Niket Tandon as panelists. The takeaway from these panels is that we have entered uncharted territory, with cause for optimism. At the same time, this hope should be counterbalanced by responsible design, evaluation, and deployment of LLMs. Let's now look at a few key themes that emerged from the panel discussions.
Evaluation and the scientific process. Reports on LLM capabilities in press releases and white papers often lack standards and detailed information about evaluation. There is currently no agreed-upon framework or standard for evaluating LLMs, nor a uniform approach for reporting their capabilities. While much of the progress on LLMs has been driven by industry, evaluation is one area where the academic research community can take the lead and contribute.
Mind the gap. A recurring theme across these panels was the resource gap between academia and industry. The lack of resources in academia often makes it difficult to make meaningful contributions. One suggestion was to direct efforts toward building more capable and robust open-source LLMs; another was to collaboratively create and share benchmarks for effective evaluation of LLMs.
Open- vs. closed-source LLMs. Panelists also compared open- and closed-source LLMs along dimensions such as transparency, accountability, and accessibility. While open-source LLMs edge out their closed-source counterparts on these dimensions, panelists also highlighted potential downsides of open access, such as the ease of generating and spreading misinformation.
Social awareness. While LLMs seem to have captivated the West, particularly English-speaking regions, it is unclear how impactful these technologies are for low-resource languages. Moreover, the cost and resources required to deploy and use LLMs are prohibitive for regions such as the Global South. Panelists called for a more inclusive approach to building LLMs.
Responsible AI. Several panelists highlighted issues such as bias, trust, and environmental impact. Bias is often tied to the model training process, while trust hinges on modeling the uncertainty of LLMs' generation processes. Panelists discussed tracking the provenance of training data for open-source LLMs as a way to mitigate these issues. They also urged practitioners to be mindful of environmental costs not only during the training phase but also at inference time, given the huge popularity of LLM-powered systems such as chatbots.
MATCHING Workshop
Last but not least, researchers from Megagon Labs helped organize the First Workshop on Matching from Structured and Unstructured Data. Besides the LLM-focused panel described above, the workshop featured four invited talks by leading researchers from industry and academia on themes such as knowledge base construction from question answering (William Cohen, Google Research), retrieval augmentation of LLMs (Ndapa Nakashole, UC San Diego), post-editing of models (Sameer Singh, UC Irvine), and effective knowledge distillation strategies (Alan Ritter, Georgia Tech). Held in a hybrid format with both in-person and remote participants, the workshop showcased research on topics ranging from entity alignment and relation extraction to knowledge representation and verification. More details can be found here.
Conclusion
NLP stands as one of today's most exciting research domains, and we hope this article has shed light on its recent breakthroughs and debates. To learn more about NLP, machine learning, and databases, follow us on LinkedIn, Twitter, or Facebook.
Written by: Estevam Hruschka, Chunpeng Ma, Naoki Otani, Sajjadur Rahman, and Megagon Labs