Wrapping up 2023: Acknowledgements & Aspirations

We’d like to reflect on our collaborations and accomplishments as we wrap up the year. We put a lot of effort into our research papers and blog articles, and we’d like to thank you, the readers, for engaging with us. We also thank our guest speakers who have come into our office, virtually or in person, to share their work and dedication with us. And for our workshop participants, invited speakers, and panelists, we greatly appreciate your support and participation. We value each of your contributions to our community.

Guest Speakers

We’d like to thank our guest speakers for joining us in fostering creativity and inspiration. 

  • Ricardo Baeza-Yates, Director of Research at the Institute for Experiential AI of Northeastern University. Topic: Responsible AI

  • Arash Termehchy, Associate Professor at the School of EECS at Oregon State University. Topic: Exploratory Interaction: When Humans Learn to Train

  • Shashank Srivastava, Assistant Professor of Computer Science at UNC-Chapel Hill. Topic: Few-Shot Learning with Interactive Language

  • Peter Clark, Senior Research Director at AI2. Topic: Knowledge and Reasoning in the Age of GPT

  • Bodhisattwa Prasad Majumder, Research Scientist at AI2. Topic: CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization

Matching Workshop at ACL 2023

The Matching Workshop brought together research communities from academia and industry to discuss the development and application of natural language-based approaches, models, and systems for a range of matching tasks.

As a research lab, one of our team’s core goals is to contribute to the academic and industrial research community. We will keep fostering collaboration and networking, encouraging researchers and practitioners at all levels of expertise and seniority to participate actively, and providing learning opportunities that disseminate knowledge. Along these lines, we plan to continue developing the Matching Workshop and collaborating with everyone working on matching tasks over unstructured and structured data in any domain.

We had a great turnout, and we’d like to thank all those who helped us make it a great event, including those who submitted papers as well as those who attended our workshop virtually and in person. 

To learn more about the Matching Workshop, read the article!


Invited Speakers

  • Alan Ritter, School of Interactive Computing & College of Computing, Georgia Tech
  • Ndapa Nakashole, Department of Computer Science and Engineering, University of California, San Diego
  • Sameer Singh, Department of Computer Science, University of California, Irvine
  • William W. Cohen, Google Research

Panelists

  • AnHai Doan, Department of Computer Science, University of Wisconsin
  • Barbara Plank, Ludwig-Maximilians-Universität München & IT University of Copenhagen
  • Lei Li, Department of Computer Science, UC Santa Barbara
  • Niket Tandon, Allen Institute for AI (AI2)
  • Renée Miller, Khoury College of Computer Sciences, Northeastern University

To further catalyze the transfer and growth of ideas in NLP, we’ve also developed a new workshop. As a company focused on the human resources domain, we teamed up with leaders in both HR and NLP to organize the NLP4HR workshop, which will be held at EACL 2024 in St. Julian’s, Malta. We look forward to continuing our collaboration with the research community. To learn more about using NLP for HR, read the article here.



Our researchers have worked diligently to bring new research to the forefront of the community. We are proud that this year our papers were published in top conferences and workshops in natural language processing, machine learning, data management, and human-computer interaction.

Here is a list of our published works in 2023:

AAAI Tutorial – Never-Ending Learning, Lifelong Learning and Continual Learning: Systems, Models, Current Challenges and Applications by Estevam Hruschka

ICDE – Sudowoodo: Contrastive Self-supervised Learning for Multi-purpose Data Integration and Preparation by Runhui Wang, Yuliang Li, Jin Wang

CHI – Towards Transparent, Reusable, and Customizable Data Science in Computational Notebooks by Frederick Choi, Hannah Kim, Sajjadur Rahman, Dan Zhang

WWW – Weedle: Composable Dashboard for Data-Centric NLP in Computational Notebooks by Nahyun Kwon, Hannah Kim, Sajjadur Rahman, Dan Zhang, Estevam Hruschka

WWW Tutorial – Never-Ending Learning, Lifelong Learning and Continual Learning in the Era of Large Pre-Trained Language Models by Estevam Hruschka

SIGMOD Tutorial – Table Discovery in Data Lakes: State-of-the-art and Future Directions by Grace Fan, Jin Wang, Yuliang Li, Renée J. Miller

Poster in Generative AI and Law (GenLaw) Workshop at ICML – The Extractive-Abstractive Axis: Measuring Content ‘Borrowing’ in Generative Language Models by Nedelina Teneva

VLDB – Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning by Grace Fan, Jin Wang, Yuliang Li, Dan Zhang, Renée Miller

IJCNLP-AACL – Zero-shot Triplet Extraction by Template Infilling by Bosung Kim, Hayate Iso, Nikita Bhutani, Estevam Hruschka, Ndapa Nakashole, Tom Mitchell

ICMLA – Measuring and Modifying Factual Knowledge in Large Language Models by Pouya Pezeshkpour

New Frontiers in Graph Learning Workshop at NeurIPS – Knowledge Graphs are not Created Equal: Exploring the Properties and Structure of Real KGs by Nedelina Teneva, Estevam Hruschka


Our interns contributed to several of these publications and articles, and we can’t close out the year without thanking the 2023 interns for their hard work. We also thank the interns’ mentors for helping these young professionals grow as they make their way through their Ph.D. programs. This partnership between students and professionals benefits both sides, helping each develop the soft and hard skills essential for growing in the field.

Read about our internships and mentorship style: 

Megagon Summer 2023 Intern Experience

The Intern Experience at Megagon Labs: What You Can Expect


Academic Involvement

We support the academic community by contributing to the conferences, journals, and workshops that publish research and help educate students and professionals. This year our staff supported over 20 conferences, journals, workshops, and other academic events as reviewers, mentors, co-chairs, and program committee members.

  • CHI
  • VLDB
  • TKDE
  • AAAI
  • ICLR
  • NeurIPS
  • CoNLL
  • IEEE BigData
  • ACL
  • CIKM
  • VIS
  • PacificVis
  • ARR
  • ICDE
  • SoCC
  • EDBT

Industry Trends and Insights

2023 continued to be a trailblazing year for large language models (LLMs): they became bigger and better, cheaper and more efficient, gained plugins that connect your data and services to models, and now tout impressive multimodal capabilities! We have seen governments across the world express interest in the technology and pursue regulation to help achieve the best outcomes. The space is growing quickly, and, as for any lab doing research in this space, these developments shape our own course, too.

Having recently finished planning for next year, we are increasingly focused on putting LLMs into production and leveraging them in research and product development, specifically within the human resources (HR) space. We plan to concentrate on exploiting LLMs with proprietary data and services in end-user-facing use cases. “Data-AI” integration and “Human-AI” collaboration will be key research themes for our lab. For example, we are keen to tackle the augmentation of LLMs with structured, semi-structured, and graph data sources. This focus also includes distillation and retrieval, planning and reasoning about complex tasks, and finding and querying data sources in an enterprise setting. Because production settings involve a wide variety of tasks and data sources, how to use and integrate them becomes critical. We also plan to look into multi-agent orchestration frameworks that integrate all of these components for production use. For end-user-facing use cases, we are tackling the problems of explaining, rationalizing, annotating, verifying, and fact-checking output, not only from LLM generation but also in multi-agent scenarios.

Stay tuned for cool demos, interesting papers, and open-source projects on these topics from us!

Written by: Megagon Labs

Follow us on LinkedIn and Twitter to stay up to date.

