The ACM SIGKDD conference is the premier forum for the advancement, education, and adoption of the science of knowledge discovery and data mining. KDD 2022 returned to an onsite format in Washington, D.C., in mid-August, with 2,090 registered attendees. The conference was held fully in person: there was no option to attend online, and all presentations and interactions took place in the meeting rooms. This brought back the opportunity to network in person.
The conference consisted of two main tracks: the research track and the applied data science (ADS) track. There were also 33 workshops and 36 tutorials. I attended one ADS track session and two research track sessions. This year, the research track accepted 254 papers (a 15.3% acceptance rate), while the ADS track accepted 196 papers (a 26% acceptance rate).
Keynotes and Awards
There were three keynote speakers at KDD this year. Professor Lise Getoor from UC Santa Cruz presented recent advances in statistical relational learning and its potential applications in data mining. Professor Milind Tambe from Harvard University discussed the potential impact of applying AI to societal problems such as public health and conservation. Professor Shang-Hua Teng gave an overview of research methodologies for network science in the era of Big Data, covering network analysis, game theory, and machine learning.
The paper awards at SIGKDD this year were as follows. The best paper award in the research track went to a joint work from the University of Virginia and Microsoft on modeling causal effects. The runner-up in this category came from the Chinese University of Hong Kong, with a paper on machine learning theory for optimizers. The best student paper award went to a paper from MIT on decision-tree ensembles. All of these works focused on theoretical research and provided solid theoretical and empirical results. The best paper in the ADS track came from Alibaba, which developed a general-purpose platform for federated graph learning. The runner-up in this track came from LinkedIn and described a scalable forecasting system.
Several awards were also given to individuals. This year's Innovation Award went to Professor Huan Liu from Arizona State University for his significant impact on social media mining. The Service Award went to Dr. Charu Aggarwal from the IBM Watson Research Center, who has organized and chaired many prominent data-mining venues. The Dissertation Award went to Dr. Rex Ying from Stanford University for his work on graph neural networks. The Rising Star Award went to Professor Yuxiao Dong for his research on social network analysis and graph neural networks; he also received an honorable mention for the Dissertation Award in 2017.
Next, I would like to introduce some SIGKDD 2022 papers that are closely related to the research at Megagon Labs. For each paper, I give a high-level summary and links to any available resources.
SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs
The authors of this paper studied knowledge graph (KG) completion, a well-known problem in knowledge discovery. Unlike previous studies of single-hop completion, they focused on multi-hop reasoning, i.e., retrieving the entities in a KG that answer a given logical query. The authors proposed SMORE, the first general system for both single-hop and multi-hop KG completion. The system integrated many popular algorithms for multi-hop KG completion, introduced effective methods for generating training instances online, and included optimizations that improve scalability. The result was a well-engineered system that can be deployed on large-scale, real-world datasets.
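To make "answering a logical query" concrete, here is a minimal Python sketch over a hypothetical toy KG (all entities, relations, and helper names are made up for illustration). It answers "Who works at a company located in Seattle and graduated from UW?" by exact set traversal: a two-hop branch intersected with a one-hop branch. This is not SMORE's method. Embedding-based systems like SMORE learn to approximate these projection and intersection steps so they can still return answers when the needed edges are missing, which exact traversal cannot do.

```python
from collections import defaultdict

# Hypothetical toy KG as (head, relation, tail) triples.
TRIPLES = [
    ("Alice", "works_at", "AcmeCorp"),
    ("Bob", "works_at", "AcmeCorp"),
    ("Carol", "works_at", "Initech"),
    ("AcmeCorp", "located_in", "Seattle"),
    ("Initech", "located_in", "Austin"),
    ("Alice", "graduated_from", "UW"),
    ("Carol", "graduated_from", "UW"),
]

def build_index(triples):
    """Map (entity, relation) -> set of neighbors, adding inverse edges."""
    index = defaultdict(set)
    for head, rel, tail in triples:
        index[(head, rel)].add(tail)
        index[(tail, "inv_" + rel)].add(head)  # lets queries traverse edges backwards
    return index

def project(index, entities, relation):
    """Relation projection: one hop from every entity in the input set."""
    result = set()
    for entity in entities:
        result |= index.get((entity, relation), set())
    return result

index = build_index(TRIPLES)
# Branch 1 (two hops): Seattle -> companies located there -> their employees.
seattle_staff = project(index, project(index, {"Seattle"}, "inv_located_in"), "inv_works_at")
# Branch 2 (one hop): UW -> its graduates.
uw_alumni = project(index, {"UW"}, "inv_graduated_from")
# Intersection: entities satisfying both conditions.
answer = seattle_staff & uw_alumni
print(answer)  # {'Alice'}
```

If the triple ("Alice", "works_at", "AcmeCorp") were missing, this traversal would return an empty set; that gap is exactly what learned KG completion aims to fill.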
Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries
The authors of this paper set out to combine pre-trained language models with knowledge graph (KG) embedding techniques to answer Existential Positive First-Order (EPFO) logical queries over KGs. To this end, they first proposed a Transformer-based GNN architecture as the encoder, using a Mixture-of-Experts (MoE) strategy to exploit the sparsity of the Transformer's feedforward layers. Building on this encoder, they then proposed a masked pre-training framework that trains the model to generalize to multiple different applications.
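For readers unfamiliar with Mixture-of-Experts, the sketch below shows generic top-k gated routing in plain Python: a gate scores every expert, only the top-k experts are evaluated, and their outputs are combined with renormalized gate weights. Skipping the unselected experts is what makes an MoE feedforward layer sparse. The gate weights, the toy "experts", and the function names are hypothetical, and this is the textbook mechanism rather than the paper's exact configuration.

```python
import math

def softmax(scores):
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, gate_weights, experts, top_k=1):
    """Generic top-k gated Mixture-of-Experts over a single input vector x.

    gate_weights: one (hypothetical, untrained) weight vector per expert.
    experts: callables mapping a vector to a vector of the same length.
    """
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in gate_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)  # renormalize over the selected experts
    out = [0.0] * len(x)
    for i in chosen:  # unselected experts are never evaluated: this is the sparsity
        y = experts[i](x)
        out = [o + (probs[i] / norm) * y_j for o, y_j in zip(out, y)]
    return out, chosen

x = [1.0, 2.0]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
experts = [
    lambda v: [2.0 * a for a in v],  # toy expert 0: doubles the input
    lambda v: [-a for a in v],       # toy expert 1: negates the input
    lambda v: [0.0 for _ in v],      # toy expert 2: outputs zeros
]
out, chosen = moe_layer(x, gate_weights, experts, top_k=1)
print(chosen, out)  # [1] [-1.0, -2.0]
```

Raising `top_k` trades compute for a smoother mixture; with `top_k=1`, each input pays for a single expert's feedforward pass regardless of how many experts exist.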
Learning Causal Effects on Hypergraphs
This paper won the best paper award in the research track. It studied the problem of learning causal effects, a very important research area in AI. The paper's key idea was to formulate this problem as individual treatment effect (ITE) estimation on hypergraphs. The authors then proposed the HyperSCI framework, which models the resulting high-order interference via hypergraph representation learning techniques.
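To spell out what "ITE on hypergraphs" means, here is a toy structural model in Python. Everything in it (the hyperedges, baselines, effects, and the linear spillover term) is a hypothetical assumption chosen for illustration, and it is not the HyperSCI estimator. Each unit's outcome depends on its own treatment plus a spillover from the other members of every hyperedge (group) it belongs to. The ITE of a unit is the change in its outcome when only its own treatment is flipped while everyone else's is held fixed.

```python
# Hypothetical toy structural model (NOT the HyperSCI estimator).
HYPEREDGES = [{0, 1, 2}, {2, 3}]   # groups of units that interact, e.g., teams
BASELINE = [1.0, 0.5, 2.0, 1.5]    # outcome of each unit with no treatment anywhere
EFFECT = [0.8, 0.3, 1.2, 0.6]      # each unit's own treatment effect
SPILLOVER = 0.5                    # strength of group-level (high-order) interference

def outcome(i, treatments):
    """Potential outcome of unit i under a full treatment assignment (0/1 per unit)."""
    own = EFFECT[i] * treatments[i]
    spill = 0.0
    for edge in HYPEREDGES:
        if i in edge and len(edge) > 1:
            others = [j for j in edge if j != i]
            # Average treatment of the other group members, scaled by SPILLOVER.
            spill += SPILLOVER * sum(treatments[j] for j in others) / len(others)
    return BASELINE[i] + own + spill

def ite(i, treatments):
    """ITE of unit i: flip only i's treatment, holding everyone else's fixed."""
    treated = list(treatments)
    treated[i] = 1
    control = list(treatments)
    control[i] = 0
    return outcome(i, treated) - outcome(i, control)

treatments = [0, 1, 1, 0]
effect_2 = ite(2, treatments)  # equals EFFECT[2] up to float rounding in this additive model
# Unit 0 is untreated, yet its outcome moves because members of its group are treated.
spill_0 = outcome(0, treatments) - outcome(0, [0, 0, 0, 0])
```

Here unit 0's outcome rises by 0.5 even though it is never treated, so an analysis that ignores group structure would misattribute that spillover. In real observational data, neither the potential outcomes nor the spillover function are known, which is why HyperSCI has to learn them with hypergraph representation learning.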
m-mix: Generating Hard Negatives via Multi-sample Mixing for Contrastive Learning
This paper studied how to generate high-quality negative samples for contrastive learning. The main technical contribution was a sampling approach that mixes multiple samples and assigns their mixing weights dynamically. The core problem is deciding the weight of each sample, which the authors addressed by designing a new diversity objective function and mixing operations. The proposed method can be applied broadly across contrastive learning frameworks and use cases.
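The multi-sample mixing idea can be sketched in a few lines of Python. This is a hypothetical illustration, not m-mix itself: it synthesizes one hard negative as a convex combination of several negative embeddings and, as a stand-in for the paper's diversity-driven weighting, simply gives each negative a softmax weight based on its similarity to the anchor, so negatives closer to the anchor (the harder ones) contribute more. The function names and the `temperature` parameter are assumptions.

```python
import math

def normalize(v):
    """Scale v to unit L2 norm (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mix_hard_negative(anchor, negatives, temperature=0.1):
    # Similarity-based softmax weights: an illustrative assumption, not m-mix's
    # diversity-driven weighting. Lower temperature -> sharper weights.
    logits = [dot(anchor, n) / temperature for n in negatives]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Convex combination of all negatives, dimension by dimension.
    mixed = [sum(w * n[d] for w, n in zip(weights, negatives)) for d in range(len(anchor))]
    return normalize(mixed), weights

anchor = normalize([1.0, 0.0])
negatives = [normalize([1.0, 1.0]), normalize([0.0, 1.0]), normalize([-1.0, 0.2])]
hard_negative, weights = mix_hard_negative(anchor, negatives)
```

Mixing more than two samples lets the synthetic negative land in regions of the embedding space that no single real negative occupies; a practical version would work on batched tensors and compute the weights according to the paper's objective.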
Looper: An End-to-end ML Platform for Product Decisions
This ADS track paper from Meta introduced Looper, an end-to-end ML platform for product decision-making and feedback collection. It is a generic framework that covers the whole model life cycle, from training data preparation to model deployment. The authors also introduced a mechanism called a “strategy blueprint” to address challenges in data and configuration management. The framework has been deployed at Meta for a variety of application scenarios.
KDD continues to be an important venue not only for data mining but also for the broader AI field, and several general trends came to light during the conference. First, separating the research and ADS tracks has proven successful. Since data mining is becoming an increasingly application-oriented field, it is important to bring into the conference companies that report research outcomes on real data from their product pipelines. Second, the research track featured many excellent contributions, especially theoretical ones; papers with a strong theoretical focus are highly valued in that track. Third, the ADS track has become a top choice for many companies and has attracted high-quality submissions describing well-known projects from major companies. This is arguably a key reason why KDD has maintained its impact, recognition, and reputation in computer science.