Even with all the recent advances in artificial intelligence (AI) and large language models (LLMs), there are still many open problems and challenges, especially in real-world applications in specific domains. In the human resources (HR) domain, for example, applications need to provide fairness, factuality, controllability, consistency, interpretability, and reasoning capabilities, but current state-of-the-art technologies tend to lack those capabilities.
We believe that we can exploit knowledge from many different sources (structured and unstructured) to address some of the limitations of current LLMs. At Megagon Labs, we are working on symbiotic models and systems (Figure 1) that take advantage of LLMs as well as structured (knowledge bases [KBs], knowledge graphs [KGs], databases [DBs], etc.) and unstructured (texts) information in a continuous and (semi-)automated machine-learning paradigm. In this post, we describe Megagon KnowledgeHub and how we use it internally in our research and development.
Figure 1. KnowledgeHub incorporates several complementary sources, such as LLMs, textual information, and knowledge bases, which enable symbiotic knowledge representations and models. Applications that exploit the KnowledgeHub benefit from continuous improvements in fairness, factuality, controllability, consistency, interpretability, and reasoning capabilities.
Megagon KnowledgeHub
Megagon KnowledgeHub is a repository that contains different types of knowledge/information in the HR domain. In addition to large language models, databases, taxonomies, knowledge bases, and knowledge graphs, the KnowledgeHub leverages different types of textual data, such as resumes, job descriptions, job requirements and responsibilities, company profiles, skills, etc. (See Figure 2.) The KnowledgeHub can learn and store knowledge in different representations, including symbolic (as property graphs, for example) as well as dense numerical representations (such as embeddings or models) and hybrid mixtures of those.
Figure 2. Megagon KnowledgeHub leverages different sources of information to build models for a variety of tasks such as candidate-to-job matching.
We believe the hybrid and symbiotic learning approach in KnowledgeHub can deliver artifacts that capture context effectively, provide controllable, consistent, and interpretable outputs, and help support fair decision-making. Because it combines symbolic and non-symbolic signals, the KnowledgeHub can overcome limitations present in current approaches based only on LLMs. For example, it can provide dense vector representations (embeddings), enhanced large language models, and knowledge graphs that can be used to solve many HR tasks, such as job-resume matching, skills extraction (from resumes or job descriptions), career path recommendation, and taxonomy and ontology curation, with more precise, controllable, and explainable results.
KnowledgeHub allows us to have continuously evolving, large-scale shared knowledge useful to Recruit Holdings and its subsidiaries. Projects can contribute to the hub by publishing their models, explanations, and embeddings, which in turn can be used in other projects to leverage new knowledge in the hub and explore new modeling ideas. Because it is a shared knowledge resource, as it grows and evolves with new data and models, improvements carry over on the fly to applications deployed in production.
Consider for example a new model M1 focused on finding a resume that best matches a given job description. Initially, this model (M1) can benefit from KnowledgeHub as a “consumer.” For example, M1 can use existing embeddings (or knowledge-enhanced models) in the hub to better solve the matching task. But M1 can also contribute to the hub by publishing back the knowledge gained from solving matching tasks. For example, it can enrich and expand the hub with new knowledge on “skills” and “job requirements” (and other concepts) that are extracted from job descriptions and resumes. The “skill/job requirement” pairs (coming from the matching task performed) can be used to inject relations such as “compatible skills and requirements.” Consequently, M1 is both a “consumer” of and “provider” to KnowledgeHub. In this process, the more that models (and other components) contribute to and consume from the hub, the better the hub can improve its own content and learning ability.
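The consumer/provider loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `KnowledgeHubSketch`, its methods, the relation name `compatible_with`, the toy two-dimensional embeddings, and the 0.8 threshold are all hypothetical and do not describe the actual KnowledgeHub API.

```python
from collections import defaultdict


class KnowledgeHubSketch:
    """Hypothetical in-memory stand-in for the hub (not the real API)."""

    def __init__(self):
        self.embeddings = {}                # concept name -> dense vector
        self.relations = defaultdict(set)   # relation name -> {(head, tail)}

    def publish_embedding(self, name, vector):
        self.embeddings[name] = vector

    def get_embedding(self, name):
        return self.embeddings.get(name)

    def publish_relation(self, relation, head, tail):
        self.relations[relation].add((head, tail))


def dot(a, b):
    """Dot-product score between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def match_and_contribute(hub, resume_skills, job_requirements, threshold=0.8):
    """Toy M1: consume embeddings from the hub, publish matched pairs back."""
    matched = []
    for skill in resume_skills:
        for requirement in job_requirements:
            s = hub.get_embedding(skill)
            r = hub.get_embedding(requirement)
            if s is None or r is None:
                continue  # nothing to consume for this pair
            if dot(s, r) >= threshold:
                # Provider side: inject a "compatible skills and requirements" relation.
                hub.publish_relation("compatible_with", skill, requirement)
                matched.append((skill, requirement))
    return matched


# Toy data, invented for illustration only.
hub = KnowledgeHubSketch()
hub.publish_embedding("sql", [1.0, 0.0])
hub.publish_embedding("public speaking", [0.0, 1.0])
hub.publish_embedding("database querying", [0.9, 0.1])

pairs = match_and_contribute(hub, ["sql", "public speaking"], ["database querying"])
```

After the call, `hub.relations["compatible_with"]` holds the pairs that M1 matched, so a later model could consume them without redoing the matching.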
The architecture of KnowledgeHub can be examined in layers (Figure 3). The database layer is responsible for physically storing all the different pieces of knowledge. We use a combination of NoSQL (graph, key-value, document) and relational databases. The key idea is to store different representations and different types of knowledge in an integrated manner, so that symbolic and numerical representations of the same entities can be linked. This also lets us combine symbolic queries with approximate queries over numerical representations. The query layer is responsible for manipulating and managing the content in the hub. This includes querying, ingesting, and deleting pieces of knowledge, as well as creating different views and snapshots.
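To make the idea of combining symbolic and numerical queries concrete, here is a minimal sketch. It assumes a hypothetical `Entity` record that links a symbolic property (`entity_type`) to an embedding of the same entity, and it uses exact brute-force cosine similarity as a stand-in for the approximate vector search a production system would more likely use. None of the names or values below come from KnowledgeHub itself.

```python
import math
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical record linking symbolic and numerical views of one entity."""
    entity_id: str
    entity_type: str        # symbolic property, e.g. "skill" or "job_title"
    embedding: list[float]  # dense representation of the same entity


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_query(entities, entity_type, query_vec, k=2):
    """Apply the symbolic filter first, then rank survivors by similarity."""
    candidates = [e for e in entities if e.entity_type == entity_type]
    ranked = sorted(candidates,
                    key=lambda e: cosine(e.embedding, query_vec),
                    reverse=True)
    return [e.entity_id for e in ranked[:k]]


# Toy data, invented for illustration only.
ENTITIES = [
    Entity("python", "skill", [0.9, 0.1, 0.0]),
    Entity("java", "skill", [0.8, 0.2, 0.1]),
    Entity("negotiation", "skill", [0.0, 0.1, 0.9]),
    Entity("software_engineer", "job_title", [0.9, 0.1, 0.0]),
]

top_skills = hybrid_query(ENTITIES, "skill", [1.0, 0.0, 0.0], k=2)
```

Filtering symbolically before ranking keeps `software_engineer` out of the result even though its vector is as close to the query as `python`'s, which is the kind of control a purely numerical lookup would not give.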
The feature layer is responsible for producing different data that can be useful in different downstream tasks and applications. These features include subgraphs, embeddings, model checkpoints, etc. that can be generated for specific tasks/domains as well as for general use. Any generated feature becomes available for downstream applications.
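As one example of a feature such a layer might produce, the sketch below extracts a k-hop subgraph around a seed entity using breadth-first search. The function name, the edge-list format, and the toy graph are assumptions made for illustration; the post does not specify how KnowledgeHub generates subgraphs.

```python
from collections import deque


def k_hop_subgraph(edges, seed, k):
    """Return (nodes, edges) within k undirected hops of the seed node."""
    adjacency = {}
    for head, tail in edges:
        adjacency.setdefault(head, set()).add(tail)
        adjacency.setdefault(tail, set()).add(head)

    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)

    nodes = set(depth)
    kept = [(h, t) for h, t in edges if h in nodes and t in nodes]
    return nodes, kept


# Toy graph, invented for illustration only.
EDGES = [
    ("data_scientist", "python"),
    ("python", "pandas"),
    ("pandas", "numpy"),
    ("recruiter", "negotiation"),
]

sub_nodes, sub_edges = k_hop_subgraph(EDGES, "data_scientist", 2)
```

A subgraph like this could then be handed to a downstream task as-is or used as input for computing task-specific embeddings.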
The reasoning layer provides access to different reasoning models and approaches that allow applications to perform inference over the knowledge present in the hub. The newly inferred knowledge can be used in specific downstream tasks, or it can be used to help extend and curate the knowledge hub.
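A very simple instance of inference over stored knowledge is a transitivity rule applied until no new facts appear. The sketch below assumes knowledge is held as (head, relation, tail) triples and that an `is_a` relation is transitive; both are illustrative assumptions, and the reasoning models in the actual hub may work quite differently.

```python
def infer_transitive(triples, relation="is_a"):
    """Return triples newly implied by transitivity of the given relation."""
    known = set(triples)
    inferred = set()
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        edges = [(h, t) for (h, r, t) in known if r == relation]
        for h1, t1 in edges:
            for h2, t2 in edges:
                if t1 == h2 and h1 != t2:
                    new_triple = (h1, relation, t2)
                    if new_triple not in known:
                        known.add(new_triple)
                        inferred.add(new_triple)
                        changed = True
    return inferred


# Toy triples, invented for illustration only.
TRIPLES = {
    ("pytorch", "is_a", "deep_learning_framework"),
    ("deep_learning_framework", "is_a", "ml_tool"),
    ("ml_tool", "is_a", "software"),
}

new_facts = infer_transitive(TRIPLES)
```

The returned set contains only the newly inferred triples, so they can be reviewed or scored before being written back to extend the hub.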
Every piece of knowledge that is stored in the hub has an associated provenance and confidence score. The provenance layer is responsible for keeping track of the provenance of the content in the knowledge hub, making it possible to perform version control and to explain the sources that are responsible for each piece of knowledge.
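One way to picture provenance tracking is an append-only log in which each assertion of a fact carries its source and confidence and receives a version number. The `Fact` and `ProvenanceLog` names, the fields, and the example sources below are hypothetical; this is a sketch of the idea, not the hub's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One versioned assertion with its source and confidence (hypothetical schema)."""
    subject: str
    relation: str
    obj: str
    confidence: float
    source: str
    version: int


class ProvenanceLog:
    """Append-only history per (subject, relation, object) key."""

    def __init__(self):
        self._history = {}

    def record(self, subject, relation, obj, confidence, source):
        key = (subject, relation, obj)
        versions = self._history.setdefault(key, [])
        fact = Fact(subject, relation, obj, confidence, source,
                    version=len(versions) + 1)
        versions.append(fact)  # earlier versions are kept, never overwritten
        return fact

    def latest(self, subject, relation, obj):
        return self._history[(subject, relation, obj)][-1]

    def sources(self, subject, relation, obj):
        """Explain a fact by listing every source that asserted it."""
        return [f.source for f in self._history[(subject, relation, obj)]]


# Toy entries, invented for illustration only.
log = ProvenanceLog()
log.record("sql", "compatible_with", "database querying", 0.72, "matching_model_v1")
log.record("sql", "compatible_with", "database querying", 0.91, "human_curation")

current = log.latest("sql", "compatible_with", "database querying")
```

Keeping every version makes it possible both to roll back to an earlier state and to answer which sources stand behind a given piece of knowledge.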
The taxonomy/KB layer provides taxonomy as a service: applications can call it to obtain up-to-date taxonomic information relevant to the domain at hand. Finally, the extraction layer provides the models and tools that enable continuous (semi-)automatic knowledge ingestion.
Figure 3. KnowledgeHub layers. We can visualize knowledge in terms of seven layers.
In this article, we provided a high-level overview of Megagon KnowledgeHub. It gives our research and development teams a shared, continuously evolving knowledge resource to build on, which helps us keep improving the impact of our work in the research community.
Written by Estevam Hruschka, Eser Kandogan, and Megagon Labs.