In the past few years, large language models (LLMs) have achieved significant success in various tasks because of their extensive parametric capacity coupled with training on large volumes of text which equips them with substantial world knowledge. This parametric knowledge is then utilized to solve downstream tasks effectively. Therefore, it is essential to comprehend and quantify the extent of LLMs’ knowledge about various facts.
To this end, probing techniques have been introduced to assess the knowledge of LLMs. These techniques are mostly framed as fill-in-the-blank tasks that measure the model’s knowledge by ranking its predictions for the blank.
However, while these approaches provide a useful binary representation of knowledge by incorporating ranking metrics, there are several fundamental issues with this procedure. Firstly, knowledge is not binary and cannot be fully captured by such a representation. Secondly, ranking metrics are often highly sensitive to the specific prompts used, leading to potential biases in the assessment. As illustrated in the example below, making slight changes to the prompt while maintaining its semantic aspect leads to changes in LLM predictions and consequently the ranking metrics.
Finally, these metrics may not be able to capture knowledge accurately; as highlighted in the following example, the gold label ranking is the same for these two distributions, despite the fact that these two predictions exhibit a completely different level of knowledge regarding the target fact.
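The point about identical gold-label rankings can be made concrete with a small sketch. The two distributions below are made-up toy numbers (not taken from any model), and the prompt, candidate tokens, and helper names are illustrative assumptions. Both distributions place the gold token first, yet their entropies differ widely, which a rank-based metric cannot see.

```python
import math

def gold_rank(dist, gold):
    """1-based rank of the gold token when tokens are sorted by probability."""
    ordered = sorted(dist, key=dist.get, reverse=True)
    return ordered.index(gold) + 1

def entropy(dist):
    """Shannon entropy (in nats) of a token -> probability mapping."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Toy predictions for a hypothetical prompt "The capital of France is [MASK]."
# The numbers are invented purely for illustration.
confident = {"Paris": 0.90, "Lyon": 0.05, "Rome": 0.03, "Berlin": 0.02}
unsure = {"Paris": 0.28, "Lyon": 0.26, "Rome": 0.24, "Berlin": 0.22}

rank_confident = gold_rank(confident, "Paris")
rank_unsure = gold_rank(unsure, "Paris")
entropy_confident = entropy(confident)
entropy_unsure = entropy(unsure)

print("gold rank:", rank_confident, rank_unsure)
print("entropy:", round(entropy_confident, 3), round(entropy_unsure, 3))
```

The rank is the same in both cases, while the entropy of the second distribution is close to that of a uniform guess over four tokens.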
Therefore, to gain a more comprehensive understanding of LLMs’ knowledge, it is necessary to develop better metrics that go beyond the binary notion of knowledge and account for these limitations.
Measuring Factual Knowledge
In this work, we propose a new framework that uses knowledge measurements derived from information theory. By examining the probability distribution of a language model over the vocabulary when predicting a blank, we first introduce the concept of prompt uncertainty. We then build on the intuition that an LLM knows a fact if the prompt’s uncertainty remains the same after instilling that fact into the model. We use information-theoretic metrics, namely entropy and KL-divergence, to capture this uncertainty and thereby measure knowledge in language models. More specifically, we define our entropy-based measurement for a target fact f as the change in prompt entropy caused by instilling f:

$$S_{\text{ent}}(f) = \left| H(\text{prompt}) - H(\text{prompt} \mid \text{instilling } f \text{ into the LM}) \right|$$
Here, H(prompt) is the entropy of the LLM’s predicted probability distribution over the vocabulary for a given prompt. Denoting the probability distribution of the LLM’s prediction over the vocabulary before and after instilling f as P and Q, respectively, we define our KL-divergence-based measurement as:

$$S_{\text{KL}}(f) = \mathrm{KL}(P \,\|\, Q)$$
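As a minimal sketch of how these two measurements could be computed, the snippet below assumes the entropy-based score is the absolute change in entropy and the KL-based score is KL(P‖Q), with P and Q given as aligned lists of probabilities over the same vocabulary. The function names and the toy distributions are illustrative assumptions, not the code used in the paper.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) for two aligned probability vectors; eps guards against zeros in Q."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def entropy_score(p_before, q_after):
    """Assumed form of the entropy-based measurement: absolute change in entropy."""
    return abs(entropy(p_before) - entropy(q_after))

def kl_score(p_before, q_after):
    """Assumed form of the KL-based measurement: divergence between P (before) and Q (after)."""
    return kl_divergence(p_before, q_after)

# Toy numbers, invented for illustration only.
# Case 1: the prediction barely moves after instilling the fact (fact likely already known).
known_p, known_q = [0.90, 0.05, 0.03, 0.02], [0.92, 0.04, 0.02, 0.02]
# Case 2: the prediction changes a lot after instilling the fact (fact likely not known).
unknown_p, unknown_q = [0.25, 0.25, 0.25, 0.25], [0.90, 0.05, 0.03, 0.02]

print("known:  ", entropy_score(known_p, known_q), kl_score(known_p, known_q))
print("unknown:", entropy_score(unknown_p, unknown_q), kl_score(unknown_p, unknown_q))
```

Under this reading, a score near zero means instilling the fact changed little, which matches the intuition that the model already knew it.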
In this work, to instill knowledge into a language model, we examine two approaches: (1) explicit instillation, by directly including the fact in the prompt used for probing, and (2) implicit instillation, by fine-tuning the model on that specific fact. One important research question we aim to address is: when is it appropriate to instill knowledge explicitly? This question is particularly critical because implicit instillation can be very costly and may not be feasible for certain in-context learning-based models such as GPT-3 and GPT-4.
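Explicit instillation can be sketched in a few lines. The version below assumes the fact is simply prepended to the probing prompt, which is one plausible reading of "including the fact in the prompt". `toy_predict` is a hypothetical stand-in for a real model call and returns invented probabilities.

```python
def explicit_instill(fact, prompt):
    """Prepend the fact to the probing prompt (an assumed, simple instillation format)."""
    return f"{fact.strip()} {prompt.strip()}"

def probe_pair(predict, fact, prompt):
    """Return the predicted distributions before (P) and after (Q) explicit instillation.

    `predict` is any callable mapping a prompt string to a token -> probability dict.
    """
    p = predict(prompt)
    q = predict(explicit_instill(fact, prompt))
    return p, q

def toy_predict(prompt):
    # Hypothetical stand-in for an LLM; the probabilities are invented for illustration.
    # It becomes confident only when the answer token appears in the prompt.
    if "Paris" in prompt:
        return {"Paris": 0.95, "Lyon": 0.03, "Rome": 0.02}
    return {"Paris": 0.40, "Lyon": 0.35, "Rome": 0.25}

fact = "The capital of France is Paris."
prompt = "The capital of France is [MASK]."

instilled_prompt = explicit_instill(fact, prompt)
p, q = probe_pair(toy_predict, fact, prompt)
print(instilled_prompt)
print("before:", p)
print("after: ", q)
```

Implicit instillation would replace the prompt edit with a fine-tuning step on the fact, which is why it is far more expensive and unavailable for models that can only be accessed through prompting.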
After introducing our knowledge measurements, we conducted three distinct experiments:
Assessing the accuracy of measurements: To assess how accurately various metrics measure language models’ knowledge, we conducted a synthetic experiment. Since we lack access to the amount of knowledge that language models possess for any given fact, we manually controlled the level of knowledge an LLM has for a given fact. The results reveal that the KL-divergence and entropy-based metrics surpass ranking methods by more than 20% in BERT and 35% in T5, highlighting their significantly superior accuracy in approximating LLMs’ knowledge compared to ranking-based metrics.
Implicit vs explicit knowledge instillation: We compared the effects of implicit and explicit knowledge instillation on both BERT and T5. In our experiments, we observed a strong correlation between implicit and explicit knowledge instillation. Also, upon identifying the instances where mismatches arise between implicit and explicit instillation, we observed that the majority of them are samples with predicates related to locations or spoken languages for both BERT and T5. These results demonstrate that we can effectively instill knowledge into LLMs explicitly, except for locational and language-based predicates, where implicit knowledge instillation is required.
In-context learning-based applications: We also explored the potential applications of our methods for in-context learning-based models through two distinct tasks: (1) factual alignment, where we address the question of whether it is necessary to explicitly provide a certain fact in the prompt for it to appear in the generated text, and (2) avoiding hallucination, where we calculate the correlation between the LLM’s knowledge and the occurrence of hallucinated versus accurately generated facts.
In this blog, we introduced new metrics to measure factual knowledge in LLMs, addressing the limitations of existing ranking-based methods. Our metrics outperformed traditional ranking-based approaches, providing more accurate assessments of LLMs’ factual knowledge. We also explored the difference between implicit and explicit knowledge instillation in LLMs, highlighting that explicit knowledge instillation alone is insufficient in cases related to location and language queries. Finally, we applied our new metrics to two important areas: factual alignment and hallucination detection for in-context learning-based models, showing promising results in aligning generated outputs with factual knowledge and identifying hallucinated facts.
For a deeper dive into our research, read our paper, Measuring and Modifying Factual Knowledge in Large Language Models. You can find Pouya Pezeshkpour, Research Scientist, at ICMLA 2023, where he will be presenting the research on Friday, December 15th at 6pm EST.