Hallucination
The hallucination metric determines whether your LLM generates factually correct information by comparing the actual_output to the provided context.
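This page does not spell out the exact scoring procedure, so what follows is only a toy sketch of the general idea. It assumes the score is the share of context entries that the output contradicts. `hallucination_fraction`, `naive_contradicts` and `COLOURS` are hypothetical helpers, not part of deepeval. The keyword-based judge stands in for a real model and is deliberately crude.

```python
import re
from typing import Callable, List, Set

def hallucination_fraction(
    actual_output: str,
    context: List[str],
    contradicts: Callable[[str, str], bool],
) -> float:
    """Toy score (assumption): share of context entries the output contradicts (0.0 = none)."""
    if not context:
        raise ValueError("context must contain at least one entry")
    contradicted = sum(1 for ctx in context if contradicts(ctx, actual_output))
    return contradicted / len(context)

def _words(text: str) -> Set[str]:
    # Lowercase alphabetic tokens; "blond-hair" becomes {"blond", "hair"}.
    return set(re.findall(r"[a-z]+", text.lower()))

# Hypothetical vocabulary used only by the placeholder judge below.
COLOURS = {"blond", "brown", "red", "black", "green", "blue"}

def naive_contradicts(ctx: str, output: str) -> bool:
    # Placeholder judge (assumption): flag a contradiction when the output
    # names a colour that this context entry never mentions.
    return bool((_words(output) & COLOURS) - _words(ctx))

context = ["A man with blond-hair, and a brown shirt drinking out of a public water fountain."]
print(hallucination_fraction("A blond drinking water in public.", context, naive_contradicts))  # 0.0
print(hallucination_fraction("A man in a red shirt.", context, naive_contradicts))  # 1.0
```

Because the judge is passed in as a function, a model-based consistency check could replace `naive_contradicts` without touching `hallucination_fraction`.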
Info: If you're looking to evaluate hallucination in a RAG system, refer to the faithfulness metric instead.
Required Parameters
To use the HallucinationMetric, you'll have to provide the following parameters when creating an LLMTestCase:
- input
- actual_output
- context
Note: input and actual_output are mandatory arguments to an LLMTestCase, so they are always required, even when a metric does not use them for evaluation.
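To make the requirement concrete, here is a plain-Python sketch that does not depend on deepeval. `SketchTestCase` and `check_hallucination_params` are hypothetical names. The check only illustrates which fields this metric needs and is not how LLMTestCase actually validates itself.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SketchTestCase:
    # Hypothetical stand-in for LLMTestCase: input and actual_output have no
    # defaults, so they must always be passed; context is optional in general.
    input: str
    actual_output: str
    context: Optional[List[str]] = None

def check_hallucination_params(case: SketchTestCase) -> None:
    """Raise ValueError if any parameter the hallucination metric needs is missing."""
    missing = [name for name in ("input", "actual_output", "context")
               if getattr(case, name) is None]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))

# Passes silently because all three fields are set.
check_hallucination_params(
    SketchTestCase("What was the blond doing?", "A blond drinking water in public.",
                   ["A man drinking from a fountain."])
)
```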
Example
from deepeval import evaluate
from deepeval.metrics import HallucinationMetric
from deepeval.test_case import LLMTestCase
# Replace this with the actual documents that you are passing as input to your LLM.
context = ["A man with blond-hair, and a brown shirt drinking out of a public water fountain."]
# Replace this with the actual output from your LLM application.
actual_output = "A blond drinking water in public."
test_case = LLMTestCase(
    input="What was the blond doing?",
    actual_output=actual_output,
    context=context,
)
metric = HallucinationMetric(threshold=0.5)
metric.measure(test_case)
print(metric.score)
# or evaluate test cases in bulk
evaluate([test_case], [metric])
Info: This metric uses Vectara's hallucination evaluation model.