Hallucination
The hallucination metric determines whether your LLM generates factually correct information by comparing the actual_output
to the provided context
.
info
If you're looking to evaluate hallucination for a RAG system, please refer to the faithfulness metric instead.
Required Parameters
To use the HallucinationMetric
, you'll have to provide the following parameters when creating an LLMTestCase
:
input
actual_output
context
note
Remember, input
and actual_output
are mandatory arguments to an LLMTestCase
and so are always required even if not used for evaluation.
Example
from deepeval import evaluate
from deepeval.metrics import HallucinationMetric
from deepeval.test_case import LLMTestCase
# Replace this with the actual documents that you are passing as input to your LLM.
context=["A man with blond-hair, and a brown shirt drinking out of a public water fountain."]
# Replace this with the actual output from your LLM application
actual_output="A blond drinking water in public."
test_case = LLMTestCase(
input="What was the blond doing?",
actual_output=actual_output,
context=context
)
metric = HallucinationMetric(threshold=0.5)
metric.measure(test_case)
print(metric.score)
# or evaluate test cases in bulk
evaluate([test_case], [metric])
info
This metric uses vectara's hallucination evaluation model.