Answer Relevancy
The answer relevancy metric measures the quality of your RAG pipeline's generator by evaluating how relevant the actual_output of your LLM application is compared to the provided input. deepeval's answer relevancy metric is a self-explaining LLM-Eval, meaning it outputs a reason for its metric score.
Required Parameters
To use the AnswerRelevancyMetric, you'll have to provide the following parameters when creating an LLMTestCase:
input
actual_output
retrieval_context
Example
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
# Replace this with the actual output from your LLM application
actual_output = "We offer a 30-day full refund at no extra cost."
# Replace this with the actual retrieved context from your RAG pipeline
retrieval_context = ["All customers are eligible for a 30 day full refund at no extra cost."]
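# Create the metric: scores at or above the threshold count as passing,
# and include_reason=True makes the metric explain its score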
metric = AnswerRelevancyMetric(
threshold=0.7,
model="gpt-4",
include_reason=True
)
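# Pair the user input with the generated output and retrieved context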
test_case = LLMTestCase(
input="What if these shoes don't fit?",
actual_output=actual_output,
retrieval_context=retrieval_context
)
metric.measure(test_case)
print(metric.score)
print(metric.reason)
# or evaluate test cases in bulk
evaluate([test_case], [metric])
You can also choose to fall back to Ragas' answer relevancy metric, which has a similar implementation. Note, however, that it is not capable of generating a reason.
from deepeval.metrics import RAGASAnswerRelevancyMetric
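For reference, here is a minimal usage sketch building on the import above; it assumes RAGASAnswerRelevancyMetric accepts the same threshold parameter as the native metric, and it reuses the test_case defined earlier.
# Assumption: the constructor takes a threshold like AnswerRelevancyMetric
ragas_metric = RAGASAnswerRelevancyMetric(threshold=0.7)
ragas_metric.measure(test_case)
print(ragas_metric.score)  # unlike the native metric, no reason is generated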