Introduction
Quick Summary
Evaluation refers to the process of testing your LLM application's outputs, and requires the following components (see the sketch after this list):
- Test cases
- Metrics
- Evaluation dataset
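For example, a test case pairs an input with your LLM application's actual output, and a metric scores that test case against a passing threshold. Here is a minimal sketch using deepeval's `LLMTestCase` and `AnswerRelevancyMetric` (the strings and threshold are illustrative, not prescriptive):

```python
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

# A test case captures one interaction with your LLM application
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    # Replace this with the actual output produced by your LLM application
    actual_output="We offer a 30-day full refund at no extra cost.",
)

# A metric scores the test case; the threshold is the passing cutoff
metric = AnswerRelevancyMetric(threshold=0.5)
```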
Here's a diagram of what an ideal evaluation workflow looks like using deepeval:

Your test cases will typically live in a single Python file, and executing them is as easy as running `deepeval test run`:
```bash
deepeval test run test_example.py
```
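Under the hood, `deepeval test run` executes your test file with Pytest, so `test_example.py` contains ordinary test functions that call deepeval's `assert_test`. A minimal sketch, building on the test case and metric above (the file and function names are just examples):

```python
# test_example.py
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        actual_output="We offer a 30-day full refund at no extra cost.",
    )
    # assert_test fails the test if any metric scores below its threshold
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.5)])
```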
Note: We understand that preparing a comprehensive evaluation dataset can be a challenging task, especially if you're doing it for the first time. Contact us if you want a custom evaluation dataset prepared for you.
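If you do assemble one yourself, deepeval provides an `EvaluationDataset` class for grouping test cases so they can be evaluated together. A minimal sketch (the test case is a placeholder, and the exact constructor arguments may vary across deepeval versions):

```python
from deepeval.dataset import EvaluationDataset
from deepeval.test_case import LLMTestCase

# Group related test cases into a single dataset for evaluation
dataset = EvaluationDataset(
    test_cases=[
        LLMTestCase(
            input="What if these shoes don't fit?",
            actual_output="We offer a 30-day full refund at no extra cost.",
        ),
    ]
)
```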