Introduction

Quick Summary

Evaluation is the process of testing your LLM application's outputs. It requires the following components:

Test cases
Metrics
Evaluation dataset
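
As a rough illustration of how these three components relate, here is a minimal, library-independent sketch in plain Python. It does not use deepeval's API: `EvalCase`, `exact_match`, `keyword_overlap`, `evaluate`, and the 0.5 threshold are all hypothetical names and values chosen for this example. A test case pairs an input with the application's output, a metric scores one test case, and the dataset is simply a collection of test cases.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One test case: an input plus the actual and expected outputs (hypothetical type)."""
    input: str
    actual_output: str
    expected_output: str


# A metric maps one test case to a score in [0, 1] (hypothetical signature).
Metric = Callable[[EvalCase], float]


def exact_match(case: EvalCase) -> float:
    """Score 1.0 if the outputs match after trimming and lowercasing, else 0.0."""
    same = case.actual_output.strip().lower() == case.expected_output.strip().lower()
    return 1.0 if same else 0.0


def keyword_overlap(case: EvalCase) -> float:
    """Fraction of expected words that also appear in the actual output."""
    expected = set(case.expected_output.lower().split())
    actual = set(case.actual_output.lower().split())
    return len(expected & actual) / len(expected) if expected else 0.0


def evaluate(dataset: List[EvalCase], metrics: List[Metric], threshold: float = 0.5) -> List[bool]:
    """Return, for each test case, whether every metric met the threshold."""
    return [all(metric(case) >= threshold for metric in metrics) for case in dataset]


# The evaluation dataset: a collection of test cases.
dataset = [
    EvalCase("What is the capital of France?", "Paris", "paris"),
    EvalCase("What is 2 + 2?", "5", "4"),
]

results = evaluate(dataset, [exact_match, keyword_overlap])
print(results)
```

In this sketch a test case passes only if every metric reaches the threshold, so the first case passes and the second fails. Real LLM evaluation metrics are usually far less literal than string comparison; the simple metrics here only stand in for that role.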

Here's a diagram of what an ideal evaluation workflow looks like using deepeval:

Your test cases will typically live in a single Python file, and executing them is as easy as running deepeval test run:

deepeval test run test_example.py

Note

We understand preparing a comprehensive evaluation dataset can be a challenging task, especially if you're doing it for the first time. Contact us if you want a custom evaluation dataset prepared for you.
