Introduction

Quick Summary

Evaluation is the process of testing your LLM application's outputs, and it requires the following components:

  • Test cases
  • Metrics
  • Evaluation dataset
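To make these three components concrete, here is a minimal, framework-agnostic sketch in plain Python. It deliberately does not use deepeval's actual API; the names `SimpleTestCase`, `non_empty_metric`, and `evaluate` are illustrative stand-ins only:

```python
from dataclasses import dataclass
from typing import Callable, List

# Illustrative stand-ins for the three components above;
# these are NOT deepeval's real classes.

@dataclass
class SimpleTestCase:
    """A single test case: the prompt and the LLM's output."""
    input: str
    actual_output: str

# A metric scores a test case between 0 and 1.
Metric = Callable[[SimpleTestCase], float]

def non_empty_metric(tc: SimpleTestCase) -> float:
    """Toy metric: passes only if the output is non-empty."""
    return 1.0 if tc.actual_output.strip() else 0.0

def evaluate(dataset: List[SimpleTestCase], metric: Metric,
             threshold: float = 0.5) -> List[bool]:
    """Run the metric over an evaluation dataset, collecting pass/fail results."""
    return [metric(tc) >= threshold for tc in dataset]

dataset = [
    SimpleTestCase("What is 2 + 2?", "4"),
    SimpleTestCase("Name a primary color.", ""),
]
print(evaluate(dataset, non_empty_metric))  # → [True, False]
```

In deepeval itself, test cases, metrics, and datasets are first-class objects with richer fields and LLM-based scoring, but the shape of the workflow is the same: score each test case in a dataset against a threshold.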

Here's a diagram of what an ideal evaluation workflow looks like using deepeval:

Your test cases will typically live in a single Python file, and executing them is as easy as running `deepeval test run`:

```shell
deepeval test run test_example.py
```
Note

We understand that preparing a comprehensive evaluation dataset can be challenging, especially if you're doing it for the first time. Contact us if you'd like a custom evaluation dataset prepared for you.