EvalLens

Structured output evaluation

Open source on GitHub

Catch schema drift before production.

Evaluate LLM structured outputs, pinpoint failure reasons in seconds, and run the same workflow in hosted or fully private self-hosted mode.

Built for prompt engineers, eval teams, and AI product developers.

  • Regression evals
  • Extraction QA
  • Classification audits
  • AI failure analysis
  • Docker deployable

How it works

01

Upload

Bring your CSV or JSONL with id, prompt, expected, and actual.
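As a rough illustration of the input shape, the sketch below builds a two-row JSONL sample and reports which of the four fields are missing on each line. The helper name `missing_fields` is hypothetical and not part of EvalLens, and storing `expected` and `actual` as JSON objects is an assumption; the page only names the fields.

```python
import json

# The four fields named in the upload step.
REQUIRED_FIELDS = ("id", "prompt", "expected", "actual")


def missing_fields(jsonl_text: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to the required fields missing on that line."""
    problems: dict[int, list[str]] = {}
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        row = json.loads(line)
        missing = [field for field in REQUIRED_FIELDS if field not in row]
        if missing:
            problems[lineno] = missing
    return problems


# Hypothetical sample: the second row has no "actual" output yet.
sample = "\n".join([
    json.dumps({"id": "1", "prompt": "Extract the city.",
                "expected": {"city": "Paris"}, "actual": {"city": "Paris"}}),
    json.dumps({"id": "2", "prompt": "Extract the city.",
                "expected": {"city": "Rome"}}),
])

print(missing_fields(sample))  # {2: ['actual']}
```

A check like this can run before upload to catch incomplete rows early.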

02

Evaluate

Score pass rate and classify schema, type, and value failures.
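One plausible reading of the three failure classes, sketched under stated assumptions: a schema failure means the top-level keys differ, a type failure means a shared key holds a different Python type, and a value failure means everything matches except the content. These definitions and the `classify` function are illustrative only and may differ from how EvalLens scores rows.

```python
def classify(expected: dict, actual: dict) -> str:
    """Return 'pass', 'schema', 'type', or 'value' for one row (top-level keys only)."""
    if set(expected) != set(actual):
        return "schema"  # missing or extra keys
    for key, want in expected.items():
        if type(want) is not type(actual[key]):
            return "type"  # same key, different type (e.g. 10 vs "10")
    if expected != actual:
        return "value"  # same shape and types, different content
    return "pass"


# Hypothetical (expected, actual) pairs, one per outcome.
rows = [
    ({"total": 10}, {"total": 10}),
    ({"total": 10}, {"total": "10"}),
    ({"total": 10}, {"amount": 10}),
    ({"total": 10}, {"total": 12}),
]

labels = [classify(expected, actual) for expected, actual in rows]
pass_rate = labels.count("pass") / len(labels)

print(labels)     # ['pass', 'type', 'schema', 'value']
print(pass_rate)  # 0.25
```

Checking schema before type before value means each row gets the most structural label that applies, which keeps the categories mutually exclusive.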

03

Inspect

Filter row-level failures and diagnose regressions quickly.

04

Analyse

Self-hosted: generate an AI narrative of failure patterns and get a recommended next step.

Deploy anywhere

HOSTED

Use instantly

Open the hosted app and start evaluating in seconds with no infrastructure setup.

SELF-HOSTED

Docker deployable in minutes

Run EvalLens in your own environment for private datasets and controlled provider keys.

  • Generate missing actual outputs before evaluating.
  • Trigger AI-powered failure analysis — patterns, affected rows, and a fix recommendation.
  • All four export formats (CSV, JSON, MD, PDF) embed run context and the narrative.

docker run -p 3000:3000 -e EVALLENS_MODE=self-hosted evallens

Hosted vs self-hosted

HOSTED

Bring your completed outputs

  • Use when you already have model outputs.
  • Requires expected and actual in your file.
  • Fastest path for regression checks and release gates.
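A release gate built on a pass rate might look like the sketch below. The `gate` function and the 0.95 threshold are assumptions chosen for illustration, not an EvalLens feature or default; the pass and total counts would come from an evaluation run.

```python
def gate(passed: int, total: int, threshold: float = 0.95) -> int:
    """Return a process exit code: 0 if the pass rate meets the threshold, else 1."""
    if total == 0:
        return 1  # treat an empty run as a failure rather than a silent pass
    return 0 if passed / total >= threshold else 1


# Hypothetical counts from an evaluation run.
exit_code = gate(passed=97, total=100)
print(exit_code)  # 0
```

In a CI job the returned code could be handed to `sys.exit`, so a drop below the threshold fails the pipeline step.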

SELF-HOSTED

Generate, then evaluate in one run

  • Generates missing actual outputs before scoring.
  • Bring your own OpenAI, Anthropic, or Gemini key.
  • After evaluation, trigger AI failure analysis — named patterns, affected row counts, and a recommended next step.
  • All exports (CSV, JSON, MD, PDF) embed run context and the narrative.
  • Deploy with Docker quickly for local or server environments.
  • Deterministic eval workflow for local, staging, or CI.

Your data stays in your environment.

Evaluate your outputs

Upload a CSV or JSONL file with id, prompt, expected, and actual columns.

Accepted file types: CSV, JSON, or JSONL.