EvalLens helps you upload datasets, score model responses, and inspect row-level failures with a fast, focused workflow.

01
Upload
Bring your CSV or JSONL file with id, prompt, expected, and actual columns.
02
Evaluate
Run automatic checks to measure pass rate and classify failure reasons.
03
Inspect
Filter and drill into row-level results to diagnose what broke and why.
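As a sketch of the dataset shape the steps above expect, here is a hypothetical JSONL file with the four columns, plus a naive exact-match pass-rate check. The rows and the exact-match rule are illustrative assumptions; EvalLens's actual checks are not specified here.

```python
import json

# Hypothetical rows in the documented shape: id, prompt, expected, actual.
rows = [
    {"id": "1", "prompt": "2+2?", "expected": "4", "actual": "4"},
    {"id": "2", "prompt": "Capital of France?", "expected": "Paris", "actual": "Lyon"},
]

# Write the dataset as JSONL: one JSON object per line.
with open("dataset.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# A naive exact-match check, standing in for EvalLens's automatic checks.
passed = sum(1 for r in rows if r["actual"].strip() == r["expected"].strip())
pass_rate = passed / len(rows)
print(f"pass rate: {pass_rate:.0%}")  # → pass rate: 50%
```

Row "2" fails the check, which is exactly the kind of row you would then drill into in step 03.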
HOSTED
Hosted mode evaluates files that already include both expected and actual values. It is fast and ideal when your generation pipeline already exists.
SELF-HOSTED
Self-hosted mode can generate missing actual outputs using your configured provider keys, then evaluate the results. Run it yourself to keep data private and fully under your control.
Best for teams that need local data boundaries, environment-based provider control, or reproducible evals in CI. You can bring your own keys for OpenAI, Anthropic, or Gemini and switch models without changing your dataset format.
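The bring-your-own-keys model above could be wired up with ordinary environment variables. The variable names below (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY) are the conventional ones for these providers, but they are assumptions here, not documented EvalLens settings:

```python
import os

# Map each provider to its conventional API-key variable (assumed names).
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}

def available_providers(env):
    """Return the providers whose key is present in the given environment."""
    return [p for p, var in PROVIDER_KEYS.items() if env.get(var)]

# Switching models means changing the provider/key, not the dataset format.
print(available_providers({"ANTHROPIC_API_KEY": "sk-test"}))  # → ['anthropic']
print(available_providers(os.environ))
```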
1. Clone the repo from GitHub and run locally or in Docker.
2. Set EVALLENS_MODE=self-hosted and at least one provider API key.
3. Upload your dataset, generate missing outputs, and inspect failures row by row.
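The generate-then-evaluate loop in step 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the `generate` stub stands in for a real provider call made with your API key, and exact-match scoring stands in for EvalLens's actual checks; only the EVALLENS_MODE variable comes from the steps above.

```python
import os

def generate(prompt):
    # Stub standing in for a real provider call made with your API key.
    return "4" if "2+2" in prompt else ""

def run_self_hosted(rows):
    # Refuse to run outside self-hosted mode (env var from step 2).
    if os.environ.get("EVALLENS_MODE") != "self-hosted":
        raise RuntimeError("set EVALLENS_MODE=self-hosted first")
    for row in rows:
        # Fill in any missing actual outputs before evaluating.
        if not row.get("actual"):
            row["actual"] = generate(row["prompt"])
        row["pass"] = row["actual"] == row["expected"]
    return rows

os.environ["EVALLENS_MODE"] = "self-hosted"
results = run_self_hosted([{"id": "1", "prompt": "2+2?", "expected": "4"}])
print(results[0]["pass"])  # → True
```

Because generation happens in your own environment, prompts and outputs never leave your infrastructure, which is the point of self-hosted mode.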