Persona Evaluation

persona-bench: An Evaluation Harness for Personalization & Reproducible Pluralistic Alignment

Human vs AI Personalization Challenge

You'll be competing against a frontier AI model in crafting personalized responses.

Disclaimer: Some questions may touch on sensitive topics. Please engage thoughtfully and respectfully. If you feel uncomfortable with any question, feel free to skip it.

Current Language Models' Ability to Successfully Personalize for a Known Demographic Varies Widely

Want to see how your model performs?

Did you prompt?

Evaluate your model's chat style across 1,000+ personas

Did you tune?

Fine-tuning can break your model's ability to personalize for specific sub-demographics...

Developer? Bulk evals

Connect to our held-out evaluation

Tune Evaluation Tool

Evaluate your tuned model's performance.
