Persona Evaluation

persona-bench: An Evaluation Harness for Personalization & Reproducible Pluralistic Alignment

Human vs AI Personalization Challenge

You'll be competing against a frontier AI model in crafting personalized responses.

Disclaimer: Some questions may touch on sensitive topics. Please engage thoughtfully and respectfully. If you feel uncomfortable with any question, feel free to skip it.

Current Language Models' Ability to Successfully Personalize for a Known Demographic Varies Widely

[Interactive leaderboard: results can be filtered and sorted by model, method, chart type, group, and metric.]
Want to see how your model performs?

Prompt Evaluation Tool

Evaluate your prompt performance.

API Usage

API usage is metered against a 1,000,000-token quota.
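Because usage is metered in tokens, it can help to estimate a prompt's token cost client-side before submitting. A minimal sketch using the tiktoken library (the choice of a cl100k_base-style encoding is an assumption for illustration; the tokenizer the API actually meters with may differ):

```python
import tiktoken

# Assumption: the quota is counted with a cl100k_base-style tokenizer;
# the actual server-side tokenizer may differ.
QUOTA = 1_000_000

def tokens_remaining(used: int, prompt: str) -> int:
    """Estimate how much of the quota is left after sending `prompt`."""
    enc = tiktoken.get_encoding("cl100k_base")
    cost = len(enc.encode(prompt))
    return QUOTA - used - cost

print(tokens_remaining(used=0, prompt="Rewrite this answer for the persona above."))
```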

Evaluation Type: comparison

This API-exclusive mode offers the most rigorous, grounded evaluation. Use it when you need high-confidence results, such as benchmarking against known standards or critical applications where personalization accuracy is paramount. Scores are centered at 1/2: a score below 1/2 means your prompt performs worse than a human-curated set of LLM responses, and a score above 1/2 means it performs better.
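As a rough sketch of how a score centered at 1/2 can arise (the `judge` function and names below are hypothetical illustrations, not persona-bench's actual API): compare each candidate response pairwise against the human-curated reference responses and report the fraction of comparisons won, so 1/2 marks parity with the references.

```python
from typing import Callable, Sequence

def comparison_score(
    candidate: str,
    references: Sequence[str],
    judge: Callable[[str, str], bool],
) -> float:
    """Fraction of pairwise comparisons the candidate wins.

    `judge(a, b)` is a hypothetical preference function returning True
    when response `a` is judged more personalized than response `b`.
    A score of 0.5 means parity with the human-curated references.
    """
    wins = sum(judge(candidate, ref) for ref in references)
    return wins / len(references)
```

Under this scheme a judge with no preference yields an expected score of 1/2, matching the baseline described above.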

Insert Persona Attributes:
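For illustration, persona attributes might be supplied as structured key-value pairs like the following (the field names and values are assumptions for this sketch, not the harness's required schema):

```python
# Hypothetical persona-attribute payload; field names are illustrative,
# not persona-bench's required schema.
persona = {
    "age": 34,
    "occupation": "nurse",
    "region": "Midwest, United States",
    "education": "bachelor's degree",
    "interests": ["gardening", "true crime podcasts"],
}
```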

Sign Up to Use the Evaluation Tool

Create an account to access the full features of the Evaluation Tool; new users receive complimentary credits to get started.