Persona Evaluation

persona-bench: An Evaluation Harness for Personalization & Reproducible Pluralistic Alignment

Human vs AI Personalization Challenge

You'll be competing against a frontier AI model in crafting personalized responses.

Disclaimer: Some questions may touch on sensitive topics. Please engage thoughtfully and respectfully. If you feel uncomfortable with any question, feel free to skip it.

Current Language Models' Ability to Successfully Personalize for a Known Demographic Varies Widely

Want to see how your model performs?

Customized, Unlimited Personalization

Benchmarking & evaluation for developers.

Quick Start

1. Sign up above to get your Synth API key. You'll get enough credits to get started.
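
The Quick Start code in step 3 loads this key from a local .env file via python-dotenv. A minimal sketch of that file, assuming the key is read from an environment variable (the variable name SYNTH_API_KEY used here is an assumption; check your account page or the persona-bench docs for the exact name):

# .env — hypothetical variable name; replace the value with your own key
SYNTH_API_KEY=your_api_key_here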

2. Install persona-bench

pip install persona-bench

3. Run your first evaluation using the API

from persona_bench.api import PERSONAClient
from dotenv import load_dotenv
from persona_bench.api.prompt_constructor import ChainOfThoughtPromptConstructor

# Load your Synth API key (and any other settings) from a local .env file
load_dotenv()

# Configure the evaluation: the model under test, the evaluation type to run,
# the number of samples (N), and how prompts are constructed
client = PERSONAClient(
    model_str="your_model_name",
    evaluation_type="main",
    N=50,
    prompt_constructor=ChainOfThoughtPromptConstructor(),
)

# Answer each generated question with your own model and log the result
for idx, q in enumerate(client):
    answer = your_model_function(q["system"], q["user"])
    client.log_answer(idx, answer)

# Score the logged answers; drop_answer_none=True ignores questions without a logged answer
results = client.evaluate(drop_answer_none=True)
print(results)
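
The your_model_function call above is a placeholder for however you query the model under test. As a minimal sketch, assuming an OpenAI-compatible chat endpoint (the client setup and model name here are illustrative, not part of persona-bench):

from openai import OpenAI

oai = OpenAI()  # reads OPENAI_API_KEY from the environment

def your_model_function(system: str, user: str) -> str:
    # Forward the persona system prompt and the question to your model,
    # returning the text of its reply for persona-bench to score.
    response = oai.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; substitute the model you are evaluating
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return response.choices[0].message.content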

Key Features

  • 🎭 Main Evaluation: Assess personalized response generation
  • 🧩 Leave One Out Analysis: Measure attribute impact on performance
  • 🌐 Intersectionality: Evaluate model performance across different demographic intersections
  • 🎯 Pass@K: Determine attempts needed for successful personalization
  • 🔍 Comparison: Grounded personalization evaluation (API-exclusive)

API Integration

PERSONA Bench now offers an API for easy integration and evaluation of your models. The API provides access to all evaluation types available in PERSONA Bench, including a novel evaluation type called comparison for grounded personalization evaluation.
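
Switching evaluation types only changes the constructor argument. A minimal sketch for the comparison evaluation, assuming it is selected with the string "comparison" (the exact identifiers accepted for evaluation_type are an assumption; consult the API documentation for the supported values):

comparison_client = PERSONAClient(
    model_str="your_model_name",
    evaluation_type="comparison",  # assumed identifier for the API-exclusive comparison evaluation
    N=50,
    prompt_constructor=ChainOfThoughtPromptConstructor(),
)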

To get started, sign up above to obtain your API key.

About PERSONA Bench

PERSONA Bench is an extension of the PERSONA framework introduced in Castricato et al. 2024. It provides a reproducible testbed for evaluating and improving the alignment of language models with diverse user values. Our evaluation suite uses inspect-ai to perform various assessments on persona-based tasks, offering insights into model performance across different demographic intersections, feature importance, and personalization capabilities.