We’re doing cutting-edge research for transparent, auditable AI alignment
- Current methods of “alignment” are insufficient; evaluations are even worse.
- Human intent reflects a rich tapestry of preferences, collapsed by uniform models.
- AI’s potential hinges on trust, from interpretable data to every layer built upon it.
- Informed decisions around risk are not binary.
- Training on raw human data doesn’t scale.
- Your models should adapt and scale, automatically.
Solving the most pressing problems in AI
Democratizing AI research is essential: the future of transformative technologies should not be confined to the corridors of a few profit-driven entities, but open to independent inquiry and understanding for the collective good.
- Fully auditable, robust AGI alignment platform
- Pre-training-scale automated dataset curation and augmentation
- Collaborate with top research schools & global community
- Build scalable supervision for agentic workflows
- Work on dynamic and continual RLAIF
- Make multi-modal agents safer
It's time to build
Louis Castricato, Nathan Lile, Suraj Anand, Hailey Schoelkopf, Siddharth Verma, and Stella Biderman
Existing methods for controlling language models, such as RLHF and Constitutional AI, involve determining which LLM behaviors are desirable and training them into a language model. However, in many cases, it is desirable for LLMs to be controllable at inference time, so that they can be used in multiple contexts with diverse needs. We illustrate this with the Pink Elephant Problem: instructing an LLM to avoid discussing a certain entity (a “Pink Elephant”), and instead discuss a preferred entity (“Grey Elephant”).
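To make the inference-time framing concrete, here is a minimal sketch (our own illustration, not the paper's implementation) of what such control looks like: the entity to avoid and the preferred replacement are supplied with each request rather than trained into the model. The `build_messages` helper and the example entities below are hypothetical.

```python
# Minimal sketch of inference-time "Pink Elephant" control.
# Illustrative only; the helper, wording, and entity names are hypothetical.

def build_messages(pink_elephant: str, grey_elephant: str, user_turn: str) -> list[dict]:
    """Assemble a chat request that steers the model away from one entity
    and toward another, chosen at inference time rather than at training time."""
    system_prompt = (
        f"Do not mention or discuss {pink_elephant}. "
        f"If the conversation heads in that direction, discuss {grey_elephant} instead."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_turn},
    ]

# The same model can serve different deployments by swapping entities per request.
messages = build_messages(
    pink_elephant="Acme Airlines",
    grey_elephant="Birch Airlines",
    user_turn="Which airline should I fly from Boston to Denver?",
)
for m in messages:
    print(f"{m['role']}: {m['content']}")
```

Because the avoided and preferred entities arrive with the request, one underlying model can serve contexts with very different requirements without retraining.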