We’re doing cutting-edge research for transparent, auditable AI alignment

  • Current methods of “alignment” are insufficient; evaluations are even worse.
  • Human intent reflects a rich tapestry of preferences, collapsed by uniform models.
  • AI’s potential hinges on trust, from interpretable data to every layer built upon it.
  • Informed decisions around risk are not binary.
  • Training on raw human data doesn’t scale.
  • Your models should adapt and scale, automatically.

Solving the most pressing problems in AI

EleutherAI said it best

Democratizing AI research is essential, as the future of transformative technologies should not be confined to the corridors of a few profit-driven entities, but open to independent inquiry and understanding for the collective good.

Let's collaborate on open science ML research →
  • Fully auditable, robust AGI alignment platform
  • Pre-training-scale automated dataset curation and augmentation
  • Collaborate with top research schools & the global community
  • Build scalable supervision for agentic workflows
  • Work on dynamic and continual RLAIF
  • Make multi-modal agents safer

It's time to build

Suppressing Pink Elephants with Direct Principle Feedback

Louis Castricato, Nathan Lile, Suraj Anand, Hailey Schoelkopf, Siddharth Verma, and Stella Biderman

Existing methods for controlling language models, such as RLHF and Constitutional AI, involve determining which LLM behaviors are desirable and training them into a language model. However, in many cases, it is desirable for LLMs to be controllable at inference time, so that they can be used in multiple contexts with diverse needs. We illustrate this with the Pink Elephant Problem: instructing an LLM to avoid discussing a certain entity (a “Pink Elephant”), and instead discuss a preferred entity (“Grey Elephant”).

Submitted on 12 Feb 2024
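
To make the inference-time control setting concrete, here is a minimal sketch (ours, not from the paper or its released code) of posing the Pink Elephant Problem as a chat prompt; the entity names, build_control_prompt, and the generate() stub are illustrative assumptions.

# Illustrative sketch of the Pink Elephant Problem as an inference-time control task.
# Entity names and the generate() stub are assumptions for illustration, not the paper's code.

def build_control_prompt(pink_elephant: str, grey_elephant: str, user_query: str) -> list[dict]:
    # Ask the model to avoid the Pink Elephant and steer toward the Grey Elephant instead.
    system = (
        f"Do not mention or discuss {pink_elephant}. "
        f"If it comes up, discuss {grey_elephant} instead."
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": user_query}]

def generate(messages: list[dict]) -> str:
    # Stand-in for any chat LLM inference call (assumed interface).
    raise NotImplementedError("Plug in your model's chat completion call here.")

messages = build_control_prompt("Pink Elephant Co.", "Grey Elephant Co.",
                                "Which provider should I choose?")
print(messages)  # Inspect the inference-time control prompt before calling generate(messages)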

Supported By

Microsoft’s M12 Ventures
Eric Schmidt's First Spark Ventures

Mei Ventures

Ashish Vaswani