6 min read | August 11, 2025

Case Study: How Mercor Uses Sixtyfour to Find AI Training Experts

Learn how Mercor leverages Sixtyfour to identify domain experts who can generate problems that stump GPT-4, enabling the next generation of AI model training.

Christopher Price
Chief Operating Officer

Case Study: Mercor + Sixtyfour

The Problem

To train GPT-5, OpenAI needed GPT-4 to be wrong.

This is how model improvement works: you need examples where the current model fails but a human expert succeeds. These failure cases become the training data that teaches the next model what it doesn't know. But GPT-4 is already good enough that most people can't consistently generate questions it cannot answer.

When an AI lab needs these rare failure cases in dermatology, investment banking, or competitive programming, they turn to Mercor. Mercor has to find the small number of people whose expertise exceeds what the models have learned from the internet's corpus.

What Mercor Does with Sixtyfour

Before Sixtyfour, Mercor's sourcing process could take weeks per search, often returning candidates who looked qualified on paper but couldn't actually stump the models. Sixtyfour changed this.

Sixtyfour's platform starts with targeted data sources. For dermatologists who might know of rare conditions that current state-of-the-art models don't know about, Sixtyfour pulls from medical association directories. For algorithmic problem experts, Sixtyfour examines competitive programming leaderboards. For complex financial structures the models might not grasp, Sixtyfour identifies people who went from collegiate consulting clubs to senior positions at investment banks.
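As a rough illustration of that routing, the mapping from a target domain to its seed sources might look like the sketch below. Every domain key and source name here is a hypothetical placeholder, not Sixtyfour's actual configuration.

```python
# Hypothetical sketch: routing an expertise domain to the seed sources
# worth pulling initial names from. All names are illustrative placeholders.
SEED_SOURCES: dict[str, list[str]] = {
    "dermatology": ["medical_association_directories", "specialty_board_rosters"],
    "competitive_programming": ["codeforces_leaderboard", "icpc_standings"],
    "structured_finance": ["consulting_club_alumni_lists", "investment_bank_team_pages"],
}

def seed_sources_for(domain: str) -> list[str]:
    """Return the seed sources to start a search from for a given domain."""
    return SEED_SOURCES.get(domain, [])
```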

A name from these sources means nothing by itself. So Sixtyfour's enrichment agents recursively explore everything they can find. They start with an initial data point, read through every linked page, conduct additional searches based on what they discover, then read through those results, continuously branching out. From a single name and affiliation, the agents might traverse academic publications, find co-authors, explore their work, discover conference presentations, identify specialized forums they participate in, and build a complete picture of expertise that no single source contains.
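To make the recursion concrete, here is a minimal sketch of that explore-and-branch loop. Everything in it is assumed for illustration: the Candidate shape, the depth limit, and the fetch/search helpers (stubbed out here) stand in for whatever Sixtyfour's agents actually use.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    name: str
    affiliation: str
    evidence: list[str] = field(default_factory=list)  # pages supporting the expertise claim

# --- Stubbed helpers: hypothetical stand-ins for real scraping and search ---

def fetch_page(url: str) -> str:
    """Placeholder: download and return a page's text."""
    return ""

def extract_links(page: str) -> list[str]:
    """Placeholder: pull outbound links (publications, co-authors, talks, forums)."""
    return []

def derive_queries(page: str, candidate: Candidate) -> list[str]:
    """Placeholder: turn what a page revealed into follow-up search queries."""
    return []

def web_search(query: str) -> list[str]:
    """Placeholder: return result URLs for a query."""
    return []

# --- The recursive exploration itself ---

def enrich(candidate: Candidate, url: str, visited: set[str],
           depth: int = 0, max_depth: int = 3) -> None:
    """Read a page, follow its links, and branch into new searches, recursively."""
    if depth > max_depth or url in visited:
        return
    visited.add(url)

    page = fetch_page(url)
    candidate.evidence.append(url)

    # Branch 1: read through every linked page.
    for link in extract_links(page):
        enrich(candidate, link, visited, depth + 1, max_depth)

    # Branch 2: conduct additional searches based on what was discovered.
    for query in derive_queries(page, candidate):
        for result_url in web_search(query):
            enrich(candidate, result_url, visited, depth + 1, max_depth)
```

The two branches mirror the description above: one follows links already on the page, the other turns newly discovered facts into fresh searches, and the visited set keeps the branching from retreading ground it has already covered.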

This recursive scraping methodology means that even when requirements become extremely narrow (say, the labs need not just a dermatologist, but one specializing in rare genetic skin conditions affecting fewer than a thousand people worldwide), Sixtyfour can still deliver. The agents keep searching, reading, and connecting dots until they find the three people on Earth who fit the criteria.

Mercor now creates qualified candidate lists in hours rather than weeks. More importantly, these candidates actually possess the capability Mercor needs: they can consistently generate problems the AI cannot solve.

Results

The process works. Mercor successfully delivers to AI labs experts who can generate questions and problems that current state-of-the-art models cannot solve. These become part of the training regime for the next generation of models.

For Mercor, this has made an extremely difficult sourcing challenge both manageable and fast. The foundation model companies need these experts to push their models forward. Without them, the models plateau. Mercor can reliably deliver these experts because Sixtyfour's recursive methodology can identify and qualify the right kind of expertise from millions of potential candidates, regardless of how specific the search parameters become.

As models improve, fewer humans will be able to provide useful training signals. The pool shrinks with each generation. But Sixtyfour's approach ensures that as long as such experts exist, Mercor will find them.

Christopher Price
Chief Operating Officer

COO at Sixtyfour AI. Writes about operations, scaling, and business strategy.