
OpenAI Is Asking Contractors to Upload Work From Past Jobs to Evaluate the Performance of AI Agents

To prepare AI agents for office work, the company is asking contractors to upload projects from past jobs, leaving it to them to strip out confidential and personally identifiable information.

Photo-Illustration: WIRED Staff; Getty Images

OpenAI is asking third-party contractors to upload real assignments and tasks from their current or previous workplaces so that it can use the data to evaluate the performance of its next-generation AI models, according to records from OpenAI and the training data company Handshake AI obtained by WIRED.

The project appears to be part of OpenAI’s efforts to establish a human baseline for different tasks that can then be compared with AI models. In September, the company launched a new evaluation process to measure the performance of its AI models against human professionals across a variety of industries. OpenAI says this is a key indicator of its progress towards achieving AGI, or an AI system that outperforms humans at most economically valuable tasks.

“We’ve hired folks across occupations to help collect real-world tasks modeled off those you’ve done in your full-time jobs, so we can measure how well AI models perform on those tasks,” reads one confidential document from OpenAI. “Take existing pieces of long-term or complex work (hours or days+) that you’ve done in your occupation and turn each into a task.”

OpenAI is asking contractors to describe tasks they’ve done in their current job or in the past and to upload real examples of work they did, according to an OpenAI presentation about the project viewed by WIRED. Each of the examples should be “a concrete output (not a summary of the file, but the actual file), e.g., Word doc, PDF, Powerpoint, Excel, image, repo,” the presentation notes. OpenAI says people can also share fabricated work examples created to demonstrate how they would realistically respond in specific scenarios.

OpenAI and Handshake AI declined to comment.

Real-world tasks have two components, according to the OpenAI presentation. There’s the task request (what a person’s manager or colleague told them to do) and the task deliverable (the actual work they produced in response to that request). The company emphasizes multiple times in instructions that the examples contractors share should reflect “real, on-the-job work” that the person has “actually done.”

One example in the OpenAI presentation outlines a task from a “Senior Lifestyle Manager at a luxury concierge company for ultra-high-net-worth individuals.” The goal is to “prepare a short, 2-page PDF draft of a 7-day yacht trip overview to the Bahamas for a family who will be traveling there for the first time.” It includes additional details regarding the family’s interests and what the itinerary should look like. The “experienced human deliverable” then shows what the contractor in this case would upload: a real Bahamas itinerary created for a client.

OpenAI instructs the contractors to delete corporate intellectual property and personally identifiable information from the work files they upload. Under a section labeled “Important reminders,” OpenAI tells the workers to “remove or anonymize any: personal information, proprietary or confidential data, material nonpublic information (e.g., internal strategy, unreleased product details).”

One of the documents viewed by WIRED mentions a ChatGPT tool called “Superstar Scrubbing” that provides advice on how to delete confidential information.

Evan Brown, an intellectual property lawyer with Neal & McDevitt, tells WIRED that AI labs that receive confidential information from contractors at this scale could be subject to trade secret misappropriation claims. Contractors who offer documents from their previous workplaces to an AI company, even scrubbed, could be at risk of violating their previous employers’ nondisclosure agreements or exposing trade secrets.

“The AI lab is putting a lot of trust in its contractors to decide what is and isn’t confidential,” says Brown. “If they do let something slip through, are the AI labs really taking the time to determine what is and isn’t a trade secret? It seems to me that the AI lab is putting itself at great risk.”

The documents reveal one strategy AI labs are using to prepare their models to excel at real-world tasks. Firms like OpenAI, Anthropic, and Google are hiring armies of contractors who can generate high-quality training data in order to develop AI agents capable of automating enterprise work.

AI labs have long relied on third-party contracting firms such as Surge, Mercor, and Scale AI to hire and manage networks of data contractors. In recent years, however, AI labs have required higher-quality data in order to improve their models, forcing them to pay more for skilled talent capable of producing it. That has created a lucrative sub-industry within the AI training world. Handshake said it was valued at $3.5 billion in 2022, while Surge reportedly valued itself at $25 billion in fundraising talks last summer.

OpenAI appears to have explored other ways of sourcing real company data. An individual who helps companies sell assets after they go out of business told WIRED that a representative of OpenAI inquired about obtaining data from these firms, provided that personally identifiable information could be removed. The source, who spoke to WIRED on condition of anonymity because they did not want to sour any business relationships, said the data would have included documents, emails, and other internal communications. The source said they chose not to pursue the idea because they were not confident that personal information could be completely scrubbed.

Credit belongs to: www.wired.com
