ChatGPT Is Reshaping Crowd Work

Jul 7, 2023 8:00 AM

Even as some workers shun chatbot help, platforms are adopting policies and technology to deter the use of AI, potentially making crowd work more difficult.

Illustration: James Marshall; Getty Images

We owe our understanding of human behavior, in part, to Bob. He spends hours some days as a subject in academic psychology studies, filling out surveys on crowd-work platforms like Amazon’s Mechanical Turk, where users perform simple digital tasks for small sums of money. The questionnaires often prompt him to recall a time he felt sad, or isolated, or something similarly morose. Sometimes typing his sob stories over and over gets “really really monotonous,” he says. So Bob asks ChatGPT to pour its simulacrum of a heart out instead.

Bob, who used a pseudonym because he fears having his account suspended, says he respects the research he contributes to but doesn’t feel especially conflicted about using an occasional assist from AI. The prompts seem intended to establish a certain mood to set up later questions, and “I can get myself into that mindset,” he says. On top of that, Bob needs to be efficient because he supports himself with crowd work, sometimes filling out 20 surveys in a single day, alongside other microtasks like training chatbots. One 2018 study estimated that crowdworkers make $2 per hour on average, including time spent finding tasks, although Bob makes significantly more.

Students, office workers, coders, and dungeon masters are turning to generative AI tools like ChatGPT to optimize their work in ways that have invited both praise and suspicion. Crowd workers are the latest group to face accusations of using large language models as a shortcut. Some platforms are now adopting policies or technology designed to deter or detect use of large language models like ChatGPT, although some crowd workers and researchers say that care is needed to avoid unfairly burdening workers already facing precarity.

A preprint study from academics at the Swiss Federal Institute of Technology went viral last month after it estimated that more than a third of Mechanical Turkers had used ChatGPT to complete a text summarization task intended to measure human understanding. Its claim that crowd workers widely use large language models inspired some workers and researchers to push back, defending crowd workers’ honor and saying clearer instructions could curb the problem.

CloudResearch, a company that helps researchers recruit online study participants, ran its own version of the study and found that its prescreened workers used ChatGPT only a fifth of the time. Usage nearly disappeared altogether when the company asked people not to use AI, says cofounder and chief research officer Leib Litman.

One crowd worker in her fifties who is active in an online community of Turkers says many wouldn’t dream of cheating. “The people I know have integrity to a fault,” she says. Crowd work can provide a refuge for people who like to arrange work on their own terms, she says, such as introverts or neurodivergent people. “They wouldn’t dream of using ChatGPT to write a summary, because it would be extremely unsatisfying,” says the worker, who herself likes crowd work as a way to avoid age discrimination. Another worker tells WIRED she managed to support herself on Mechanical Turk when an illness confined her to working from home. She wouldn’t want to risk losing her income to an account suspension.

While some workers may shun AI, the temptation to use it is very real for others. The field can be “dog-eat-dog,” Bob says, making labor-saving tools attractive. To find the best-paying gigs, crowd workers frequently use scripts that flag lucrative tasks, scour reviews of task requesters, or join better-paying platforms that vet workers and requesters.

CloudResearch began developing an in-house ChatGPT detector last year after its founders saw the technology’s potential to undermine their business. Cofounder and CTO Jonathan Robinson says the tool involves capturing key presses, asking questions that ChatGPT answers differently than people do, and looping humans in to review freeform text responses.
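CloudResearch hasn’t published how its detector works, but one signal a key-press-capturing approach could use is the gap between how many keystrokes were recorded and how long the submitted answer is: a long freeform response typed with very few key presses suggests pasted text. A minimal, hypothetical Python sketch of that idea (the function and threshold are illustrative assumptions, not CloudResearch’s actual method):

```python
# Hypothetical sketch of one paste-detection signal a key-press-based
# detector might use; not CloudResearch's actual implementation.

def looks_pasted(key_presses: int, answer: str, min_ratio: float = 0.5) -> bool:
    """Flag a response if far fewer key presses were recorded than the
    length of the submitted text would require if typed by hand."""
    if not answer:
        return False
    return key_presses / len(answer) < min_ratio

# A 400-character answer with only 12 recorded key presses is likely pasted.
print(looks_pasted(key_presses=12, answer="x" * 400))   # True
print(looks_pasted(key_presses=390, answer="x" * 400))  # False
```

Responses a heuristic like this flags would then go to the human reviewers Robinson describes, rather than triggering automatic penalties.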

Others argue that researchers should take it upon themselves to establish trust. Justin Sulik, a cognitive science researcher at the University of Munich who uses CloudResearch to source participants, says that basic decency—fair pay and honest communication—goes a long way. If workers trust that they’ll still get paid, requesters could simply ask at the end of a survey if the participant used ChatGPT. “I think online workers are blamed unfairly for doing things that office workers and academics might do all the time, which is just making our own workflows more efficient,” Sulik says.

Ali Alkhatib, a social computing researcher, suggests it could be more productive to consider how underpaying crowd workers might incentivize the use of tools like ChatGPT. “Researchers need to create an environment that allows workers to take the time and actually be contemplative,” he says. Alkhatib cites work by Stanford researchers who developed a line of code that tracks how long a microtask takes, so that requesters can calculate how to pay a minimum wage.
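The article doesn’t reproduce the Stanford code, but the idea translates to a short requester-side routine: time each submission and raise the payment until the effective hourly rate clears a wage floor. A hedged Python sketch, where the $15 rate, base pay, and function names are illustrative assumptions rather than the Stanford tool itself:

```python
# Illustrative sketch of timing a microtask to compute a wage-floor payment.
# The rate and task flow here are assumptions, not the Stanford tool itself.
import time

MINIMUM_HOURLY_WAGE = 15.00  # assumed target hourly rate, in dollars

def required_pay(task_seconds: float, base_pay: float) -> float:
    """Return the payment needed so a task of the observed duration
    meets the minimum hourly wage, never paying less than base_pay."""
    wage_floor = MINIMUM_HOURLY_WAGE * (task_seconds / 3600)
    return max(base_pay, round(wage_floor, 2))

start = time.monotonic()
# ... the worker completes the survey or labeling task here ...
elapsed = time.monotonic() - start

print(f"Pay at least ${required_pay(elapsed, base_pay=0.50):.2f}")
```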

Creative study design can also help. When Sulik and his colleagues wanted to measure the contingency illusion, a belief in the causal relationship between unrelated events, they asked participants to move a cartoon mouse around a grid and then guess which rules won them the cheese. Those prone to the illusion chose more hypothetical rules. Part of the design’s intention was to keep things interesting, says Sulik, so that the Bobs of the world wouldn’t zone out. “And no one’s going to train an AI model just to play your specific little game.”

ChatGPT-inspired suspicion could make things more difficult for crowd workers, who must already look out for phishing scams that harvest personal data through bogus tasks and spend unpaid time taking qualification tests. After an uptick in low-quality data in 2018 set off a bot panic on Mechanical Turk, demand increased for surveillance tools to ensure workers were who they claimed to be.

Phelim Bradley, the CEO of Prolific, a UK-based crowd work platform that vets participants and requesters, says his company has begun working on a product to identify ChatGPT users and either educate or remove them. But he has to stay within the bounds of privacy law, including the EU’s General Data Protection Regulation. Some detection tools “could be quite invasive if they're not done with the consent of the participants,” he says.

Detectors can also be inaccurate and may become less effective as text generators keep improving. Popular tools like one offered by startup GPTZero often fail to correctly identify AI-written text, and false positives risk punishing honest workers. The Swiss academics behind the recent viral study on crowdworkers and ChatGPT found that an off-the-shelf detector performed poorly and instead built their own system for spotting ChatGPT usage that involved keystroke logging, which they acknowledged “could potentially infringe upon user privacy if not appropriately handled.”

Suspicion or uncertainty about crowd workers turning to AI for help could even cause the amount of crowd work to fall. Veniamin Veselovsky, a researcher who coauthored the Swiss study, says he and others are reconsidering the types of studies they conduct online. “There’s a whole swath of experiments that we can no longer conduct on Mechanical Turk,” he says.

Gabriel Lenz, a political science professor at UC Berkeley who conducts research on the platform, is more sanguine. Like most studies, his include questions designed to catch out participants who aren’t paying attention or who give inconsistent responses to key questions, and he imagines that tools for catching large language model users, such as watermarking, will evolve.

Usually fraud just produces noise that can be filtered out of a study, Lenz says. But if cheaters using AI instead produce data that satisfies what a researcher is looking for, studies may need to be redesigned or conducted offline. Last year researchers discovered that widely circulated claims about Americans’ support for political violence appeared to be wildly overstated, due in part to a survey design that didn’t account for random clicking from bored participants.
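As a back-of-the-envelope illustration of that last finding, with assumed numbers: if a rare attitude is genuinely held by 5 percent of people but a fifth of respondents click at random, the measured rate nearly triples.

```python
# Back-of-the-envelope arithmetic (assumed numbers) showing how random
# clicking by bored participants can wildly overstate a rare attitude.
true_rate = 0.05        # assume 5% genuinely agree
random_fraction = 0.20  # assume 20% of respondents click at random
random_agree = 0.50     # a random clicker agrees half the time

observed = (1 - random_fraction) * true_rate + random_fraction * random_agree
print(f"observed agreement: {observed:.0%}")  # 14%, almost triple the truth
```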

The consequences of failing to catch AI-assisted cheating may be significant. Bad data could distort our understanding of the world by getting into published research, or even warp future AI systems, which are often built using data from crowd workers that is presumed to be accurate. The solution may lie largely in the human realm. “Building trust is a lot simpler than engaging in an AI arms race with more sophisticated algorithms to detect ever more sophisticated AI-generated text,” says Sulik.

Caitlin Harrington is a staff writer at WIRED.
