GPT-4 Will Make ChatGPT Smarter but Won’t Fix Its Flaws

Mar 14, 2023 7:57 PM

A new version of the AI system that powers the popular chatbot has better language skills, but it is still biased and prone to fabrication, and it can be abused.

With its uncanny ability to hold a conversation, answer questions, and write coherent prose, poetry, and code, the chatbot ChatGPT has forced many people to rethink the potential of artificial intelligence.

The startup that made ChatGPT, OpenAI, today announced a much-anticipated new version of the AI model at its core.

The new algorithm, called GPT-4, follows GPT-3, a groundbreaking text-generation model that OpenAI announced in 2020, which was later adapted to create ChatGPT last year.

The new model scores more highly on a range of tests designed to measure intelligence and knowledge in humans and machines, OpenAI says. It also makes fewer blunders and can respond to images as well as text.

However, GPT-4 suffers from the same problems that have bedeviled ChatGPT and cause some AI experts to be skeptical of its usefulness—including tendencies to “hallucinate” incorrect information, exhibit problematic social biases, and misbehave or assume disturbing personas when given an “adversarial” prompt.

“While they’ve made a lot of progress, it’s clearly not trustworthy,” says Oren Etzioni, a professor emeritus at the University of Washington and the founding CEO of the Allen Institute for AI. “It’s going to be a long time before you want any GPT to run your nuclear power plant.”

OpenAI provided several demos and data from benchmarking tests to show GPT-4’s capabilities. The new model not only beats the passing score on the Uniform Bar Examination, which is used to qualify lawyers in many US states, but also scores in the top 10 percent of human test takers.

It also scores more highly than GPT-3 on other exams designed to test knowledge and reasoning, in subjects including biology, art history, and calculus. And it gets better marks than any other AI language model on tests designed by computer scientists to gauge progress in such algorithms. “In some ways it’s more of the same,” Etzioni says. “But it’s more of the same in an absolutely mind-blowing series of advances.”

GPT-4 can also perform neat tricks seen before from GPT-3 and ChatGPT, like summarizing and suggesting edits to pieces of text. It can also do things its predecessors could not, including acting as a Socratic tutor that helps guide students toward correct answers and discussing the contents of photographs. For example, if provided a photo of ingredients on a kitchen counter, GPT-4 can suggest an appropriate recipe. If provided with a chart, it can explain the conclusions that can be drawn from it.

“It definitely seems to have gained some abilities,” says Vincent Conitzer, a professor at CMU who specializes in AI and who has begun experimenting with the new language model. But he says it still makes errors, such as suggesting nonsensical directions or presenting fake mathematical proofs.

ChatGPT caught the public’s attention with a stunning ability to tackle many complex questions and tasks via an easy-to-use conversational interface. The chatbot does not understand the world as humans do and just responds with words it statistically predicts should follow a question.

But that underlying mechanism also means that ChatGPT and systems like it will often make up facts. And despite OpenAI’s efforts to make the model resistant to abuse, it can be prompted into misbehaving, for example by suggesting it role-play doing something it refuses to do when asked directly. OpenAI says GPT-4 is 40 percent more likely to provide “factual responses” and 82 percent less likely to respond to requests that should be disallowed. The company did not say how often the previous version, GPT-3, provides factually incorrect responses or responds to requests it should reject.

Still, Ilya Sutskever, cofounder and chief scientist at OpenAI, claims those as perhaps the most significant advances with the new model. “The thing that stands in the way of ChatGPT being really useful to many people for many tasks is reliability,” he says. “GPT-4 isn't there yet, but it is a lot closer.”

Conitzer at CMU says GPT-4 appears to include new guardrails that prevent it from generating undesirable responses but adds that its new capabilities may lead to new ways of exploiting it.

The arrival of GPT-4 has been long anticipated in tech circles, including with vigorous meme-making about the unreleased software’s potential powers. It arrives at a heady moment for the tech industry, which has been jolted by the arrival of ChatGPT into renewed expectation of a new era of computing powered by AI.

Inspired by the potential of ChatGPT, Microsoft invested $10 billion in OpenAI this January. The following month it showed off an upgrade of its search engine Bing that uses ChatGPT to collate information and answer complex questions. Last year Microsoft released a coding tool that uses GPT to auto-complete chunks of code for a programmer.

The furor around the chatbot has also stoked interest in new startups building or using similar AI technology and has left some companies feeling flat-footed. Google, which has spent years investing in AI research and which invented some of the key algorithms used to build GPT and ChatGPT, is scrambling to catch up. OpenAI’s research paper on GPT-4 discloses few details of how GPT-4 was built or how it works, citing the competition around these new AI tools as well as the risks they pose.

This week Google announced an API and new developer tools for a text-generating model of its own, called PaLM, which functions similarly to OpenAI’s GPT. Google is also testing a chatbot to compete with ChatGPT called Bard and has said that it will use the underlying technology to improve search.

OpenAI says a version of ChatGPT that uses GPT-4 is available for paid users of the chatbot, and the company will gradually make the new language model available through its API.

The capabilities of ChatGPT and similar AI programs have stirred debate around how AI may automate or revolutionize some office jobs. More advanced iterations may be able to take on new skills. However, Etzioni is keen to emphasize that—impressive though GPT-4 is—there are still countless things that humans take for granted that it cannot do. “We have to remember that, however eloquent ChatGPT is, it's still just a chatbot,” he says.

Will Knight is a senior writer for WIRED, covering artificial intelligence. He was previously a senior editor at MIT Technology Review, where he wrote about fundamental advances in AI and China’s AI boom. Before that, he was an editor and writer at New Scientist. He studied anthropology and journalism in…
