Mar 14, 2023 7:57 PM
GPT-4 Will Make ChatGPT Smarter but Won't Fix Its Flaws
With its uncanny ability to hold a conversation, answer questions, and write coherent prose, poetry, and code, the chatbot ChatGPT has forced many people to rethink the potential of artificial intelligence.
The startup that made ChatGPT, OpenAI, today announced a much-anticipated new version of the AI model at its core.
The new algorithm, called GPT-4, follows GPT-3, a groundbreaking text-generation model that OpenAI announced in 2020, which was later adapted to create ChatGPT last year.
The new model scores more highly on a range of tests designed to measure intelligence and knowledge in humans and machines, OpenAI says. It also makes fewer blunders and can respond to images as well as text.
However, GPT-4 suffers from the same problems that have bedeviled ChatGPT and cause some AI experts to be skeptical of its usefulness—including tendencies to “hallucinate” incorrect information, exhibit problematic social biases, and misbehave or assume disturbing personas when given an “adversarial” prompt.
“While they’ve made a lot of progress, it’s clearly not trustworthy,” says Oren Etzioni, a professor emeritus at the University of Washington and the founding CEO of the Allen Institute for AI. “It’s going to be a long time before you want any GPT to run your nuclear power plant.”
OpenAI provided several demos and data from benchmarking tests to show GPT-4’s capabilities. The new model can not only beat the passing score on the Uniform Bar Examination, which is used to qualify lawyers in many US states, but scored in the top 10 percent of human test takers.
It also scores more highly than GPT-3 on other exams designed to test knowledge and reasoning, in subjects including biology, art history, and calculus. And it gets better marks than any other AI language model on tests designed by computer scientists to gauge progress in such algorithms. “In some ways it’s more of the same,” Etzioni says. “But it’s more of the same in an absolutely mind-blowing series of advances.”
GPT-4 can also perform neat tricks seen before from GPT-3 and ChatGPT, like summarizing and suggesting edits to pieces of text. It can also do things its predecessors could not, including acting as a Socratic tutor that helps guide students toward correct answers and discussing the contents of photographs. For example, if provided a photo of ingredients on a kitchen counter, GPT-4 can suggest an appropriate recipe. If provided with a chart, it can explain the conclusions that can be drawn from it.
“It definitely seems to have gained some abilities,” says Vincent Conitzer, a professor at Carnegie Mellon University who specializes in AI and who has begun experimenting with the new language model. But he says it still makes errors, such as suggesting nonsensical directions or presenting fake mathematical proofs.
ChatGPT caught the public’s attention with a stunning ability to tackle many complex questions and tasks via an easy-to-use conversational interface. The chatbot does not understand the world as humans do and just responds with words it statistically predicts should follow a question.
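The idea of responding with statistically predicted words can be illustrated with a toy sketch. The snippet below is purely illustrative and bears no resemblance to how GPT-4 is actually built: it counts which word follows each word in a tiny corpus and then "generates" text by always picking the most frequent successor, showing how fluent-looking output can emerge from pattern-matching alone, without any model of facts.

```python
from collections import Counter, defaultdict

# Toy next-word predictor: tally which word follows each word in a tiny
# corpus, then generate by repeatedly choosing the most common successor.
# Real models predict over tens of thousands of tokens with long context;
# the core idea -- continuing text by statistical likelihood, not by
# consulting facts -- is the same.
corpus = "the cat sat on the mat and the cat slept".split()

successors = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    successors[word][nxt] += 1

def generate(start, length):
    out = [start]
    for _ in range(length):
        counts = successors.get(out[-1])
        if not counts:
            break  # dead end: no observed successor
        out.append(counts.most_common(1)[0][0])  # most probable next word
    return " ".join(out)

print(generate("the", 4))
```

Because the generator only knows word-adjacency statistics, it will happily produce grammatical-sounding sequences that assert nothing true, which is the small-scale analogue of a chatbot "hallucinating."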
But that underlying mechanism also means that ChatGPT and systems like it will often make up facts. And despite OpenAI’s efforts to make the model resistant to abuse, it can be prompted into misbehaving, for example by suggesting it role-play doing something it refuses to do when asked directly. OpenAI says GPT-4 is 40 percent more likely than its predecessor to provide “factual responses” and 82 percent less likely to respond to requests that should be disallowed. The company did not say how often the previous version provides factually incorrect responses or responds to requests it should reject.
Still, Ilya Sutskever, cofounder and chief scientist at OpenAI, cites those as perhaps the most significant advances in the new model. “The thing that stands in the way of ChatGPT being really useful to many people for many tasks is reliability,” he says. “GPT-4 isn't there yet, but it is a lot closer.”
Conitzer at CMU says GPT-4 appears to include new guardrails that prevent it from generating undesirable responses but adds that its new capabilities may lead to new ways of exploiting it.
The arrival of GPT-4 has been long anticipated in tech circles, including with vigorous meme-making about the unreleased software’s potential powers. It arrives at a heady moment for the tech industry, which has been jolted by the arrival of ChatGPT into renewed expectation of a new era of computing powered by AI.
Inspired by the potential of ChatGPT, Microsoft invested $10 billion in OpenAI this January. The following month it showed off an upgrade of its search engine Bing that uses ChatGPT to collate information and answer complex questions. Last year Microsoft released a coding tool that uses GPT to auto-complete chunks of code for a programmer.
The furor around the chatbot has also stoked interest in new startups building or using similar AI technology and has left some companies feeling flat-footed. Google, which has spent years investing in AI research and which invented some of the key algorithms used to build GPT and ChatGPT, is scrambling to catch up. OpenAI’s research paper on GPT-4 discloses few details of how GPT-4 was built or how it works, citing the competition around these new AI tools as well as the risks they pose.
This week Google announced an API and new developer tools for a text-generating model of its own, called PaLM, which functions similarly to OpenAI’s GPT. Google is also testing a chatbot to compete with ChatGPT called Bard and has said that it will use the underlying technology to improve search.
OpenAI says a version of ChatGPT that uses GPT-4 is available for paid users of the chatbot, and the company will gradually make the new language model available through its API.
The capabilities of ChatGPT and similar AI programs have stirred debate around how AI may automate or revolutionize some office jobs. More advanced iterations may be able to take on new skills. However, Etzioni is keen to emphasize that—impressive though GPT-4 is—there are still countless things that humans take for granted that it cannot do. “We have to remember that, however eloquent ChatGPT is, it's still just a chatbot,” he says.
Source: www.wired.com