Chatbots Are Entering Their Stone Age

May 30, 2024 12:00 PM

Anthropic and other big AI startups are teaching chatbots “tool use,” to make them more useful in the workplace.

Photo-illustration: WIRED Staff; Dual Dual/Getty Images

For all the bluster about generative artificial intelligence upending the world, the technology has yet to meaningfully transform white-collar work. Workers are dabbling with chatbots for tasks such as drafting emails, and companies are launching countless experiments, but office work hasn’t undergone a major AI reboot.

Perhaps that’s only because we haven’t given chatbots like Google’s Gemini and OpenAI’s ChatGPT the right tools for the job yet; they’re generally restricted to taking in and spitting out text via a chat interface. Things might get more interesting in business settings as AI companies start deploying so-called “AI agents,” which can take action by operating other software on a computer or via the internet.

Anthropic, a competitor to OpenAI, announced a major new product today that attempts to prove the thesis that tool use is needed for AI’s next leap in usefulness. The startup is allowing developers to direct its chatbot Claude to access outside services and software in order to perform more useful tasks. Claude can, for instance, use a calculator to solve the kinds of math problems that vex large language models, query a database containing customer information, or make use of other programs on a user’s computer when that would help.
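To make the idea concrete, here is a minimal sketch of what handing Claude an external tool can look like with Anthropic’s Python SDK. The calculator tool, its parameter schema, and the model ID are illustrative assumptions for this example, not details taken from Anthropic’s announcement:

```python
# A hypothetical calculator tool exposed to Claude via Anthropic's Python SDK.
# The tool name, schema, and model ID below are illustrative, not taken from
# Anthropic's announcement.
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

calculator_tool = {
    "name": "calculator",
    "description": "Evaluate a basic arithmetic expression and return the result.",
    "input_schema": {
        "type": "object",
        "properties": {
            "expression": {"type": "string", "description": "e.g. '1234 * 5678'"}
        },
        "required": ["expression"],
    },
}

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    tools=[calculator_tool],
    messages=[{"role": "user", "content": "What is 1234 multiplied by 5678?"}],
)

# If Claude decides the calculator is needed, its reply contains a tool_use
# block naming the tool and the arguments it chose; the developer's code runs
# the tool and sends the result back in a follow-up message.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```

The model never runs anything itself; it only requests a tool call, which leaves the developer in control of what actually executes.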

I’ve written before about how important AI agents that can take action may prove to be, both for the drive to make AI more useful and the quest to create more intelligent machines. Claude’s tool use is a small step toward the more capable AI helpers now being launched into the world.

Anthropic has been working with several companies to help them build Claude-based helpers for their workers. Online tutoring company Study Fetch, for instance, has developed a way for Claude to use different features of its platform to modify the user interface and syllabus content a student is shown.

Other companies are also entering the AI Stone Age. Google demonstrated a handful of prototype AI agents at its I/O developer conference earlier this month, among many other new AI doodads. One of the agents was designed to handle online shopping returns, by hunting for the receipt in a person’s Gmail account, filling out the return form, and scheduling a package pickup.

Google has yet to launch its return-bot for use by the masses, and other companies are also moving cautiously. This is probably in part because getting AI agents to behave is tricky. LLMs do not always correctly identify what they are being asked to achieve, and can make incorrect guesses that break the chain of steps needed to successfully complete a task.

Restricting early AI agents to a particular task or role in a company’s workflow may prove a canny way to make the technology useful. Just as physical robots are typically deployed in carefully controlled environments that minimize the chances they will mess up, keeping AI agents on a tight leash could reduce the potential for mishaps.

Even those early use cases could prove extremely lucrative. Some big companies already automate common office tasks through what’s known as robotic process automation, or RPA. It often involves recording human workers’ onscreen actions and breaking them into steps that can be repeated by software. AI agents built on the broad capabilities of LLMs could allow a lot more work to be automated. IDC, an analyst firm, says that the RPA market is already worth a tidy $29 billion, but expects an infusion of AI to more than double that to around $65 billion by 2027.

Adept AI, a company cofounded by David Luan, formerly VP of engineering at OpenAI, has been honing AI agents for office work for more than a year. Adept is cagey about who it works with and what its agents do, but the strategy is clear.

“Our agents are already in the 90s [percent] for reliability for our enterprise customers,” Luan says. “The way we did that was to limit the scope of deployment a bit. All the new research we do is to improve reliability for new use cases that we don’t yet do well on.”

A key part of Adept’s plan is to train its AI agents to be better at understanding the goal at hand and the steps required to achieve it. The company hopes that will make the technology flexible enough to help out in all kinds of workplaces. “They need to understand the reward of the actual task at hand,” Luan says. “Not just have the ability to copy existing human behavior.”

The core capabilities needed to make AI agents more useful are also necessary to advance the grander vision of making machine intelligence more powerful. Right now, the ability to make plans to achieve specific goals is a hallmark of natural intelligence that is notably lacking in LLMs.

It may be an extremely long time before machines attain humanlike intelligence, but the idea that tool use is crucial is an evocative one, given the evolutionary path of Homo sapiens. In the natural world, prehuman hominids began using crude stone tools for tasks such as cutting animal hides. The fossil record shows how increasingly sophisticated tool use blossomed alongside advancing intelligence, as humans’ dexterity, bipedalism, vision, and brain size progressed. Maybe now it’s time for one of humankind’s most sophisticated tools to develop tool use of its own.

Will Knight is a senior writer for WIRED, covering artificial intelligence. He writes the Fast Forward newsletter that explores how advances in AI and other emerging technology are set to change our lives—sign up here. He was previously a senior editor at MIT Technology Review.
