May 14, 2024 1:54 PM

Project Astra Is Google's ‘Multimodal’ Answer to the New ChatGPT

Google’s new voice-operated AI assistant, called Project Astra, can make sense of what your phone’s camera sees. It was announced one day after OpenAI revealed a similar vision for ChatGPT.

Demis Hassabis is leading Google's charge to compete with OpenAI in artificial intelligence. Photograph: Jose Sarmento Matos/Getty Images

ChatGPT is not yet two years old, but the idea of communicating with artificial intelligence by typing into a box is already starting to seem quaint.

At Google’s I/O developer conference today, Demis Hassabis, the executive leading the company’s effort to reestablish leadership in AI, introduced a “next-generation AI assistant” called Project Astra. A video clip showed it running as an app on a smartphone and on a prototype pair of smart glasses. The new concept delivers on a promise Hassabis made about Gemini’s potential when the model was first introduced last December.

In response to spoken commands, Astra was able to make sense of objects and scenes as viewed through the devices’ cameras, and converse about them in natural language. It identified a computer speaker and answered questions about its components, recognized a London neighborhood from the view out of an office window, read and analyzed code from a computer screen, composed a limerick about some pencils, and recalled where a person had left a pair of glasses.

That vision for the future of AI is strikingly similar to one showcased by OpenAI on Monday. OpenAI revealed a new interface for ChatGPT that can converse snappily via voice and talk about what is seen through a smartphone camera or on a computer screen. That version of ChatGPT, powered by a new AI model called GPT-4o, also uses a more humanlike voice and emotionally expressive tone, simulating emotions like surprise and even flirtatiousness.

Google’s Project Astra uses an advanced version of Gemini Ultra, an AI model developed to compete with the one that has powered ChatGPT since March 2023. Gemini—like OpenAI’s GPT-4o—is “multimodal,” meaning it has been trained on audio, images, and video, as well as text, and can natively ingest, remix, and generate data in all those formats. Google and OpenAI moving to that technology represents a new era in generative AI; the breakthroughs that gave the world ChatGPT and its competitors have so far come from AI models that work purely with text and have to be combined with other systems to add image or audio capabilities.

Hassabis said in an interview ahead of today’s event that he thinks text-only chatbots will prove to be just a “transitory stage” on the march toward far more sophisticated—and hopefully useful—AI helpers. “This was always the vision behind Gemini,” Hassabis added. “That's why we made it multimodal.”

The new versions of Gemini and ChatGPT that see, hear, and speak make for impressive demos, but what place they will find in workplaces or personal lives is unclear.

Pulkit Agrawal, an assistant professor at MIT who works on AI and robotics, says Google's and OpenAI’s latest demos are impressive and show how rapidly multimodal AI models have advanced. OpenAI launched GPT-4V, a system capable of parsing images, in September 2023. He was impressed that Gemini is able to make sense of live video—for example, correctly interpreting changes made to a diagram on a whiteboard in real time. OpenAI’s new version of ChatGPT appears capable of the same.

Agrawal says the assistants demoed by Google and OpenAI could provide new training data for the companies as users interact with the models in the real world. “But they have to be useful,” he adds. “The big question is what will people use them for—it’s not very clear.”

Google says Project Astra will be made available through a new interface called Gemini Live later this year. Hassabis said that the company is still testing several prototype smart glasses and has yet to make a decision on whether to launch any of them.

Astra’s capabilities might give Google a chance to reboot a version of its ill-fated Glass smart glasses, although efforts to build hardware suited to generative AI have stumbled so far. Despite OpenAI's and Google’s impressive demos, multimodal models cannot fully understand the physical world and the objects within it, placing limitations on what they will be able to do.

“Being able to build a mental model of the physical world around you is absolutely essential to building more humanlike intelligence,” says Brenden Lake, an associate professor at New York University who uses AI to explore human intelligence.

Lake notes that today’s best AI models are still very language-centric because the bulk of their learning comes from text slurped from books and the web. This is fundamentally different from how language is learned by humans, who pick it up while interacting with the physical world. “It’s backwards compared to child development,” he says of the process of creating multimodal models.

Hassabis believes that imbuing AI models with a deeper understanding of the physical world will be key to further progress in AI, and to making systems like Project Astra more robust. Other frontiers of AI, including Google DeepMind’s work on game-playing AI programs, could help, he says. Hassabis and others hope such work could be revolutionary for robotics, an area that Google is also investing in.

“A multimodal universal agent assistant is on the sort of track to artificial general intelligence,” Hassabis said in reference to a hoped-for but largely undefined future point where machines can do anything and everything that a human mind can. “This is not AGI or anything, but it's the beginning of something.”

Updated 5-14-2024, 4:15 pm EDT: This article has been updated to clarify the full name of Google's project.

Will Knight is a senior writer for WIRED, covering artificial intelligence. He writes the Fast Forward newsletter that explores how advances in AI and other emerging technology are set to change our lives. He was previously a senior editor at MIT Technology Review, where he wrote about fundamental…

Credit belongs to: www.wired.com
