
Google Assistant Finally Gets a Generative AI Glow-Up

Oct 4, 2023 10:30 AM


Google is adding AI capabilities from its chatbot Bard to the humble Google Assistant, allowing the virtual helper to make sense of images and draw on data in documents and emails.

Sissie Hsiao speaking on stage in front of a screen that reads Assistant with Bard

Courtesy of Google

Google went big when it launched its generative AI fight-back against OpenAI's ChatGPT in May. The company added AI text-generation to its signature search engine, showed off an AI-customized version of the Android operating system, and offered up its own chatbot, Bard. But one Google product didn’t get a generative AI infusion: Google Assistant, the company’s answer to Siri and Alexa.

Today, at Google’s Pixel hardware event in New York, Google Assistant at last got its upgrade for the ChatGPT era. Sissie Hsiao, Google’s vice president and general manager for Google Assistant, revealed a new version of the AI helper that is a mashup of Google Assistant and Bard.

Hsiao says Google envisions this new “multimodal” assistant as a tool that goes beyond voice queries, including by making sense of images. It can handle “big tasks and small tasks from your to-do list, everything from planning a new trip to summarizing your inbox to writing a fun social media caption for a picture,” she said in an interview with WIRED earlier this week.


The new generative AI experience is so early in its rollout that Hsiao said it didn’t even qualify as an “app” yet. When asked for more information about how it might appear on someone’s phone, company representatives were generally unclear on what final form it might take. (Did Google rush out the announcement to coincide with its hardware event? Quite possibly.)

Whatever container it appears in, the Bard-ified Google Assistant will use generative AI to process text, voice, or image queries, and respond accordingly in either text or voice. It’s limited to approved users for an unknown period of time, will run on mobile only, not smart speakers, and will require users to opt in. On Android, it may operate as either a full-screen app or as an overlay, similar to how Google Assistant runs today. On iOS, it will likely live within one of Google's apps.

The Google Assistant’s generative glow-up comes on the heels of Amazon’s Alexa getting more conversational and OpenAI’s ChatGPT also going multimodal, becoming able to respond using a synthetic voice and describe the content of images shared with the app. One capability apparently unique to Google’s upgraded assistant is an ability to converse about the webpage a user is visiting on their phone.

For Google in particular, the introduction of generative AI to its virtual assistant raises questions around how quickly the search giant will start using large language models across more of its products. That could fundamentally change how some of them work—and how Google monetizes them.


Gain of Function

Google has spent the past several years touting the capabilities of its Google Assistant, which was first introduced to smartphones in 2016, and the past several months touting the capabilities of Bard, which the company has positioned as a kind of chatty, AI-powered collaborator. So what does combining them—within the existing Assistant app—actually do?

Hsiao said the move combines the Assistant’s personalized help with the reasoning and generative capabilities of Bard. One example: Because of the way Bard now works within Google’s productivity apps, it can help find and summarize emails and answer questions about work documents. Those same functions would now theoretically be accessed through Google Assistant—you could request information about your docs or emails using voice and have those summaries read aloud to you.

Its new connection with Bard also gives the Google Assistant new powers to make sense of images. Google already has an image recognition tool, Google Lens, that can be accessed through the Google Assistant or the all-encompassing Google app. But if you capture a photo of a painting or a pair of sneakers and feed it to Lens, Lens will either identify the painting or try to sell you the sneakers—by showing links to buy them—and leave it at that.

The Bard-ified version of Assistant, on the other hand, will understand the content of the photo you’ve shared with it, Hsiao claims. In the future that could allow deep integration with other Google products. “Say you’re scrolling through Instagram and you see a picture of a beautiful hotel. You should be able to one-button press, open Assistant, and ask, ‘Show me more information about this hotel, and tell me if it’s available on my birthday weekend,’” she said. “And it should be able to not only figure out which hotel it is, but actually go check Google Hotels for availability.”

A similar workflow could make the new Google Assistant into a powerful shopping tool if it could connect products in images with online stores. Hsiao said Google hasn’t yet integrated commercial product listings into Bard results but didn’t deny that might be coming in the future.

“If users really want that, if they’re looking to buy things through Bard, that’s something we can look into,” she said. “We need to look at how people want to shop with Bard and really explore that and build that into the product.” (Although Hsiao framed this as something users might want, it could also provide new opportunities for Google’s ad business.)


Proceed With Caution

When Google first announced Assistant in 2016, AI’s language skills were a lot less advanced. The complexity and ambiguity of language made it impossible for computers to respond usefully to more than simple commands, and even those it sometimes fumbled.

The emergence of large language models over the past few years (powerful machine learning models trained on oodles of text from books, the web, and other sources) has brought about a revolution in AI’s ability to handle written and spoken language. The same advances that allow ChatGPT to respond impressively to complex queries make it possible for voice assistants to engage in more natural dialogs.

David Ferrucci, CEO of AI company Elemental Cognition and previously the lead on IBM’s Watson project, says language models have removed a great deal of the complexity from building useful assistants. Parsing complex commands previously required a huge amount of hand-coding to cover the different variations of language, and the final systems were often annoyingly brittle and prone to failure. “Large language models give you a huge lift,” he says.

Ferrucci says, however, that because language models are not well suited to providing precise and reliable information, making a voice assistant truly useful will still require a lot of careful engineering.

More capable and lifelike voice assistants could perhaps have subtle effects on users. The huge popularity of ChatGPT has been accompanied by confusion over the nature of the technology behind it as well as its limits.

Motahhare Eslami, an assistant professor at Carnegie Mellon University who studies users’ interactions with AI helpers, says large language models may alter the way people perceive their devices. The striking confidence exhibited by chatbots such as ChatGPT causes people to trust them more than they should, she says.

People may also be more likely to anthropomorphize a fluent agent that has a voice, Eslami says, which could further muddy their understanding of what the technology can and can’t do. It is also important to ensure that all of the algorithms used do not propagate harmful biases around race, which can happen in subtle ways with voice assistants. “I’m a fan of the technology, but it comes with limitations and challenges,” Eslami says.


Tom Gruber, who cofounded Siri, the startup that Apple acquired in 2010 for its voice assistant technology of the same name, expects large language models to produce significant leaps in voice assistants’ capabilities in coming years but says they may also introduce new flaws.

“The biggest risk—and the biggest opportunity—is personalization based on personal data,” Gruber says. An assistant with access to a user’s emails, Slack messages, voice calls, web browsing, and other data could potentially help recall useful information or unearth valuable insights, especially if a user can engage in a natural back-and-forth conversation. But this kind of personalization would also create a potentially vulnerable new repository of sensitive private data.

“It’s inevitable that we’re going to build a personal assistant that will be your personal memory, that can track everything you've experienced and augment your cognition,” Gruber says. “Apple and Google are the two trusted platforms, and they could do this but they have to make some pretty strong guarantees.”

Hsiao says her team is certainly thinking about ways to advance Assistant further with help from Bard and generative AI. This could include using personal information, such as the conversations in a user’s Gmail, to make responses to queries more individualized. Another possibility is for Assistant to take on tasks on behalf of a user, like making a restaurant reservation or booking a flight.

Hsiao stresses, however, that work on such features has yet to begin. She says it will take a while for a virtual assistant to be ready to perform complex tasks on a user’s behalf and wield their credit card. “Maybe in a certain number of years, this technology has become so advanced and so trustworthy that yes, people will be willing to do that, but we would have to test and learn our way forward,” she says.


Will Knight is a senior writer for WIRED, covering artificial intelligence. He writes the Fast Forward newsletter that explores how advances in AI and other emerging technology are set to change our lives. He was previously a senior editor at MIT Technology Review, where he wrote about fundamental…

Lauren Goode is a senior writer at WIRED covering consumer tech issues. She focuses on the intersection of new technologies and humanity, often through experiential or investigative personal essays. Her coverage areas include communications apps, trends in commerce, AR and VR, subscription services, data and device ownership, and how Silicon…


Credit belongs to: www.wired.com
