Apple’s MM1 AI Model Shows a Sleeping Giant Is Waking Up

Mar 19, 2024 4:25 PM

A research paper quietly released by Apple describes an AI model called MM1 that can answer questions and analyze images. It’s the biggest sign yet that Apple is developing generative AI capabilities.

The Apple logo on the exterior of an Apple store. Photograph: Jeremy Moeller/Getty Images

While the tech industry went gaga for generative artificial intelligence, one giant has held back: Apple. The company has yet to introduce so much as an AI-generated emoji, and according to a New York Times report today and earlier reporting from Bloomberg, it is in preliminary talks with Google about adding the search company’s Gemini AI model to iPhones.

Yet a research paper quietly posted online last Friday by Apple engineers suggests that the company is making significant new investments in AI that are already bearing fruit. It details the development of a new generative AI model called MM1 that is capable of working with text and images. The researchers show it answering questions about photos and displaying the kind of general knowledge skills shown by chatbots like ChatGPT. The model’s name is not explained but could stand for MultiModal 1.

MM1 appears to be similar in design and sophistication to a variety of recent AI models from other tech giants, including Meta’s open source Llama 2 and Google’s Gemini. Work by Apple’s rivals and academics shows that models of this type can be used to power capable chatbots or build “agents” that can solve tasks by writing code and taking actions such as using computer interfaces or websites. That suggests MM1 could yet find its way into Apple’s products.

“The fact that they’re doing this, it shows they have the ability to understand how to train and how to build these models,” says Ruslan Salakhutdinov, a professor at Carnegie Mellon who led AI research at Apple several years ago. “It requires a certain amount of expertise.”

MM1 is a multimodal large language model, or MLLM, meaning it is trained on images as well as text. This allows the model to respond to text prompts and also answer complex questions about particular images.

One example in the Apple research paper shows what happened when MM1 was provided with a photo of a sun-dappled restaurant table with a couple of beer bottles, along with an image of the menu. When asked how much someone would expect to pay for “all the beer on the table,” the model correctly reads the prices off the menu and tallies up the cost.
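MM1 itself has not been released, but the interaction the paper describes follows the same pattern as publicly available multimodal APIs. As a rough sketch, here is how that beer-and-menu question could be posed to a vision-capable model, using OpenAI’s Python SDK as a stand-in; the model name and image URLs are placeholders, not anything from Apple’s paper.

```python
from openai import OpenAI

# Sketch only: MM1 is not publicly available, so this uses a
# vision-capable chat API as a stand-in for the same interaction
# pattern. The image URLs are hypothetical placeholders.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "How much should I expect to pay for all the beer "
                     "on the table, according to the menu?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/table-photo.jpg"}},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/menu-photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```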

When ChatGPT launched in November 2022, it could only ingest and generate text, but more recently its creator OpenAI and others have worked to expand the underlying large language model technology to work with other kinds of data. When Google launched Gemini (the model that now powers its answer to ChatGPT) last December, the company touted its multimodal nature as marking an important new direction in AI. “After the rise of LLMs, MLLMs are emerging as the next frontier in foundation models,” Apple’s paper says.

MM1 is a relatively small model as measured by its number of “parameters,” or the internal variables that get adjusted as a model is trained. Kate Saenko, a professor at Boston University who specializes in computer vision and machine learning, says this could make it easier for Apple’s engineers to experiment with different training methods and refinements before scaling up when they hit on something promising.
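For readers new to the jargon, a parameter is simply a trainable number inside the network. The toy PyTorch snippet below, which has nothing to do with MM1’s actual code, counts the weights and biases of a small two-layer network; frontier language models contain billions of such values.

```python
import torch.nn as nn

# Toy illustration of "parameters": the trainable weights and biases
# that get adjusted as a model is trained. Unrelated to MM1 itself.
model = nn.Sequential(
    nn.Linear(512, 1024),  # weight matrix (512*1024) + bias (1024)
    nn.ReLU(),
    nn.Linear(1024, 512),  # weight matrix (1024*512) + bias (512)
)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} trainable parameters")  # 1,050,112
```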

Saenko says that, for a corporate publication, the MM1 paper provides a surprising amount of detail on how the model was trained. For instance, the engineers behind MM1 describe tricks for improving the model’s performance, including increasing the resolution of images and mixing text and image data. Apple is famed for its secrecy, but it has previously shown unusual openness about AI research as it seeks to lure the talent needed to compete in this crucial technology.
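The data-mixing trick can be sketched in a few lines: during training, each batch is drawn from one of several data sources according to a fixed ratio. The ratios and the dataset interface below are illustrative assumptions, not figures from the paper.

```python
import random

# Illustrative mixing ratios for training batches; the actual ratios
# in the MM1 paper were found empirically and may differ.
MIX_WEIGHTS = {
    "image_caption": 0.45,
    "interleaved_image_text": 0.45,
    "text_only": 0.10,
}

def next_training_batch(datasets, weights=MIX_WEIGHTS):
    """Sample which data source the next batch comes from.

    `datasets` maps source name -> an iterator of batches
    (a hypothetical interface, for illustration only).
    """
    source = random.choices(list(weights),
                            weights=list(weights.values()), k=1)[0]
    return next(datasets[source])
```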

Saenko says it’s hard to draw too many conclusions about Apple’s plans from the research paper. Multimodal models have proven adaptable to many different use cases. But she suggests that MM1 could perhaps be a step toward building “some type of multimodal assistant that can describe photos, documents, or charts and answer questions about them.”

Apple’s flagship product, the iPhone, already has an AI assistant—Siri. The rise of ChatGPT and its rivals has quickly made the once revolutionary helper look increasingly limited and outdated. Amazon and Google have said they are integrating LLM technology into their own assistants, Alexa and Google Assistant. Google allows users of Android phones to replace the Assistant with Gemini.

Reports from The New York Times and Bloomberg that Apple may add Google’s Gemini to iPhones suggest Apple is considering expanding the strategy it has used for search on mobile devices to generative AI. Rather than develop web search technology in-house, the iPhone maker leans on Google, which reportedly pays more than $18 billion to make its search engine the iPhone default. Apple has also shown it can build its own alternatives to outside services, even when it starts from behind. Google Maps used to be the default on iPhones, but in 2012 Apple replaced it with its own maps app.

Apple CEO Tim Cook has promised investors that the company will reveal more of its generative AI plans this year. The company faces pressure to keep up with rival smartphone makers, including Samsung and Google, that have introduced a raft of generative AI tools for their devices.

Apple could end up tapping both Google and its own in-house AI, perhaps by introducing Gemini as a replacement for conventional Google Search while also building new generative AI tools on top of MM1 and other homegrown models. Last September, several of the researchers behind MM1 published details of MGIE, a tool that uses generative AI to manipulate images based on a text prompt.

Salakhutdinov believes his former employer may focus on developing LLMs that can be installed and run securely on Apple devices. That would fit with the company’s past emphasis on using “on-device” algorithms to safeguard sensitive data and avoid sharing it with other companies. A number of recent AI research papers from Apple concern machine-learning methods designed to preserve user privacy. “I think that's probably what Apple is going to do,” he says.
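The on-device pattern Salakhutdinov describes is already routine in the open source world: model weights are downloaded once, and inference then runs entirely locally, so prompts never leave the machine. A minimal sketch using the Hugging Face Transformers library, with an arbitrary small open model standing in for whatever Apple might ship:

```python
from transformers import pipeline

# Minimal on-device inference sketch: after the weights are downloaded,
# generation runs entirely locally and no prompt data leaves the machine.
# The model choice is illustrative, not something Apple ships.
generator = pipeline("text-generation",
                     model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
output = generator("Why does on-device AI help protect user privacy?",
                   max_new_tokens=60)
print(output[0]["generated_text"])
```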

When it comes to tailoring generative AI to devices, Salakhutdinov says, Apple may yet turn out to have a distinct advantage because of its control over the entire software-hardware stack. The company has included a custom “neural engine” in the chips that power its mobile devices since 2017, with the debut of the iPhone X. “Apple is definitely working in that space, and I think at some point they will be in the front, because they have phones, the distribution.”

In a thread on X, Apple researcher Brandon McKinzie, lead author of the MM1 paper, wrote: “This is just the beginning. The team is already hard at work on the next generation of models.”

Will Knight is a senior writer for WIRED, covering artificial intelligence. He writes the Fast Forward newsletter, which explores how advances in AI and other emerging technologies are set to change our lives. He was previously a senior editor at MIT Technology Review.
