Demis Hassabis has never been shy about proclaiming big leaps in artificial intelligence. Most notably, he became famous in 2016 after a bot called AlphaGo taught itself to play the complex and subtle board game Go with superhuman skill and ingenuity.
Today, Hassabis says his team at Google has made a bigger step forward—for him, the company, and hopefully the wider field of AI. Gemini, the AI model announced by Google today, he says, opens up an untrodden path in AI that could lead to major new breakthroughs.
“As a neuroscientist as well as a computer scientist, I’ve wanted for years to try and create a kind of new generation of AI models that are inspired by the way we interact and understand the world, through all our senses,” Hassabis told WIRED ahead of the announcement today. Gemini is “a big step towards that kind of model,” he says. Google describes Gemini as “multimodal” because it can process information in the form of text, audio, images, and video.
An initial version of Gemini will be available through Google’s chatbot Bard from today. The company says the most powerful version of the model, Gemini Ultra, will be released next year and outperforms GPT-4, the model behind ChatGPT, on several common benchmarks. Videos released by Google show Gemini solving tasks that involve complex reasoning, and also examples of the model combining information from text, images, audio, and video.
“Until now, most models have sort of approximated multimodality by training separate modules and then stitching them together,” Hassabis says, in what appeared to be a veiled reference to OpenAI’s technology. “That’s OK for some tasks, but you can’t have this sort of deep complex reasoning in multimodal space.”
OpenAI launched an upgrade to ChatGPT in September that gave the chatbot the ability to take images and audio as input in addition to text. OpenAI has not disclosed technical details about how GPT-4 handles these inputs or the basis of its multimodal capabilities.
Playing Catchup
Google has developed and launched Gemini with striking speed compared to previous AI projects at the company, driven by recent concern about the threat that developments from OpenAI and others could pose to Google’s future.
At the end of 2022, Google was seen as the AI leader among large tech companies, with ranks of AI researchers making major contributions to the field. CEO Sundar Pichai had declared his strategy for the company as being “AI first,” and Google had successfully added AI to many of its products, from search to smartphones.