Google opens access to Gemini, racing to catch up to OpenAI

Davey Alba and Shirin Ghaffary, Bloomberg News

Alphabet Inc.’s Google invented the technology underpinning the current artificial intelligence boom, but its products lag in popularity. The search giant hopes to change that with the much-anticipated release of Gemini, the “largest and most capable AI model” the company has ever built.

Since OpenAI’s runaway success last year with its conversational chatbot ChatGPT, more companies have been experimenting with generative AI, tech that can automate tasks like coding, summarizing reports or creating marketing campaigns, based on what users ask for. At a presentation ahead of the product’s release on Wednesday, Google stressed that Gemini is the most flexible model it’s made because it comes in different sizes, including a version that can run directly on smartphones. That sets the program apart from competitors.

The AI model, a system used to underpin all kinds of generative AI apps, will have three versions. Those are Gemini Ultra, Gemini Pro and Gemini Nano. Eli Collins, vice-president of product at Google DeepMind, said the variety means Gemini is “able to run on everything from mobile devices to large-scale data centers.”

“For a long time, we’ve wanted to build a new generation of AI models, inspired by the way people understand and interact with the world — an AI that feels more like a helpful collaborator and less like a smart piece of software,” Collins said on a call with reporters. “Gemini brings us a step closer to that vision.”

Prior to the model’s release, the company ran Gemini through a set of standard industry benchmarks and said that in six out of eight of those tests, Gemini Pro outperformed the GPT-3.5 model from OpenAI. Google said Gemini also outpaced GPT-4, the most recent version of OpenAI’s general-purpose model, in seven out of eight benchmarks covering general language understanding, reasoning, math and coding. Meanwhile, Google estimated that AlphaCode 2, the company’s latest generative AI product that can explain and generate code, surpassed 85 per cent of rivals in the area of competitive programming. The company is publishing a technical report explaining Gemini’s model architecture, training process and evaluation in more depth.

Starting Wednesday, Android developers who want to build Gemini-powered apps for smartphones and tablets will be able to sign up for the “nano” version of the AI model, which can run directly on such devices. Google also said it’s immediately enabling Gemini on the Pixel 8 Pro, its flagship phone, where it will power new generative AI features like the ability to summarize points from a recorded phone conversation. Next week, Google is making Gemini Pro available for cloud customers via its Vertex AI and AI Studio platforms, the company said.

Gemini Ultra, the largest version of Google’s AI model, will be available first in an early access program for developers and enterprise companies, with details about the program coming next week. It will roll out more broadly to the public early next year.

Gemini will also be able to integrate with Google’s vast suite of apps and services through Bard — the company’s conversational chatbot, and rival to OpenAI’s ChatGPT. Previously, Bard used Google’s PaLM 2 model, a large language model that the company announced at its annual developers conference in May.

For the past year, Google has been under pressure to reinvent its core search business and respond to the rise of artificial intelligence programs that can generate content. Though the company has long been seen as a pioneer in AI research, some have criticized its management for being slow to the market on AI products, especially after the viral successes of products like ChatGPT and the image-generator Dall-E. Since the release of OpenAI’s GPT-4 in March, Google has been scrambling to reassert its leadership in the field, including injecting its maturing search business with the new technology.

Gemini is the company’s answer to that market pressure. Google said the AI model is “natively multimodal,” meaning it was pre-trained from the start to handle both text- and image-based prompts from users. For example, in a video demonstration, Google showed off how a parent could help with a child’s homework by uploading an image of a math problem along with a photo of attempts to solve it on a worksheet.

“Not only can Gemini solve these problems,” said Taylor Applebaum, a Google software engineer, in the demo, “it can read the answers and understand what was right and what was wrong, and explain the concepts that need more clarification.” The company also said its “search generative experience,” an experimental version of Google’s search engine that uses its generative AI tech, would incorporate Gemini’s new powers by next year.

Still, the company’s representatives cautioned that Gemini remains prone to “hallucinations,” false or made-up information produced by generative AI. Collins called the phenomenon “an unsolved research problem.” The demonstrations the company showed reporters were pre-recorded.

Collins said Gemini “has the most comprehensive safety evaluations of any Google AI model.” In order to evaluate Gemini for safety, he said, Google exposed the AI model to adversarial testing, meaning prompts that mimic a bad actor trying to take advantage of the program. The testing included “real toxicity prompts,” a test developed by the Allen Institute for AI that contains more than 100,000 prompts pulled from the web, and that aims to help AI researchers check large language models for hate speech and political bias.

The company also stressed that the tool will be speedy. Gemini uses a new underlying supercomputer architecture with updated processing chips, which allow it to run faster than earlier, smaller models, the company said. Google is using a new version of its cloud chips, Cloud Tensor Processing Units (TPUs), which were designed in-house and can train existing models 2.8 times faster than predecessors. Amin Vahdat, Google’s vice-president of machine learning, said such an approach gives Google “a new view into the future standard AI infrastructure.” The company still uses third-party AI chips to run its Gemini models, he added.

Gemini will be integrated into Bard, Google’s generative AI chatbot that launched in March, allowing it to tap into the company’s most popular services, including Gmail, Maps, Docs and YouTube. The rollout will occur in two different phases: Starting Wednesday, Bard will be powered by Gemini Pro, which will enable advanced reasoning, planning, understanding and other capabilities. It will be able to operate in English in 170 countries and territories — but notably not in Europe or the UK, where the company said it’s working with local regulators.

Early next year, the company plans to release Bard Advanced, which will be powered by the more capable Gemini Ultra model. Google said it will launch a trusted tester program soon in order to improve Bard Advanced before it launches more broadly to the public.

Sissie Hsiao, Google’s vice-president of product for Bard, said that “with Gemini, Bard is getting its biggest and best upgrade yet and it will unlock new ways for people to create, interact and collaborate.”