December 13, 2023
Google CEO Sundar Pichai is excited about the start of what he calls the Gemini era in artificial intelligence (AI). Gemini is Google’s latest large language model. Pichai first teased it in May at the I/O developer conference, and it is now available to the public. According to Pichai and Google DeepMind CEO Demis Hassabis, Gemini is a significant step forward in AI that will affect almost all of Google’s products. Pichai says, “One of the powerful things about this moment is you can work on one underlying technology and make it better and it immediately flows across our products.”
Gemini is the outcome of extensive teamwork involving various Google teams, including those at Google Research. It was constructed from scratch to be multimodal, enabling it to easily comprehend and work with diverse types of information such as text, code, audio, images, and videos.
Gemini represents Google’s most versatile model to date, capable of running effectively on a variety of platforms, from data centers to mobile devices. Its cutting-edge capabilities will revolutionize how developers and businesses implement and expand AI.
Google has optimized the initial version, Gemini 1.0, for three sizes: Gemini Ultra, Gemini Pro, and Gemini Nano.
Exceptional Performance
Google DeepMind has tested the Gemini models on a wide range of tasks, from understanding natural images, audio, and video to mathematical reasoning. Gemini Ultra surpasses current state-of-the-art results on 30 of the 32 widely used academic benchmarks in large language model (LLM) research and development.
Scoring 90.0%, Gemini Ultra surpasses human experts on Massive Multitask Language Understanding (MMLU), a benchmark that spans 57 subjects, including math, physics, history, law, medicine, and ethics, and tests both world knowledge and problem-solving skills.
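A score such as the 90.0% cited above is an accuracy figure over multiple-choice questions. The sketch below shows one way such a figure could be computed per subject and overall; the record format, the function name, and the choice to pool all questions for the overall number are assumptions for illustration, not details taken from Google's evaluation.

```python
from collections import defaultdict


def mmlu_style_accuracy(records):
    """Compute per-subject and pooled accuracy.

    `records` is an iterable of (subject, predicted_choice, correct_choice)
    tuples. This layout is a hypothetical one chosen for the example.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for subject, predicted, correct in records:
        totals[subject] += 1
        if predicted == correct:
            hits[subject] += 1
    per_subject = {s: hits[s] / totals[s] for s in totals}
    # Pooled accuracy: every question counts equally, whatever its subject.
    overall = sum(hits.values()) / sum(totals.values())
    return per_subject, overall


# Made-up sample answers, not real benchmark data.
sample = [
    ("physics", "B", "B"),
    ("physics", "C", "A"),
    ("law", "D", "D"),
    ("law", "A", "A"),
]
per_subject, overall = mmlu_style_accuracy(sample)
print(per_subject, overall)
```

Averaging the per-subject accuracies instead of pooling the questions would weight small subjects more heavily, so the two conventions can give different headline numbers.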
Google’s new approach to the MMLU benchmark lets Gemini use its reasoning capabilities to think more carefully before answering difficult questions. This yields significantly better results than relying on the model’s first impression alone.
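The article does not spell out how "thinking more carefully" is implemented. One common pattern for this kind of evaluation is to sample several reasoned answers, take the majority answer when the samples agree strongly enough, and otherwise fall back to the single first-impression answer. The sketch below illustrates that pattern only; the function name, the 0.6 agreement threshold, and the input format are all assumptions, and it should not be read as Google's actual method.

```python
from collections import Counter


def answer_with_fallback(sampled_answers, first_impression, threshold=0.6):
    """Return the majority answer among the samples if agreement is high
    enough; otherwise return the first-impression answer.

    All names and the default threshold are hypothetical.
    """
    if not sampled_answers:
        return first_impression
    answer, votes = Counter(sampled_answers).most_common(1)[0]
    if votes / len(sampled_answers) >= threshold:
        return answer
    return first_impression


# Strong agreement (4 of 5 samples say "B"): use the majority answer.
confident = answer_with_fallback(["B", "B", "B", "B", "A"], first_impression="A")
# Weak agreement (no answer repeats): keep the first impression.
unsure = answer_with_fallback(["A", "B", "C", "D"], first_impression="D")
print(confident, unsure)
```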
On Google’s image benchmarks, Gemini Ultra outperformed the previous best models without help from optical character recognition (OCR) systems, which extract text from images for further processing. These results show that Gemini handles multiple types of information natively, and they suggest early signs of more complex reasoning abilities.
Google designed Gemini to be natively multimodal from the start. The model was pre-trained on different types of data, and Google then fine-tuned it with additional multimodal data to improve its performance. This process allows Gemini to understand and reason about all kinds of inputs from the ground up, surpassing the capabilities of existing multimodal models. In nearly every domain, Gemini’s abilities are state of the art.
Gemini 1.0 possesses advanced multimodal reasoning skills, enabling it to comprehend intricate written and visual information. This distinctive proficiency makes it exceptionally adept at uncovering valuable knowledge that might be challenging to discern within extensive data sets.
The remarkable capacity of Gemini to extract insights from hundreds of thousands of documents by reading, filtering, and understanding information promises to bring about groundbreaking advancements at digital speeds across various fields, ranging from science to finance.
Gemini 1.0 was trained to recognize and understand text, images, audio, and other forms of information at the same time. This training allows it to grasp nuanced details and answer questions on intricate subjects more effectively. Gemini is particularly good at explaining its reasoning in complex fields such as math and physics.
Google’s initial version of Gemini has the capability to comprehend, elucidate, and produce high-quality code in the world’s most widely used programming languages, including Python, Java, C++, and Go. Its proficiency in working across languages and reasoning through intricate information positions it as one of the foremost foundational models for coding globally.
Gemini Ultra performs strongly on several coding benchmarks, including HumanEval, an industry standard for assessing performance on coding tasks, and Natural2Code, Google’s held-out internal dataset, which uses author-generated sources instead of web-based information.
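HumanEval results are commonly reported as pass@k: the probability that at least one of k sampled solutions passes a problem's unit tests. The article does not say which k or which sampling setup Google used, so the sketch below only illustrates the standard unbiased estimator for a single problem, where n solutions were sampled and c of them passed. The function name and example numbers are illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: total solutions sampled, c: solutions that passed, k: budget.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers only: 10 samples, 3 of which passed.
estimate = pass_at_k(n=10, c=3, k=1)
print(round(estimate, 3))
```

A benchmark-level score is then the mean of this estimate over all problems.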
Furthermore, Gemini can serve as the engine for more advanced coding systems. Two years ago, Google introduced AlphaCode, the first AI code generation system to achieve a competitive level of performance in programming competitions.
Using a specialized version of Gemini, Google has built a more advanced code generation system, AlphaCode 2, which excels at competitive programming problems that go beyond coding to involve complex math and theoretical computer science.
Google says it is committed to advancing AI that is both bold and responsible. Building on Google’s AI Principles and the safety policies already in place across its products, the company is adding new protections to account for Gemini’s multimodal capabilities. At each stage of development, potential risks are considered, tested for, and mitigated.
Gemini undergoes extensive safety evaluations, surpassing those of any previous Google AI model, covering aspects such as bias and toxicity. Novel research is conducted to identify and address potential risks in areas like cyber-offense, persuasion, and autonomy. Google Research’s advanced adversarial testing techniques are applied to proactively identify critical safety issues before Gemini’s deployment.
Collaboration with a diverse group of external experts and partners helps stress-test the models across various issues and identify potential blind spots.
During Gemini’s training phases, Google uses benchmarks such as Real Toxicity Prompts to diagnose content safety issues. This set, developed by experts at the Allen Institute for AI, consists of 100,000 prompts drawn from the web with varying degrees of toxicity. More details on this work will be disclosed soon.
To minimize harm, dedicated safety classifiers have been built to recognize, label, and filter out content involving violence or negative stereotypes. Ongoing efforts address known challenges for models, including aspects such as factuality, grounding, attribution, and corroboration.
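The recognize, label, and filter flow described above can be pictured as a small pipeline. The sketch below is purely illustrative: `toy_classifier` is a hypothetical keyword stand-in for a learned safety classifier, and the label names and the 0.5 threshold are assumptions. Nothing here reflects how Google's classifiers are actually built.

```python
def toy_classifier(text):
    """Hypothetical stand-in for a learned safety classifier.

    Returns a score per label; a real system would use a trained model.
    """
    lowered = text.lower()
    return {
        "violence": 1.0 if "attack" in lowered else 0.0,
        "stereotype": 1.0 if "all of them are" in lowered else 0.0,
    }


def filter_outputs(candidates, classifier, threshold=0.5):
    """Split candidate texts into kept texts and (text, labels) flags."""
    kept, flagged = [], []
    for text in candidates:
        scores = classifier(text)
        labels = sorted(name for name, s in scores.items() if s >= threshold)
        if labels:
            flagged.append((text, labels))  # recognized and labeled
        else:
            kept.append(text)  # passes the filter
    return kept, flagged


kept, flagged = filter_outputs(
    ["Here is a pasta recipe.", "Plan an attack on the town."],
    toy_classifier,
)
print(kept, flagged)
```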
Responsibility and safety remain central to the development and deployment of Google’s models. Google is partnering with the industry and the broader ecosystem to define best practices and set safety and security benchmarks through organizations such as MLCommons and the Frontier Model Forum and its AI Safety Fund, as well as through its Secure AI Framework (SAIF). SAIF is designed to mitigate security risks specific to AI systems in both the public and private sectors. Google says it will continue to work with researchers, governments, and civil society groups worldwide as Gemini is developed further.
Gemini is being introduced to billions of people through various Google products. Bard will use a fine-tuned version of Gemini Pro, improving its reasoning, planning, understanding, and more. This marks the most significant upgrade to Bard since its launch. The upgrade is initially available in English in more than 170 countries and territories, and Google plans to expand it to other modalities, new languages, and additional locations in the near future.
Gemini is also coming to Pixel devices. The Pixel 8 Pro is the first smartphone designed to run Gemini Nano, which powers new features such as Summarize in the Recorder app and is rolling out in Smart Reply on Gboard, beginning with WhatsApp, with more messaging apps to follow next year.
In the coming months, Gemini will become available in more of Google’s products and services, including Search, Ads, Chrome, and Duet AI.
Google has already begun experimenting with Gemini in Search, where it is enhancing Search Generative Experience, providing users with a 40% reduction in latency in English in the U.S., along with improvements in quality.
Gemini marks a noteworthy milestone in the progress of AI. It signals the start of a new era for Google as the company continues to innovate rapidly and to push the boundaries of its models responsibly.