Acq. 2023·Product

Gemini

Google's first natively multimodal frontier model, built from the ground up to reason across text, images, audio, video, and code simultaneously.

Overview

Gemini was announced by Google DeepMind on December 6, 2023, as the company's most ambitious effort to unify its AI research under a single model family. Its predecessor, PaLM 2, handled other modalities largely through separate pathways. Gemini, by contrast, was designed from the outset to be natively multimodal. It was trained jointly on text, images, audio, video, and code rather than retrofitted with vision or audio adapters after the fact. This design reflected lessons from years of research at Google Brain and DeepMind, the two organizations that had formally merged earlier in 2023.

Gemini launched in three sizes (Ultra, Pro, and Nano), each targeting a distinct deployment scenario. Gemini Ultra was the largest and most capable variant, designed for highly complex reasoning tasks on data center hardware. Gemini Pro, aimed at a broad range of tasks, powered Google's Bard chatbot (later rebranded as Gemini) and was offered to developers through the Gemini API. Gemini Nano was optimized for on-device inference and ran directly on Pixel 8 Pro smartphones without a network round-trip. The tiered release strategy deliberately acknowledged that frontier AI capability must coexist with practical efficiency constraints.

Of the 32 widely used academic benchmarks reported at launch, Gemini Ultra exceeded the previous state-of-the-art results on 30. Among these was a reported 90.0% on MMLU (Massive Multitask Language Understanding), which made it the first model reported to surpass human expert performance on that test. The model also performed strongly on multimodal reasoning benchmarks such as MMMU (Massive Multi-discipline Multimodal Understanding). The release of Gemini marked an inflection point in the competitive landscape. It signaled that the gap between OpenAI's models and Google's own frontier systems had substantially narrowed, and on some benchmarks had closed entirely.

Key Facts

Gemini Ultra achieved 90.0% on the MMLU benchmark, the first model reported to exceed human expert-level performance (89.8%) on that evaluation.
The model family was released in three sizes: Gemini Ultra, Gemini Pro, and Gemini Nano, with Nano running on-device on the Pixel 8 Pro smartphone.
Gemini was announced on December 6, 2023. Developers gained access to Gemini Pro through the Gemini API, via Google AI Studio and Google Cloud Vertex AI, on December 13, 2023.
Gemini Ultra advanced the state of the art on 30 of the 32 academic benchmarks reported in the technical report.
Gemini Nano came in two sub-variants — Nano-1 (1.8B parameters) and Nano-2 (3.25B parameters) — optimized for different on-device memory and latency constraints.
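The Nano parameter counts translate directly into a rough storage budget. The Python sketch below estimates weight memory only, leaving out activations, KV cache, and runtime overhead. It computes that estimate at several bit widths per parameter. The helper names are hypothetical and the bit widths are illustrative assumptions. They do not describe how Google packages Nano.

```python
# Rough weight-memory estimate for the two Gemini Nano variants.
# Assumption: weights only (no activations, KV cache, or runtime overhead),
# stored at a uniform, illustrative bit width per parameter.

NANO_PARAMS = {
    "Nano-1": 1.8e9,   # parameter counts as stated in the Key Facts above
    "Nano-2": 3.25e9,
}


def weight_bytes(num_params: float, bits_per_param: int) -> float:
    """Bytes needed to store num_params weights at bits_per_param bits each."""
    return num_params * bits_per_param / 8


def to_gib(num_bytes: float) -> float:
    """Convert bytes to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    for name, params in NANO_PARAMS.items():
        for bits in (16, 8, 4):
            gib = to_gib(weight_bytes(params, bits))
            print(f"{name}: {bits:>2}-bit weights ~ {gib:.2f} GiB")
```

At 4 bits per weight, this estimate comes to about 0.84 GiB for Nano-1 and 1.51 GiB for Nano-2. Each halving of the bit width halves the footprint.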

Why It Matters

Gemini's native multimodality represented a meaningful architectural departure from the dominant paradigm of building large language models first and adding vision or audio capabilities afterward. By training across modalities jointly from the start, Gemini could perform tasks that required interleaving different signal types in ways that adapter-based systems found difficult — for example, reasoning about the audio track and visual frames of a video simultaneously. This approach influenced subsequent model design across the industry, accelerating the shift toward truly unified foundation models rather than ensembles of specialized components.
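"Interleaving different signal types" can be made concrete with a minimal Python sketch. It uses a hypothetical `Segment` record and `interleave` helper to merge video frames and audio chunks into one time-ordered sequence, then appends a text question. The result is a single input in which both streams sit side by side. This is not Gemini's actual input pipeline. The structure, field names, and ordering rule are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One piece of model input (hypothetical structure, for illustration only)."""
    modality: str     # e.g. "image", "audio", "text"
    timestamp: float  # seconds from the start of the clip
    payload: str      # stand-in for the encoded content


def interleave(frames: List[Segment], audio: List[Segment], prompt: Segment) -> List[Segment]:
    """Merge frames and audio chunks into one sequence ordered by timestamp,
    then append the text prompt. Python's sort is stable, so at equal
    timestamps a frame stays ahead of the audio chunk with the same time."""
    merged = sorted(frames + audio, key=lambda seg: seg.timestamp)
    return merged + [prompt]


if __name__ == "__main__":
    frames = [Segment("image", t, f"frame@{t}") for t in (0.0, 1.0, 2.0)]
    audio = [Segment("audio", t, f"audio@{t}") for t in (0.0, 0.5, 1.0, 1.5)]
    prompt = Segment("text", 2.0, "What sound is heard when the dog appears?")
    for seg in interleave(frames, audio, prompt):
        print(seg.modality, seg.timestamp, seg.payload)
```

In this framing, a question about what is heard when something is seen can be answered from one ordered context. It does not require two separately processed streams.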

The release also intensified the public and commercial competition between Google and OpenAI, drawing broad attention to the question of which organizations could sustain frontier AI development over the long term. Gemini's deployment in consumer products like Bard, in enterprise APIs, and on-device via Nano demonstrated that a single model family could span the full stack from cloud to edge. This breadth set a new benchmark for what a vertically integrated AI product strategy could look like, influencing how competitors, regulators, and enterprise customers evaluated the AI landscape entering 2024.

The People

Demis Hassabis, Koray Kavukcuoglu, Jeff Dean, Oriol Vinyals, Quoc V. Le, Douglas Eck

Sources

[1] Gemini Team, Google. "Gemini: A Family of Highly Capable Multimodal Models." 2023. https://arxiv.org/abs/2312.11805

[2] Sundar Pichai, Demis Hassabis. "Introducing Gemini: our largest and most capable AI model." 2023. https://blog.google/technology/ai/google-gemini-ai/

[3] Google DeepMind. "Gemini: Google's newest and most capable AI model." 2023. https://deepmind.google/technologies/gemini/