Acq. 2020·Research

GPT-3

At 175 billion parameters, GPT-3 demonstrated that scaling a language model could unlock emergent capabilities no one had explicitly trained for.

Overview

When OpenAI released the GPT-3 paper in May 2020, it arrived in a field already transformed by the model's predecessors. GPT-2 (2019) had shown that large autoregressive language models could produce surprisingly coherent text, but its output remained recognizably machine-generated to careful readers. GPT-3 changed the subjective experience of interacting with a language model. Its largest training source, Common Crawl, began as roughly 45 terabytes of compressed raw text and was filtered down to about 570 GB before being combined with curated web text, books, and Wikipedia. At 175 billion parameters, GPT-3 was more than 100 times the size of GPT-2 and about ten times larger than any dense language model published before it.

Technically, GPT-3 used essentially the same transformer decoder architecture as GPT-2; the main change was alternating dense and locally banded sparse attention layers, not a fundamentally new design. What was new was scale, and scale turned out to matter enormously. The model was evaluated extensively under few-shot, one-shot, and zero-shot conditions: the prompt contained a handful of worked examples, a single example, or only a natural-language task description, and in no case were the model's weights fine-tuned on task-specific data. Under these conditions it matched or approached fine-tuned models on several NLP benchmarks, including translation, question answering, and cloze tasks, while still lagging on others, such as some reading comprehension and natural language inference datasets. It also generated prose, poetry, code, and structured data at a quality that surprised many early-access users.
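The difference between the three evaluation settings comes down to how many worked examples are placed in the prompt. The sketch below is illustrative only: `build_prompt` is a hypothetical helper, and the `=>` separator and example pairs are loosely modeled on the translation illustration in the paper, not taken from any released evaluation code.

```python
def build_prompt(task_description, examples, query):
    """Assemble an in-context prompt; len(examples) sets the 'shot' count.

    Hypothetical helper for illustration. No model weights are updated:
    the examples only appear as text in front of the query.
    """
    lines = [task_description]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    # The final line is left open for the model to complete.
    lines.append(f"{query} =>")
    return "\n".join(lines)


demos = [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")]
task = "Translate English to French:"

zero_shot = build_prompt(task, [], "cheese")        # description only
one_shot = build_prompt(task, demos[:1], "cheese")  # one worked example
few_shot = build_prompt(task, demos, "cheese")      # several worked examples

print(few_shot)
```

In all three cases the model sees a single string and continues it; only the number of demonstrations changes.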

OpenAI did not release GPT-3's weights publicly. Instead, in June 2020 it announced a commercial API, making the model accessible to developers and researchers through a rate-limited interface. This decision marked a significant shift from OpenAI's earlier open-release posture and presaged the API-first business model that would define the company. Within a year, hundreds of applications had been built on the API, ranging from code completion tools to creative writing assistants, and a wider debate emerged about the societal implications of near-human text generation at scale.

Key Facts

175 billion trainable parameters — more than 100× the size of GPT-2's largest variant (1.5B).
Trained on approximately 300 billion tokens drawn from a filtered version of Common Crawl, WebText2, Books1, Books2, and English Wikipedia.
The paper reported 71.2% accuracy on the TriviaQA benchmark in the few-shot setting (64.3% zero-shot), matching or exceeding fine-tuned closed-book systems of the era.
The model's API was announced in June 2020; in March 2021 OpenAI reported that more than 300 applications were using it, generating an average of about 4.5 billion words per day.
Training compute was estimated at approximately 3.14 × 10²³ FLOP, roughly an order of magnitude more than the other language models the paper compared it against.
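The size ratio and compute figure in the list above can be sanity-checked with a few lines of arithmetic. This is a rough sketch, assuming the common rule of thumb of about 6 FLOP per parameter per training token (forward plus backward pass); the function name is illustrative, and the result is an approximation, not the paper's exact accounting.

```python
GPT3_PARAMS = 175e9      # parameters, from the list above
GPT2_PARAMS = 1.5e9      # GPT-2's largest variant
TRAINING_TOKENS = 300e9  # tokens seen during training


def approx_training_flop(params, tokens):
    """Rule-of-thumb estimate: ~6 FLOP per parameter per token (assumption)."""
    return 6 * params * tokens


size_ratio = GPT3_PARAMS / GPT2_PARAMS
flop = approx_training_flop(GPT3_PARAMS, TRAINING_TOKENS)

print(f"size ratio: {size_ratio:.0f}x")  # a little over 100x
print(f"training compute: {flop:.2e} FLOP")
```

The estimate lands within about one percent of the reported 3.14 × 10²³ FLOP, which suggests the headline figure follows from the same kind of back-of-the-envelope calculation.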

Why It Matters

GPT-3 established the empirical case for the scaling hypothesis — the idea that increasing model size and training data, without architectural reinvention, could yield qualitative jumps in capability. This reoriented research priorities across the entire field. Labs that had focused on clever architectural improvements began devoting resources to simply training larger models. The paper's few-shot learning results suggested that a sufficiently large model could perform new tasks from natural language instructions alone, foreshadowing the instruction-tuning and prompt-engineering paradigms that followed.
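One way to make the scaling hypothesis concrete is the power-law form proposed in "Scaling Laws for Neural Language Models", where test loss falls as a power of parameter count. The sketch below assumes an exponent of about 0.076, the approximate value that paper reports for non-embedding parameters when data and compute are not the bottleneck; treating it as exact, or as directly applicable to GPT-3's full parameter count, is a simplification.

```python
ALPHA_N = 0.076  # assumed exponent, approximate value from the scaling-laws paper


def relative_loss(scale_factor, alpha=ALPHA_N):
    """Loss multiplier when parameter count grows by scale_factor,
    under the assumed power law L(N) proportional to N ** (-alpha)."""
    return scale_factor ** (-alpha)


ratio_100x = relative_loss(100)
print(f"loss multiplier for a 100x larger model: {ratio_100x:.2f}")
```

Under these assumptions, a 100-fold increase in parameters cuts the loss to roughly 0.70 of its starting value. The exponent is small, which is why the gains in question came from orders-of-magnitude increases in size, not incremental ones.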

Beyond research, GPT-3 triggered the first mainstream public reckoning with capable AI language systems. Viral Twitter threads and essays in 2020 debated whether the model 'understood' language, whether it posed economic risks to writers and programmers, and how it should be governed. It directly inspired subsequent systems — including Codex (2021), InstructGPT (2022), and ChatGPT (2022) — that built on its weights or architecture. In this sense, GPT-3 was not only a research artifact but the founding product of a new industry.

The People

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei

Sources

[1] Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, et al. · 2020
https://arxiv.org/abs/2005.14165

[2] Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei · 2020
https://arxiv.org/abs/2001.08361

[3] OpenAI API
OpenAI · 2020
https://openai.com/blog/openai-api