GPT-4
GPT-4 crossed a threshold no prior AI system had reached: reliably human-competitive performance on the hardest standardized tests humans use to credential one another.
GPT-4 was released by OpenAI on March 14, 2023, as the fourth generation of its Generative Pre-trained Transformer series. It arrived roughly three and a half months after GPT-3.5 powered the launch of ChatGPT in late November 2022, a period in which public expectations for large language models had been radically reset. Unlike its predecessors, GPT-4 was announced as a multimodal model capable of accepting both text and image inputs, though image input remained accessible only to a limited set of partners at launch, with broader rollout following later in the year.
The technical report OpenAI published alongside the release was notable for what it withheld as much as what it disclosed. OpenAI declined to state the model's parameter count, training compute, or training data composition, citing competitive and safety concerns — a significant departure from the transparency that had characterized earlier GPT releases. What the report did provide was an extensive suite of benchmark evaluations, including performance on the Uniform Bar Exam, the SAT, the GRE, multiple AP subject exams, and a range of academic and coding benchmarks. GPT-4 scored in approximately the 90th percentile on the Uniform Bar Exam, compared to GPT-3.5's score near the 10th percentile — a gap that illustrated how discontinuous the capability jump had been. The model also demonstrated substantially improved instruction-following, reduced hallucination rates, and a context window of 8,192 tokens in the base version, with a 32,768-token variant available to select developers.
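To make the two context window sizes concrete, the following minimal Python sketch picks the smaller variant that can hold a given request. It assumes the window is shared between the prompt and the generated reply, and it uses a rough heuristic of about four characters per token for English text. The variant labels, the heuristic, and the function names are illustrative assumptions, not part of any OpenAI API; an actual tokenizer would be needed for exact counts.

```python
from typing import Optional

# Context window sizes stated in the text, in tokens.
# The labels "base" and "extended" are illustrative, not official model names.
CONTEXT_WINDOWS = {
    "base": 8_192,
    "extended": 32_768,
}

# Assumption: roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a crude token estimate using ceiling division."""
    return -(-len(text) // CHARS_PER_TOKEN)


def smallest_fitting_variant(prompt_tokens: int, reply_tokens: int) -> Optional[str]:
    """Return the smallest variant whose window holds prompt plus reply, else None."""
    needed = prompt_tokens + reply_tokens
    # Check variants from the smallest window to the largest.
    for name, window in sorted(CONTEXT_WINDOWS.items(), key=lambda kv: kv[1]):
        if needed <= window:
            return name
    return None
```

Under these assumptions, a 6,000-token prompt with a 1,000-token reply fits the base window, while an 8,000-token prompt with the same reply budget needs the extended variant.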
GPT-4's deployment was tightly integrated into OpenAI's commercial ecosystem from day one. It became the engine behind ChatGPT Plus, the subscription tier OpenAI had introduced in February 2023, and was offered to developers through the OpenAI API, initially via a waitlist. Microsoft, which had announced a multi-billion-dollar investment in OpenAI in January 2023, confirmed on the day of release that Bing Chat was already running on GPT-4, and went on to build the model into a broad suite of Microsoft 365 products under the Copilot branding. Within weeks of launch, third-party developers reported GPT-4 outperforming GPT-3.5 on complex reasoning, multi-step coding tasks, and nuanced instruction adherence. These differences rapidly reshaped the competitive landscape for AI-powered software products.
Key Facts
- GPT-4 scored approximately the 90th percentile on the Uniform Bar Exam, compared to approximately the 10th percentile for GPT-3.5.
- The model was released on March 14, 2023, with a base context window of 8,192 tokens and an extended variant supporting 32,768 tokens.
- GPT-4 accepted image inputs in addition to text, making it OpenAI's first multimodal model in the GPT series, although image input was limited to select partners at launch.
- OpenAI's technical report evaluated GPT-4 on dozens of standardized academic and professional exams, including the Uniform Bar Exam, LSAT, SAT, GRE, and more than a dozen AP exams.
- Microsoft integrated GPT-4 into Bing Chat and Microsoft 365 Copilot products following a reported multi-billion-dollar investment commitment announced in January 2023.
GPT-4 mattered because it moved AI performance from 'impressive but unreliable' to 'reliably competent across a broad range of expert-level tasks.' The bar exam result was symbolic but also precise: the technical report listed a score of roughly 298 out of 400 on a simulated Uniform Bar Exam, which is above the passing score in most U.S. jurisdictions that use the exam. The report showed comparably strong results on the LSAT, the GRE, and most AP exams, and separate evaluations by outside researchers reported passing-level results on USMLE-style medical licensing questions. Performance was not uniform, however: GPT-4's Codeforces rating in competitive programming fell below the 5th percentile. For the first time, the question of whether AI could substitute for credentialed human professionals in narrow, well-defined tasks became not theoretical but empirical.
The release also accelerated a structural shift in how AI capabilities were delivered and competed over. By withholding architectural details while publishing only benchmark results and a safety evaluation framework, OpenAI helped establish a new industry norm, subsequently followed by Google's Gemini announcements and Anthropic's Claude releases, in which frontier AI products are evaluated as black-box systems rather than open scientific artifacts. This shift had lasting consequences for AI safety research, regulatory policy, and open-source competition, because it decoupled public accountability from technical transparency: outside researchers and regulators could measure what the model did but could not inspect how it was built.
GPT-4 Technical Report
OpenAI · 2023
https://arxiv.org/abs/2303.08774
GPT-4 is OpenAI's most advanced system, producing safer and more useful responses
OpenAI · 2023
https://openai.com/research/gpt-4
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang · 2023
https://arxiv.org/abs/2303.12712