LLaMA
Meta's decision to release model weights cracked open the closed ecosystem of large language models and handed capable AI to anyone with a GPU.
By early 2023, the dominant large language models, among them GPT-3, PaLM, and Chinchilla, were either accessible only through proprietary APIs or not publicly available at all, and were controlled by a handful of well-funded companies. Researchers outside those organizations could study model outputs but not internals, and could fine-tune or deploy only what an API permitted. Meta AI's release of LLaMA (Large Language Model Meta AI) in February 2023 directly challenged this arrangement by making weights available for a family of models ranging from 7 billion to 65 billion parameters, accompanied by a research paper describing the training methodology in detail.
LLaMA was trained on publicly available datasets, including Common Crawl, C4, GitHub, Wikipedia, Books, ArXiv, and Stack Exchange, deliberately avoiding proprietary data. The two smaller models saw approximately 1.0 trillion tokens and the two larger ones approximately 1.4 trillion. The key technical finding was efficiency: the 13B-parameter LLaMA model outperformed GPT-3 (175B parameters) on most benchmarks, demonstrating that training a smaller model on more tokens, with careful data curation, could compensate for raw model scale. The team applied techniques including pre-normalization with RMSNorm, SwiGLU activation functions, and rotary positional embeddings, producing an architecture that was both performant and relatively tractable to run.
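The three architectural choices named above can be illustrated with a minimal sketch in plain Python. This is not Meta's implementation: the function names, the `eps` and `base` defaults, and the adjacent-pair rotation layout are assumptions chosen for readability, and a real model applies these operations to batched tensors rather than Python lists.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then apply a learned gain.
    Unlike LayerNorm, no mean is subtracted and no bias is added."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish): v * sigmoid(v), written to avoid overflow for large |v|."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)

def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

def rotary_embed(x, position, base=10000.0):
    """Rotary embedding: rotate each pair (x[i], x[i+1]) by an angle that
    grows with the token position and shrinks with the pair index."""
    d = len(x)  # assumed even
    out = list(x)
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

In use, `rms_norm` is applied to the input of each sub-layer (the "pre-normalization" arrangement), and `rotary_embed` is applied to query and key vectors before attention. Because each pair is rotated by an angle proportional to position, the dot product of a rotated query and a rotated key depends only on the distance between the two positions, which is the property that makes the scheme attractive.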
Meta initially released weights under a non-commercial research license, restricting production deployment. Within about a week, however, the weights leaked onto 4chan and spread across the internet, a development that effectively rendered the license academic. This leak accelerated an already rapid community response: within weeks, Georgi Gerganov had released llama.cpp, enabling LLaMA inference on consumer hardware including laptops with Apple Silicon. Stanford researchers released Alpaca, a fine-tuned 7B variant trained on 52,000 instruction-following examples generated with OpenAI's text-davinci-003, demonstrating that capable instruction-tuned models could be produced for under $600 in API and compute costs.
Key Facts
- LLaMA was released in four sizes: 7B, 13B, 33B, and 65B parameters. The 7B and 13B models were trained on 1.0 trillion tokens, the 33B and 65B models on 1.4 trillion.
- The 13B LLaMA model outperformed the 175B-parameter GPT-3 on most standard NLP benchmarks, including BoolQ, HellaSwag, and WinoGrande.
- Stanford's Alpaca, a fine-tune of LLaMA 7B released in March 2023, was produced for under $600 in OpenAI API and compute costs.
- llama.cpp, released by Georgi Gerganov in March 2023, enabled 4-bit quantized LLaMA inference on a MacBook with no GPU, achieving several tokens per second.
- LLaMA 2, released on July 18, 2023, in partnership with Microsoft, extended the model family to 70B parameters and introduced a license that permits commercial use.
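To see why 4-bit quantization puts a 7B model within reach of a laptop, the sketch below pairs a simple block-wise quantizer with a memory estimate. It is an illustration under stated assumptions, not llama.cpp's code: the function names are hypothetical, the symmetric one-scale-per-block scheme is a simplification, and llama.cpp's actual on-disk formats differ in detail.

```python
def quantize_block(values):
    """Map a block of floats to signed 4-bit codes in [-7, 7] that share
    one floating-point scale (a simplified, symmetric scheme)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return scale, codes

def dequantize_block(scale, codes):
    """Recover approximate floats from the shared scale and the 4-bit codes."""
    return [scale * c for c in codes]

def weight_memory_gb(n_params, bits_per_weight):
    """Approximate weight storage in gigabytes (10**9 bytes).
    Ignores per-block scales, activations, and the key-value cache."""
    return n_params * bits_per_weight / 8 / 1e9

if __name__ == "__main__":
    scale, codes = quantize_block([0.7, -0.35, 0.0, 0.1])
    print(scale, codes, dequantize_block(scale, codes))
    # A nominal 7 billion parameters at 16 bits versus 4 bits per weight.
    print(weight_memory_gb(7e9, 16), weight_memory_gb(7e9, 4))
```

Under these assumptions, a nominal 7 billion weights need roughly 14 GB at 16 bits each but roughly 3.5 GB at 4 bits, before the small overhead of storing one scale per block. The price is a rounding error of at most half a scale step per weight.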
LLaMA established open-weight language models as a credible alternative to closed API services, not merely an academic curiosity. The combination of competitive benchmark performance, available weights, and a tractable architecture that the community could immediately modify created a Cambrian explosion of derivatives: Vicuna, WizardLM, OpenLLaMA, and dozens of others followed within months, each iterating on fine-tuning, quantization, and alignment techniques. This ecosystem showed that the cost of entry for serious LLM research and deployment could be measured in hundreds rather than millions of dollars.
The longer-term structural consequence was a recalibration of the entire industry's assumptions about openness. Meta followed LLaMA with LLaMA 2 in July 2023, this time under a license permitting commercial use for organizations with fewer than 700 million monthly active users, a deliberate move to legitimize open-weight models in production settings. Later open-weight releases, including Mistral, Falcon, and eventually Meta's own LLaMA 3, continued raising the capability ceiling of openly available models, strengthening the negotiating position of developers who had previously depended on a small number of API gatekeepers.
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample · 2023
https://arxiv.org/abs/2302.13971
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron, Louis Martin, Kevin Stone, et al. · 2023
https://arxiv.org/abs/2307.09288
Alpaca: A Strong, Replicable Instruction-Following Model
Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto · 2023
https://crfm.stanford.edu/2023/03/13/alpaca.html