A foundation model is a large-scale artificial intelligence model that is trained on vast amounts of data and can be adapted to a wide variety of downstream tasks. The term was coined by Stanford's Institute for Human-Centered AI (HAI) in 2021 to describe models like GPT, BERT, and DALL-E that serve as the "foundation" for many AI applications.
Key Characteristics
Foundation models share several defining properties:
- Scale — Trained on massive datasets (trillions of tokens of text, billions of images) using enormous compute resources
- Self-supervised learning — Typically trained without explicit human labels, learning patterns from raw data
- Transfer learning — Can be fine-tuned or prompted for tasks they were not explicitly trained for
- Emergent abilities — Display capabilities (like reasoning, coding, or translation) that arise from scale rather than explicit programming
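To make "self-supervised learning" concrete, here is a toy sketch of the next-token objective that underlies most language foundation models. The "model" is just a bigram count table built from a tiny hypothetical corpus (all names and data here are illustrative, not from any real system); the training signal comes from the raw text itself, with no human labels.

```python
import math

# Toy corpus -- the text itself supplies the supervision.
corpus = "the cat sat on the mat the cat ran".split()

# Count how often each word follows each other word.
counts = {}
for prev, nxt in zip(corpus, corpus[1:]):
    counts.setdefault(prev, {}).setdefault(nxt, 0)
    counts[prev][nxt] += 1

def next_token_probs(prev):
    """P(next | prev) estimated from raw bigram counts."""
    following = counts.get(prev, {})
    total = sum(following.values())
    return {w: c / total for w, c in following.items()}

# Self-supervised "loss": average negative log-likelihood of each
# actual next token -- the quantity large language models minimize,
# at vastly greater scale and with a neural network instead of counts.
nll = [-math.log(next_token_probs(p)[n]) for p, n in zip(corpus, corpus[1:])]
print(f"avg next-token NLL: {sum(nll) / len(nll):.3f}")
```

A real foundation model replaces the count table with billions of learned parameters, but the objective, predicting the next token from context, is the same.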
Major Foundation Models (2026)
| Model | Company | Type | Key Capability |
|---|---|---|---|
| GPT-5 | OpenAI | Language + Vision | General reasoning, coding |
| Claude 4 | Anthropic | Language + Vision | Safety, analysis, coding |
| Gemini 2 | Google DeepMind | Multimodal | Search, reasoning |
| Grok 3 | xAI | Language | Real-time information |
| Mistral Large | Mistral AI | Language | Open-weight, European |
| Llama 4 | Meta | Language | Open-weight |
</gr-replace>
How Foundation Models Are Built
Building a foundation model requires three key ingredients:
1. Data
- Web-scale text corpora (Common Crawl, books, code)
- Licensed datasets (news, academic papers)
- Synthetic data generated by other models
- Cost: $10M-$100M+ for high-quality data curation
2. Compute
- Thousands to tens of thousands of GPUs (NVIDIA H100, B200)
- Training runs lasting weeks to months
- Cost: $100M-$1B+ per training run for frontier models
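A back-of-envelope estimate shows where these compute costs come from. The sketch below uses the common ~6·N·D FLOPs approximation for training a dense transformer (N parameters, D tokens); every specific number (model size, token count, GPU throughput, hourly rate) is an illustrative assumption, not a vendor figure.

```python
# Rough training-compute cost using the ~6 * N * D FLOPs rule of thumb.
# All numbers below are assumptions chosen for illustration.

params = 500e9           # 500B-parameter model (assumed)
tokens = 15e12           # 15T training tokens (assumed)
flops = 6 * params * tokens

gpu_flops = 1e15         # ~1 PFLOP/s effective per GPU, after utilization (assumed)
gpu_hour_cost = 3.0      # $/GPU-hour cloud rate (assumed)

gpu_hours = flops / (gpu_flops * 3600)
cost = gpu_hours * gpu_hour_cost
print(f"~{flops:.1e} FLOPs, ~{gpu_hours:,.0f} GPU-hours, ~${cost / 1e6:.0f}M")
```

Under these assumptions the run needs on the order of 10M GPU-hours and tens of millions of dollars in raw compute; frontier budgets in the hundreds of millions follow from larger models, more tokens, failed runs, and experimentation overhead.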
3. Algorithms
- Transformer architecture (attention mechanism)
- Reinforcement learning from human feedback (RLHF)
- Constitutional AI (Anthropic's approach to alignment)
- Mixture of experts (MoE) for efficient scaling
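The attention mechanism at the core of the transformer can be sketched in a few lines. This is a minimal, illustrative implementation of scaled dot-product attention, softmax(QKᵀ/√d)·V, on a hand-written two-token example; real models add learned projections, many heads, and many layers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = len(K[0])
    out = []
    for q in Q:  # one output row per query
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)  # normalize to a probability distribution
        # Output is the weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Tiny 2-token, 2-dimensional example: each query attends mostly to
# the key it matches, so each output row leans toward one value vector.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))
```

Each output row is a mixture of the value vectors, weighted by how strongly the corresponding query matches each key; that content-based routing is what lets transformers relate distant tokens in a sequence.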
The Foundation Model Business
Foundation models have created a new category of technology company. The economics are unusual:
- High fixed costs: Training a frontier model costs $100M-$1B+
- Low marginal costs: Serving inference is relatively cheap per query
- API business model: Most revenue comes from API access (per-token pricing)
- Enterprise licensing: Companies pay for private deployments and fine-tuning
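The fixed-cost/marginal-cost structure above can be made concrete with a simple break-even calculation. Every figure in this sketch (training cost, per-token price, serving cost) is an assumption for illustration, not any provider's actual economics.

```python
# Illustrative API unit economics: a large fixed training cost
# amortized over cheap per-token inference. All figures assumed.

train_cost = 300e6         # $300M training run (assumed)
price_per_mtok = 10.0      # $ charged per 1M tokens via API (assumed)
serve_cost_per_mtok = 2.0  # $ compute cost per 1M tokens served (assumed)

margin_per_mtok = price_per_mtok - serve_cost_per_mtok
tokens_to_break_even = train_cost / margin_per_mtok * 1e6
print(f"break-even at ~{tokens_to_break_even:.1e} tokens served")
```

Under these assumptions the model must serve tens of trillions of tokens just to recoup training, which is why foundation model companies chase both massive usage volume and higher-margin enterprise deals.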
Open vs. Closed Models
A key debate in the foundation model space is open vs. closed:
- Closed models (GPT, Claude): Available only through APIs, with proprietary weights
- Open-weight models (Llama, Mistral): Model weights are publicly released, allowing anyone to run and modify them
- Open-source models: Fully open, including training code and data
Each approach has tradeoffs around safety, accessibility, and business viability.
Funding in Foundation Models
The Foundation Models & AGI sector has attracted the largest funding rounds in venture history. Companies like OpenAI ($340B valuation), Anthropic ($60B), xAI ($80B), and Mistral AI have collectively raised tens of billions of dollars. This concentration of capital reflects the belief that foundation models are a platform technology — whoever builds the best model captures an outsized share of the AI market.
Why Foundation Models Matter for Investors
For venture investors, foundation models represent both an opportunity and a challenge:
- Direct investment: Backing foundation model companies requires massive capital but offers platform-level returns
- Application layer: Most startups build on top of foundation models rather than training their own
- Infrastructure play: Companies providing compute, data, and tooling for model training benefit regardless of which model wins