Is a foundation model the same as an LLM?

An LLM is the most common type of foundation model, specialized to language. Foundation models also include multimodal and other general-purpose models.

Why are they called 'foundation' models?

Because they serve as a shared base that many applications are built on, rather than a model trained for a single task.

Do I need to train one?

Almost never. Pretraining is extremely costly; nearly all value comes from adapting an existing base via prompting, retrieval or fine-tuning.

How do agents relate to foundation models?

An agent uses a foundation model as its reasoning core, wrapped in tools, memory and a control loop — the harness — to take actions.

ConceptsUpdated 2026-06-21 · Version 1.0

What are Foundation Models?

A foundation model is a large model pretrained on broad data at scale that can be adapted to a wide range of downstream tasks. Large language models (LLMs) and large multimodal models are the canonical examples. The term, coined at Stanford in 2021, captures a shift: instead of training a bespoke model per task, organizations build on a shared, general-purpose base — then specialize it through prompting, retrieval or fine-tuning.

Evidence: BenchmarkConfidence: HighSource: BenchmarkSource: Paper

Machine-readable: JSON

Definition

A foundation model is a large, general-purpose model pretrained on broad data that serves as a base which can be adapted — via prompting, retrieval or fine-tuning — to many downstream tasks.

Key takeaways

Foundation models are general bases adapted to many tasks.
LLMs and multimodal models are the leading examples.
Most are built on the transformer architecture.
Capabilities emerge with scale of data, parameters and compute.
You adapt them by prompting, retrieval (RAG) or fine-tuning — rarely by training from scratch.

Context

Before foundation models, teams trained narrow models for each task. The foundation-model paradigm flips this: one large model is pretrained once on broad data, then reused everywhere. That reuse is why a handful of models now underpin most AI products.

It also concentrates capability — and risk. Because so much is built on a few bases, their biases, failures and security properties propagate downstream, which is part of why governance and evaluation matter.

Architecture

Pretraining: a model with millions to trillions of parameters learns general patterns from massive datasets, typically with self-supervised objectives like next-token prediction. The transformer's attention mechanism makes this scalable.

Adaptation: the same base is specialized for use — zero/few-shot prompting, retrieval-augmented generation for fresh or private knowledge, or fine-tuning for behavior and domain. Agents wrap the model in tools and a harness.

Components

Transformer architecturePretraining data & objectiveParameters (weights)TokenizerAdaptation layer (prompt / RAG / fine-tune)

Benefits

One base reused across many tasks.
Strong general capability out of the box.
Rapid adaptation without training from scratch.
Multimodal variants span text, image, audio and more.

Risks

Concentrated risk: flaws propagate to everything built on them.
Costly to pretrain; few organizations can.
Inherit biases and gaps from training data.
Knowledge is frozen at training time without retrieval.

Tools & technologies

Frontier LLMs (Claude, GPT, Gemini)Open-weight models (Llama, Mistral)Multimodal modelsModel hosting / inference platforms

Examples

Using one LLM for summarization, classification and drafting across an org.
Adapting a base model to a domain with retrieval instead of retraining.
Building an agent on a frontier model plus tools and memory.

FAQs

Is a foundation model the same as an LLM?: An LLM is the most common type of foundation model, specialized to language. Foundation models also include multimodal and other general-purpose models.
Why are they called 'foundation' models?: Because they serve as a shared base that many applications are built on, rather than a model trained for a single task.
Do I need to train one?: Almost never. Pretraining is extremely costly; nearly all value comes from adapting an existing base via prompting, retrieval or fine-tuning.
How do agents relate to foundation models?: An agent uses a foundation model as its reasoning core, wrapped in tools, memory and a control loop — the harness — to take actions.