Foundation Models

From BloomWiki


Revision as of 14:20, 23 April 2026

How to read this page: This article maps the topic from beginner to expert across six levels — Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.

Foundation models are large-scale AI models pre-trained on broad, diverse data that can be adapted to a wide range of downstream tasks. Unlike task-specific models trained for a single purpose, foundation models serve as a general-purpose base — a shared starting point from which many specialized applications are built through fine-tuning, prompting, or retrieval augmentation. GPT-4, Gemini, Claude, DALL-E 3, Stable Diffusion, CLIP, and AlphaFold are all foundation models. The concept unifies much of modern AI under a common paradigm: pre-train at scale, then adapt.

Remembering

  • Foundation model — A large model trained on broad data at scale, adapted to many downstream tasks. Term coined by Stanford HAI in 2021.
  • Pre-training — The initial training phase on massive, diverse datasets that gives the model its general capabilities.
  • Adaptation — Tailoring a foundation model for a specific task via fine-tuning, prompting, RLHF, or retrieval augmentation.
  • Emergence — Capabilities that appear only at scale; not present in smaller versions of the same model.
  • Transfer learning (foundation model sense) — Using the knowledge encoded in a foundation model's weights as a starting point for new tasks.
  • Multimodal foundation model — A foundation model trained on multiple modalities (text, image, audio, video).
  • GPT-4 — OpenAI's large multimodal language foundation model.
  • Gemini — Google DeepMind's natively multimodal foundation model family.
  • CLIP — OpenAI's vision-language foundation model; enables zero-shot image classification.
  • DALL-E 3 — OpenAI's text-to-image foundation model.
  • Llama 3 — Meta's open-weight language foundation model family.
  • Mistral — Efficient open-weight language foundation models.
  • ESM-2 — A protein language foundation model trained on evolutionary sequence data.
  • SAM (Segment Anything Model) — Meta's vision foundation model for image segmentation.
  • Homogenization risk — The risk that widespread use of a small number of foundation models spreads shared flaws, biases, and failure modes.

Understanding

The foundation model paradigm shifts AI from task-specific training to two-stage learning: (1) Pre-train a very large model on massive diverse data; (2) adapt that model efficiently to specific tasks.

This is economically powerful: pre-training is extremely expensive (millions in compute), but done once by a well-resourced organization. Adaptation is cheap (hours to days), done by anyone with access to the pre-trained model. The result: a few pre-trained models underpin thousands of applications.

Why scale matters for foundation models: Empirically, model capabilities improve predictably with scale (parameters, data, compute) — this is the scaling law. But some capabilities (multi-step reasoning, in-context learning, code generation) emerge only above certain scale thresholds, not visible in smaller versions. This makes foundation models qualitatively different from their smaller predecessors.
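The "predictable improvement" half of this claim can be made concrete with the Chinchilla scaling law (Hoffmann et al., 2022), which fits pre-training loss as a power law in parameter count N and training tokens D. A minimal sketch, using the published fitted constants as illustrative values:

```python
def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Chinchilla-style scaling law: L(N, D) = E + A/N^alpha + B/D^beta.

    Constants are the fits reported by Hoffmann et al. (2022); treat the
    exact numbers as illustrative rather than universal.
    """
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Loss falls smoothly as either axis of scale grows:
small = predicted_loss(1e9, 20e9)     # roughly a 1B-parameter, 20B-token run
large = predicted_loss(70e9, 1.4e12)  # roughly a 70B-parameter, 1.4T-token run
assert large < small
```

Note the tension the paragraph describes: this formula predicts smooth loss curves, yet specific capabilities built on that loss can still appear abruptly at some threshold.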

The ecosystem: Foundation model providers (OpenAI, Anthropic, Google, Meta, Mistral) pre-train models. Application developers build on top via APIs or open weights. Users interact with applications. This creates a layered value chain where foundation model capabilities and limitations propagate through the entire stack.

Risks of foundation models: Homogenization — a shared flaw in a widely-used foundation model (bias, factual error, security vulnerability) propagates to all downstream applications simultaneously. Concentration of power — a small number of organizations control access to the most capable models. Data contamination — foundation models trained on internet data may have memorized test benchmarks, inflating apparent performance.

Applying

Building a domain-adapted application on a foundation model:

<syntaxhighlight lang="python">
from openai import OpenAI
from transformers import pipeline

# 1. Using a foundation model as-is (zero-shot)
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize the key risks of foundation models."}]
)

# 2. Fine-tuning a foundation model for a specific domain
#    (e.g., customer support for a specific product)
training_file = client.files.create(
    file=open("support_conversations.jsonl", "rb"),
    purpose="fine-tuning"
)
fine_tune_job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini",  # More affordable foundation model to fine-tune
    hyperparameters={"n_epochs": 3}
)
print(f"Fine-tune job: {fine_tune_job.id}")

# 3. Using an open-weight foundation model locally
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",
    device_map="auto"
)
result = generator("Explain quantum entanglement:", max_new_tokens=200)
</syntaxhighlight>

Foundation model landscape by modality
Language (proprietary) → GPT-4o (OpenAI), Claude 3.5 (Anthropic), Gemini 1.5 Pro (Google)
Language (open-weight) → Llama 3 (Meta), Mistral, Qwen2.5, DeepSeek-V3
Vision-language → GPT-4o, Gemini, LLaVA, InternVL, Qwen2-VL
Image generation → DALL-E 3, Stable Diffusion 3, Flux.1, Midjourney
Code → Claude 3.5 Sonnet, GPT-4o, DeepSeek-Coder, StarCoder2
Biology → ESM-2 (proteins), AlphaFold3, ChemBERTa (molecules), SAM (segmentation)

Analyzing

Foundation Model Access Models

  Access Type              Examples                  Control   Cost              Privacy
  Proprietary API          GPT-4o, Claude, Gemini    Low       Pay-per-token     Data sent to provider
  Open-weight (download)   Llama 3, Mistral, Qwen    Full      Hardware cost     On-premises
  Open-weight (API)        Together.ai, Replicate    Medium    Pay-per-token     Depends on provider
  Self-hosted proprietary  Azure OpenAI (some)       Medium    License + infra   Configurable

Failure modes: Capability overhang — users deploy foundation models for tasks they're not designed for, causing subtle failures. Benchmark contamination — models may have seen test data during pre-training. Homogenization — shared failure modes propagate across all applications. Misalignment — foundation model values don't match application's specific user population needs.
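The benchmark-contamination failure mode can be probed directly. A minimal sketch of one common audit technique: flag benchmark items whose long n-grams also appear in the training corpus. This is an assumption-laden toy (real audits run substring/n-gram matching over terabytes and tune `n` carefully), not any provider's actual pipeline:

```python
def ngrams(text: str, n: int = 8) -> set:
    """All n-token shingles of a text, lowercased."""
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_rate(benchmark: list[str], corpus: list[str], n: int = 8) -> float:
    """Fraction of benchmark items sharing at least one n-gram with the corpus."""
    corpus_grams = set().union(*(ngrams(doc, n) for doc in corpus)) if corpus else set()
    flagged = sum(1 for item in benchmark if ngrams(item, n) & corpus_grams)
    return flagged / len(benchmark)
```

A nonzero rate means reported scores on those items may reflect memorization rather than capability.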

Evaluating

Foundation model evaluation requires a multi-dimensional benchmark suite: MMLU (knowledge breadth), HumanEval (coding), GSM8K/MATH (mathematics), MT-Bench (instruction following), HELM (holistic), LMSYS Chatbot Arena (human preference). No single benchmark captures all capabilities. Expert practitioners run their own domain-specific evaluations on task-representative data rather than relying solely on published benchmarks.
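The "run your own domain-specific evaluation" advice reduces to a small harness. A minimal sketch, where `call_model` is a placeholder for whatever API you use and exact-match grading stands in for the rubric- or LLM-based grading real evaluations often need:

```python
def evaluate(call_model, eval_set) -> float:
    """Score a model on task-representative (prompt, expected_answer) pairs.

    call_model: callable taking a prompt string and returning the model's answer.
    eval_set:   list of (prompt, expected_answer) pairs drawn from your own domain.
    Returns the fraction answered correctly under case-insensitive exact match.
    """
    correct = sum(
        1 for prompt, expected in eval_set
        if call_model(prompt).strip().lower() == expected.strip().lower()
    )
    return correct / len(eval_set)
```

The same harness run against two or three candidate models on the same eval set gives a like-for-like comparison that published benchmark tables cannot.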

Creating

Selecting and adapting a foundation model:

  1. Define modality and task requirements.
  2. Choose: proprietary API (fastest, most capable, privacy concerns) vs. open-weight (full control, on-premises, lower capability ceiling).
  3. Evaluate 2–3 candidate models on 100+ representative inputs from your domain.
  4. Adaptation strategy: prompting first (cheapest), then RAG (for knowledge), then fine-tuning (for style/format), then pre-training (only if truly novel domain).
  5. Monitor model provider updates — foundation models are updated without notice, potentially breaking downstream applications.
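The adaptation strategy in step (4) is an escalation ladder: commit to the cheapest rung that meets your quality bar. A minimal sketch of that decision rule, where the strategy names and the `quality_of` hook (some measure of eval-set quality per strategy) are illustrative assumptions, not a real API:

```python
# Cheapest adaptation strategy first; escalate only when quality falls short.
ADAPTATION_LADDER = ["prompting", "rag", "fine-tuning", "pre-training"]

def choose_strategy(quality_of, target: float) -> str:
    """Return the first (cheapest) strategy whose measured quality meets target.

    quality_of: callable mapping a strategy name to an eval score in [0, 1].
    """
    for strategy in ADAPTATION_LADDER:
        if quality_of(strategy) >= target:
            return strategy
    return "pre-training"  # last resort, only for truly novel domains
```

In practice `quality_of` would be the domain-specific evaluation from the previous section, re-run once per candidate strategy.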