Foundation Models

From BloomWiki
<div style="background-color: #4B0082; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
{{BloomIntro}}
Foundation models are large-scale AI models pre-trained on broad, diverse data that can be adapted to a wide range of downstream tasks. Unlike task-specific models trained for a single purpose, foundation models serve as a general-purpose base — a shared starting point from which many specialized applications are built through fine-tuning, prompting, or retrieval augmentation. GPT-4, Gemini, Claude, DALL-E 3, Stable Diffusion, CLIP, and AlphaFold are all foundation models. The concept unifies much of modern AI under a common paradigm: pre-train at scale, then adapt.
</div>


__TOC__
 
<div style="background-color: #000080; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Foundation model''' — A large model trained on broad data at scale, adapted to many downstream tasks. Term coined by Stanford HAI in 2021.
* '''Pre-training''' — The initial training phase on massive, diverse datasets that gives the model its general capabilities.
* '''Adaptation''' — Tailoring a foundation model for a specific task via fine-tuning, prompting, RLHF, or retrieval augmentation.
* '''Emergence''' — Capabilities that appear only at scale; not present in smaller versions of the same model.
* '''Transfer learning (foundation model sense)''' — Using the knowledge encoded in a foundation model's weights as a starting point for new tasks.
* '''Multimodal foundation model''' — A foundation model trained on multiple modalities (text, image, audio, video).
* '''GPT-4''' — OpenAI's large multimodal language foundation model.
* '''Gemini''' — Google DeepMind's natively multimodal foundation model family.
* '''CLIP''' — OpenAI's vision-language foundation model; enables zero-shot image classification.
* '''DALL-E 3''' — OpenAI's text-to-image foundation model.
* '''Llama 3''' — Meta's open-weight language foundation model family.
* '''Mistral''' — Efficient open-weight language foundation models.
* '''ESM-2''' — A protein language foundation model trained on evolutionary sequence data.
* '''SAM (Segment Anything Model)''' — Meta's vision foundation model for image segmentation.
* '''Homogenization risk''' — The risk that widespread use of a small number of foundation models spreads shared flaws, biases, and failure modes.
</div>


<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Understanding</span> ==
The foundation model paradigm shifts AI from task-specific training to '''two-stage learning''':
# Pre-train a very large model on massive diverse data;
# adapt that model efficiently to specific tasks.


This is economically powerful: pre-training is extremely expensive (millions in compute), but done once by a well-resourced organization. Adaptation is cheap (hours to days), done by anyone with access to the pre-trained model. The result: a few pre-trained models underpin thousands of applications.
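The cost asymmetry can be made concrete with a back-of-envelope calculation, using the common rule of thumb of ~6 FLOPs per parameter per training token. The model sizes and token counts below are illustrative round numbers, not figures from any specific model card:

<syntaxhighlight lang="python">
def training_flops(n_params: float, n_tokens: float) -> float:
    # Common rule of thumb: ~6 FLOPs per parameter per training token
    return 6 * n_params * n_tokens

# Illustrative pre-training run: a 70B-parameter model on 15T tokens
pretrain = training_flops(70e9, 15e12)

# Illustrative adaptation run: fine-tuning the same model on 10M domain tokens
finetune = training_flops(70e9, 10e6)

print(f"pre-training: {pretrain:.1e} FLOPs")
print(f"fine-tuning:  {finetune:.1e} FLOPs")
print(f"ratio: {pretrain / finetune:,.0f}x")  # 1,500,000x
</syntaxhighlight>

The ratio depends only on the token counts, which is exactly why one expensive pre-training run can amortize across thousands of cheap adaptations.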
'''Why scale matters for foundation models''': Empirically, model capabilities improve predictably with scale (parameters, data, compute) — this is the scaling law. But some capabilities (multi-step reasoning, in-context learning, code generation) emerge only above certain scale thresholds, not visible in smaller versions. This makes foundation models qualitatively different from their smaller predecessors.


'''The ecosystem''': Foundation model providers (OpenAI, Anthropic, Google, Meta, Mistral) pre-train models. Application developers build on top via APIs or open weights. Users interact with applications. This creates a layered value chain where foundation model capabilities and limitations propagate through the entire stack.
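The predictable part of scaling can be sketched with the parametric loss form fitted by Hoffmann et al. (2022); the constants below are their published fit and are indicative only. Note that emergent abilities are precisely what this smooth curve does not capture:

<syntaxhighlight lang="python">
def chinchilla_loss(n_params: float, n_tokens: float,
                    E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28) -> float:
    # Parametric scaling law: loss falls smoothly as parameters and data grow
    return E + A / n_params**alpha + B / n_tokens**beta

small = chinchilla_loss(1e9, 20e9)     # 1B params trained on 20B tokens
large = chinchilla_loss(70e9, 1.4e12)  # 70B params trained on 1.4T tokens
print(f"predicted loss, small model: {small:.2f}")
print(f"predicted loss, large model: {large:.2f}")
</syntaxhighlight>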


'''Risks of foundation models''': Homogenization — a shared flaw in a widely-used foundation model (bias, factual error, security vulnerability) propagates to all downstream applications simultaneously. Concentration of power — a small number of organizations control access to the most capable models. Data contamination — foundation models trained on internet data may have memorized test benchmarks, inflating apparent performance.
</div>


<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Applying</span> ==
'''Building a domain-adapted application on a foundation model:'''
<syntaxhighlight lang="python">
from openai import OpenAI

# 1. Using a foundation model as-is (zero-shot)
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize the key risks of foundation models."}]
)

# 2. Fine-tuning a foundation model for a specific domain
#    (e.g., customer support for a specific product)
training_file = client.files.create(
    file=open("support_conversations.jsonl", "rb"),
    purpose="fine-tune"
)
fine_tune_job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini",  # More affordable foundation model to fine-tune
    hyperparameters={"n_epochs": 3}
)
print(f"Fine-tune job: {fine_tune_job.id}")

# 3. Using an open-weight foundation model locally
from transformers import pipeline
generator = pipeline("text-generation",
                     model="meta-llama/Llama-3.2-3B-Instruct",
                     device_map="auto")
result = generator("Explain quantum entanglement:", max_new_tokens=200)
</syntaxhighlight>

'''Foundation model landscape by modality'''
: '''Language (proprietary)''' → GPT-4o (OpenAI), Claude 3.5 (Anthropic), Gemini 1.5 Pro (Google)
: '''Language (open-weight)''' → Llama 3 (Meta), Mistral, Qwen2.5, DeepSeek-V3
: '''Vision-language''' → GPT-4o, Gemini, LLaVA, InternVL, Qwen2-VL
: '''Image generation''' → DALL-E 3, Stable Diffusion 3, Flux.1, Midjourney
: '''Code''' → Claude 3.5 Sonnet, GPT-4o, DeepSeek-Coder, StarCoder2
: '''Biology''' → ESM-2 (proteins), AlphaFold3, ChemBERTa (molecules), SAM (segmentation)
</div>


<div style="background-color: #8B4500; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Analyzing</span> ==
{| class="wikitable"
|+ Foundation Model Access Models
! Access type !! Examples !! Control !! Cost !! Privacy
|-
| Proprietary API || GPT-4o, Claude, Gemini || Low || Pay-per-token || Data sent to provider
|-
| Open-weight (download) || Llama 3, Mistral, Qwen || Full || Hardware cost || On-premises
|-
| Open-weight (API) || Together.ai, Replicate || Medium || Pay-per-token || Depends on provider
|-
| Self-hosted proprietary || Azure OpenAI (some) || Medium || License + infra || Configurable
|}


'''Failure modes''': Capability overhang — users deploy foundation models for tasks they weren't designed for, causing subtle failures. Benchmark contamination — models may have seen test data during pre-training. Homogenization — shared failure modes propagate across all applications. Misalignment — the foundation model's values don't match the needs of an application's specific user population.
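One practical probe for benchmark contamination is n-gram overlap between a benchmark item and pre-training text. A minimal sketch follows; the window size of 8 tokens is an arbitrary illustrative choice, not a standard:

<syntaxhighlight lang="python">
def ngram_set(text: str, n: int = 8) -> set:
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def overlap_fraction(benchmark_item: str, corpus_chunk: str, n: int = 8) -> float:
    # Fraction of the benchmark item's n-grams that also appear in the corpus chunk
    bench = ngram_set(benchmark_item, n)
    if not bench:
        return 0.0
    return len(bench & ngram_set(corpus_chunk, n)) / len(bench)

# A verbatim copy in the corpus scores 1.0; unrelated text scores near 0.0
item = "the quick brown fox jumps over the lazy dog near the river"
print(overlap_fraction(item, "prefix text " + item + " suffix text"))  # 1.0
</syntaxhighlight>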
</div>


<div style="background-color: #483D8B; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Evaluating</span> ==
Foundation model evaluation requires a multi-dimensional benchmark suite: '''MMLU''' (knowledge breadth), '''HumanEval''' (coding), '''GSM8K/MATH''' (mathematics), '''MT-Bench''' (instruction following), '''HELM''' (holistic), '''LMSYS Chatbot Arena''' (human preference). No single benchmark captures all capabilities. Expert practitioners run their own domain-specific evaluations on task-representative data rather than relying solely on published benchmarks.
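A domain-specific evaluation of the kind practitioners run can be as small as the harness below. Here `model_fn` stands in for any candidate model call, and exact-match is just one possible scoring function (the stub model and tiny dataset are purely illustrative):

<syntaxhighlight lang="python">
def evaluate(model_fn, dataset, score_fn):
    # Average a per-example task score over representative inputs
    scores = [score_fn(model_fn(ex["input"]), ex["expected"]) for ex in dataset]
    return sum(scores) / len(scores)

def exact_match(output: str, expected: str) -> float:
    return float(output.strip().lower() == expected.strip().lower())

# Stub "model" for illustration: answers one question correctly, one wrongly
dataset = [
    {"input": "2+2?", "expected": "4"},
    {"input": "capital of France?", "expected": "Paris"},
]
stub = lambda prompt: "4" if "2+2" in prompt else "Lyon"
print(evaluate(stub, dataset, exact_match))  # 0.5
</syntaxhighlight>

The same harness compares candidate models fairly: swap in each model's `model_fn`, keep the dataset and scorer fixed.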
</div>


<div style="background-color: #2F4F4F; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Creating</span> ==
Selecting and adapting a foundation model:
# Define modality and task requirements.
# Choose: proprietary API (fastest, most capable, privacy concerns) vs. open-weight (full control, on-premises, lower capability ceiling).
# Evaluate 2–3 candidate models on 100+ representative inputs from your domain.
# Adaptation strategy: prompting first (cheapest), then RAG (for knowledge), then fine-tuning (for style/format), then pre-training (only if truly novel domain).
# Monitor model provider updates — foundation models are updated without notice, potentially breaking downstream applications.
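The escalation order in step 4 can be expressed as a tiny decision helper; the flag names here are illustrative, not a formal taxonomy:

<syntaxhighlight lang="python">
def adaptation_plan(needs_private_knowledge=False,
                    needs_style_or_format=False,
                    truly_novel_domain=False):
    # Start with the cheapest strategy; escalate only when a need justifies it
    plan = ["prompting"]
    if needs_private_knowledge:
        plan.append("RAG")
    if needs_style_or_format:
        plan.append("fine-tuning")
    if truly_novel_domain:
        plan.append("pre-training")
    return plan

print(adaptation_plan(needs_private_knowledge=True))  # ['prompting', 'RAG']
</syntaxhighlight>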


[[Category:Artificial Intelligence]]
[[Category:Foundation Models]]
[[Category:Large Language Models]]
</div>

Latest revision as of 01:51, 25 April 2026
