Prompt Engineering: Difference between revisions

From BloomWiki
Jump to navigation Jump to search
New article: Prompt Engineering structured through Bloom's Taxonomy
BloomWiki: Prompt Engineering
 
(One intermediate revision by the same user not shown)
Line 1: Line 1:
<div style="background-color: #4B0082; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
{{BloomIntro}}
{{BloomIntro}}
Prompt engineering is the practice of designing, refining, and optimizing the inputs given to large language models (LLMs) to elicit desired outputs reliably and efficiently. Because LLMs are extraordinarily sensitive to how questions are framed, the prompt is effectively the user interface between human intent and model capability. A well-crafted prompt can unlock reasoning abilities, enforce output formats, dramatically reduce hallucinations, and transform a mediocre response into an expert-level one — without changing the model at all. Prompt engineering is part art, part science, and increasingly a critical professional skill.
Prompt engineering is the practice of designing, refining, and optimizing the inputs given to large language models (LLMs) to elicit desired outputs reliably and efficiently. Because LLMs are extraordinarily sensitive to how questions are framed, the prompt is effectively the user interface between human intent and model capability. A well-crafted prompt can unlock reasoning abilities, enforce output formats, dramatically reduce hallucinations, and transform a mediocre response into an expert-level one — without changing the model at all. Prompt engineering is part art, part science, and increasingly a critical professional skill.
</div>


== Remembering ==
__TOC__
 
<div style="background-color: #000080; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Prompt''' — The text input provided to an LLM; the complete context from which the model generates a response.
* '''Prompt''' — The text input provided to an LLM; the complete context from which the model generates a response.
* '''System prompt''' — A special instruction block (usually at the start of a conversation) that defines the model's role, persona, constraints, and output expectations.
* '''System prompt''' — A special instruction block (usually at the start of a conversation) that defines the model's role, persona, constraints, and output expectations.
Line 16: Line 21:
* '''Prompt injection''' — An attack where malicious input in the prompt attempts to override the model's original instructions.
* '''Prompt injection''' — An attack where malicious input in the prompt attempts to override the model's original instructions.
* '''Hallucination''' — The model generating plausible-sounding but factually incorrect content; prompt techniques can mitigate but not fully eliminate this.
* '''Hallucination''' — The model generating plausible-sounding but factually incorrect content; prompt techniques can mitigate but not fully eliminate this.
* '''Delimiters''' — Characters or tags (e.g., ```triple backticks```, <tags>, ---) used to clearly separate sections of a prompt.
* '''Delimiters''' — Characters or tags (e.g., <code>triple backticks</code>, <tags>, ---) used to clearly separate sections of a prompt.
* '''ReAct''' — A prompting framework combining reasoning traces ("Thought:") and tool calls ("Action:") for agent-like behavior.
* '''ReAct''' — A prompting framework combining reasoning traces ("Thought:") and tool calls ("Action:") for agent-like behavior.
* '''Constitutional prompting''' — Including explicit rules or principles in the prompt to constrain the model's behavior.
* '''Constitutional prompting''' — Including explicit rules or principles in the prompt to constrain the model's behavior.
</div>


== Understanding ==
<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Understanding</span> ==
LLMs are next-token predictors — they complete text in a manner consistent with their training distribution. A prompt is not a command but a '''context that constrains the probability distribution''' over possible continuations. When you write a clear, structured prompt, you are steering the model toward the region of its output space that contains useful, accurate responses.
LLMs are next-token predictors — they complete text in a manner consistent with their training distribution. A prompt is not a command but a '''context that constrains the probability distribution''' over possible continuations. When you write a clear, structured prompt, you are steering the model toward the region of its output space that contains useful, accurate responses.


Line 38: Line 45:


The more precisely each of these components is specified, the less the model must infer — and inference is where errors enter.
The more precisely each of these components is specified, the less the model must infer — and inference is where errors enter.
</div>


== Applying ==
<div style="background-color: #8B0000; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Applying</span> ==
'''Prompt templates for common use cases:'''
'''Prompt templates for common use cases:'''


Line 104: Line 113:
: '''Generated Knowledge''' → Ask model to generate relevant facts first, then answer
: '''Generated Knowledge''' → Ask model to generate relevant facts first, then answer
: '''Directional Stimulus''' → Include a hint or keyword that steers output direction
: '''Directional Stimulus''' → Include a hint or keyword that steers output direction
</div>


== Analyzing ==
<div style="background-color: #8B4500; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Analyzing</span> ==
{| class="wikitable"
{| class="wikitable"
|+ Prompting Technique Comparison
|+ Prompting Technique Comparison
Line 129: Line 140:
* '''Sycophancy through examples''' — If few-shot examples are biased toward one answer type, the model will mirror that bias rather than reasoning independently.
* '''Sycophancy through examples''' — If few-shot examples are biased toward one answer type, the model will mirror that bias rather than reasoning independently.
* '''Token overflow''' — Stuffing too much context causes the model to lose track of early instructions (the "lost in the middle" problem). Put critical instructions at the beginning and end of the prompt.
* '''Token overflow''' — Stuffing too much context causes the model to lose track of early instructions (the "lost in the middle" problem). Put critical instructions at the beginning and end of the prompt.
</div>


== Evaluating ==
<div style="background-color: #483D8B; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Evaluating</span> ==
Expert prompt engineers evaluate prompts systematically, not by feel:
Expert prompt engineers evaluate prompts systematically, not by feel:


Line 142: Line 155:


Expert practitioners maintain a '''prompt changelog''' — every production prompt change is documented with before/after performance metrics, the rationale for the change, and rollback instructions.
Expert practitioners maintain a '''prompt changelog''' — every production prompt change is documented with before/after performance metrics, the rationale for the change, and rollback instructions.
</div>


== Creating ==
<div style="background-color: #2F4F4F; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
== <span style="color: #FFFFFF;">Creating</span> ==
Designing a robust prompt engineering workflow:
Designing a robust prompt engineering workflow:


Line 205: Line 220:
[[Category:Large Language Models]]
[[Category:Large Language Models]]
[[Category:Prompt Engineering]]
[[Category:Prompt Engineering]]
</div>

Latest revision as of 01:56, 25 April 2026

How to read this page: This article maps the topic from beginner to expert across six levels � Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain. Learn more about how BloomWiki works ?

Prompt engineering is the practice of designing, refining, and optimizing the inputs given to large language models (LLMs) to elicit desired outputs reliably and efficiently. Because LLMs are extraordinarily sensitive to how questions are framed, the prompt is effectively the user interface between human intent and model capability. A well-crafted prompt can unlock reasoning abilities, enforce output formats, dramatically reduce hallucinations, and transform a mediocre response into an expert-level one — without changing the model at all. Prompt engineering is part art, part science, and increasingly a critical professional skill.

Remembering[edit]

  • Prompt — The text input provided to an LLM; the complete context from which the model generates a response.
  • System prompt — A special instruction block (usually at the start of a conversation) that defines the model's role, persona, constraints, and output expectations.
  • User prompt — The specific query or request provided by the human user at inference time.
  • Few-shot prompting — Including a small number of input-output examples in the prompt to demonstrate the desired response format or reasoning style.
  • Zero-shot prompting — Asking the model to perform a task without providing any examples, relying solely on its pre-trained knowledge.
  • Chain-of-thought (CoT) — A prompting technique that asks the model to reason step by step before giving a final answer, improving accuracy on complex tasks.
  • Self-consistency — Running the same prompt multiple times with high temperature, then taking a majority vote over the outputs to improve reliability.
  • Role prompting — Assigning the model a persona or expert role ("You are a senior security engineer…") to anchor its reasoning style and vocabulary.
  • Output format instruction — Explicitly specifying the structure of the desired output (JSON, markdown table, numbered list, etc.).
  • Temperature — A sampling parameter controlling response randomness. Lower (0–0.3) for factual/deterministic tasks; higher (0.7–1.0) for creative tasks.
  • Context window — The total number of tokens the model can process; prompts + conversation history must fit within this limit.
  • Prompt injection — An attack where malicious input in the prompt attempts to override the model's original instructions.
  • Hallucination — The model generating plausible-sounding but factually incorrect content; prompt techniques can mitigate but not fully eliminate this.
  • Delimiters — Characters or tags (e.g., triple backticks, <tags>, ---) used to clearly separate sections of a prompt.
  • ReAct — A prompting framework combining reasoning traces ("Thought:") and tool calls ("Action:") for agent-like behavior.
  • Constitutional prompting — Including explicit rules or principles in the prompt to constrain the model's behavior.

Understanding[edit]

LLMs are next-token predictors — they complete text in a manner consistent with their training distribution. A prompt is not a command but a context that constrains the probability distribution over possible continuations. When you write a clear, structured prompt, you are steering the model toward the region of its output space that contains useful, accurate responses.

Why prompts matter so much: The model's behavior is entirely determined by its weights plus its input context. Since you can't change the weights at inference time (without fine-tuning), the prompt is your only lever. Small changes — adding "think step by step," restructuring information, or providing a clear role — can swing response quality dramatically.

Chain-of-thought works because LLMs trained on vast amounts of human-written text have learned that reasoning traces precede correct conclusions in textbooks, solutions, and technical writing. By prompting "let's think step by step," you nudge the model into the portion of its output distribution where reasoning traces are followed by correct answers.

The anatomy of an effective prompt: <syntaxhighlight lang="text"> [System/Role] → Who the model is and what constraints it operates under [Context/Input] → Background information, document, or data relevant to the task [Task/Instruction]→ What exactly to do, stated clearly and unambiguously [Format] → What the output should look like (JSON? Numbered list? Table?) [Examples] → (Optional) 1–5 input-output demonstrations [Output Cue] → A partial beginning of the expected response to prime generation </syntaxhighlight>

The more precisely each of these components is specified, the less the model must infer — and inference is where errors enter.

Applying[edit]

Prompt templates for common use cases:

Zero-shot classification

<syntaxhighlight lang="python"> prompt = """Classify the sentiment of the following customer review. Return ONLY one of: POSITIVE, NEGATIVE, NEUTRAL.

Review: "{review_text}"

Sentiment:""" </syntaxhighlight>

Chain-of-thought for math/reasoning

<syntaxhighlight lang="python"> prompt = """Solve this problem step by step. Show your reasoning clearly. After working through it, state your final answer on a new line beginning with "Answer:".

Problem: A train travels at 60 mph for 2 hours, then at 90 mph for 1.5 hours. What is the total distance traveled?"""

  1. Model will reason: "60×2=120, 90×1.5=135, total=255 miles" → Answer: 255 miles

</syntaxhighlight>

Structured output (JSON)

<syntaxhighlight lang="python"> prompt = """Extract the following information from the job posting below. Return ONLY valid JSON with exactly these fields: {

 "job_title": string,
 "company": string,
 "location": string,
 "salary_range": string or null,
 "required_skills": [list of strings],
 "experience_years": number or null

}

Job Posting: --- {job_posting_text} ---

JSON:""" </syntaxhighlight>

Few-shot prompting

<syntaxhighlight lang="python"> prompt = """Convert informal English to formal business English.

Informal: "Hey, just wanted to check in on that report thing" Formal: "I am writing to inquire about the status of the report."

Informal: "Can you send that over ASAP?" Formal: "Could you please expedite the delivery of the requested document?"

Informal: "{user_input}" Formal:""" </syntaxhighlight>

Key prompt engineering techniques reference
Chain-of-thought → "Think step by step" / "Let's reason through this"
Self-consistency → Sample 5–10 times at temp=0.7, take majority answer
Tree of Thoughts → Explore multiple reasoning branches, evaluate, backtrack
Least-to-most → Break complex task into sub-tasks, solve sequentially
Generated Knowledge → Ask model to generate relevant facts first, then answer
Directional Stimulus → Include a hint or keyword that steers output direction

Analyzing[edit]

Prompting Technique Comparison
Technique Best For Limitation
Zero-shot Simple, well-defined tasks Fails on novel or complex reasoning
Few-shot Format enforcement, stylistic tasks Uses context tokens; examples must be high quality
Chain-of-thought Math, logic, multi-step reasoning Longer responses; still can reason incorrectly
Self-consistency High-stakes factual questions Expensive (multiple API calls)
Role prompting Tone, domain vocabulary, expertise level Model may not fully "become" the role
Tree of Thoughts Open-ended problems requiring search Very expensive; complex to implement

Common failure modes:

  • Ambiguous instructions — "Summarize this" without specifying length, audience, or focus produces wildly variable outputs. Be explicit.
  • Prompt injection — If user input is inserted into a prompt without sanitization, attackers can override system instructions ("Ignore all previous instructions and…").
  • Over-constraining — Too many rules in the system prompt create conflicts; the model satisfies some at the expense of others. Prioritize ruthlessly.
  • Sycophancy through examples — If few-shot examples are biased toward one answer type, the model will mirror that bias rather than reasoning independently.
  • Token overflow — Stuffing too much context causes the model to lose track of early instructions (the "lost in the middle" problem). Put critical instructions at the beginning and end of the prompt.

Evaluating[edit]

Expert prompt engineers evaluate prompts systematically, not by feel:

Regression test suites: A set of 50–200 test prompts with known expected outputs. Every prompt change is run against this suite before deployment. Track pass rate, not just spot-check a few examples.

Prompt sensitivity analysis: Vary minor aspects of the prompt (different wording, different delimiter style, different example order) and measure output variance. Robust prompts should produce similar quality across these variations.

Adversarial testing: Deliberately try to break the prompt — inject contradictory instructions, provide edge-case inputs, attempt prompt injection. Identify and patch vulnerabilities before deployment.

A/B testing in production: Compare two prompt variants by routing a random fraction of traffic to each, measuring downstream business metrics (task success rate, user satisfaction, hallucination rate).

Expert practitioners maintain a prompt changelog — every production prompt change is documented with before/after performance metrics, the rationale for the change, and rollback instructions.

Creating[edit]

Designing a robust prompt engineering workflow:

1. Prompt development methodology <syntaxhighlight lang="text"> Define task precisely: what input → what output?

Write initial zero-shot prompt

Evaluate on 20+ diverse examples

Identify failure modes → add constraints, examples, CoT

[Iterate: specify → test → analyze → refine]

Adversarial testing: injection, edge cases

Regression suite: 100+ examples with expected outputs

A/B test in production

Monitor and maintain </syntaxhighlight>

2. System prompt template for a production assistant <syntaxhighlight lang="text">

  1. Role

You are [NAME], a [DOMAIN] expert assistant for [COMPANY/APP].

  1. Capabilities

You can: [list what the model SHOULD do]

  1. Constraints

You must NOT: [list prohibited behaviors] If asked about [OFF-TOPIC], respond: "[standard deflection]"

  1. Output Format

Always respond in [FORMAT]. Structure your response as: 1. [FIELD 1] 2. [FIELD 2]

  1. Tone

[Formal/Casual/Technical]. Avoid [specific patterns to avoid].

  1. Grounding

Base your responses only on [CONTEXT SOURCES]. If unsure, say "I don't know." </syntaxhighlight>

3. Meta-prompting for prompt generation

  • Ask a capable model to generate prompt variants for your task
  • Use the model to critique and improve your draft prompt
  • Use "prompt optimizers" (DSPy, PromptBreeder) that automatically search the prompt space

4. Prompt management infrastructure

  • Store prompts in version control (Git) alongside code
  • Use prompt management platforms (LangSmith, Promptfoo, Langfuse)
  • Keep prompts in configuration files, not hardcoded in source
  • Implement prompt A/B testing with statistical significance tracking