Prompt Engineering: Difference between revisions

From BloomWiki
Add BloomIntro banner
New article: Prompt Engineering structured through Bloom's Taxonomy
{{BloomIntro}}
{{Bloom Article}}
Prompt engineering is the practice of designing, refining, and optimizing the inputs given to large language models (LLMs) to elicit desired outputs reliably and efficiently. Because LLMs are extraordinarily sensitive to how questions are framed, the prompt is effectively the user interface between human intent and model capability. A well-crafted prompt can unlock reasoning abilities, enforce output formats, dramatically reduce hallucinations, and transform a mediocre response into an expert-level one — without changing the model at all. Prompt engineering is part art, part science, and increasingly a critical professional skill.


== Remembering ==
* '''Prompt''' — The text input provided to an LLM; the complete context from which the model generates a response.
* '''System prompt''' — A special instruction block (usually at the start of a conversation) that defines the model's role, persona, constraints, and output expectations.
* '''User prompt''' — The specific query or request provided by the human user at inference time.
* '''Few-shot prompting''' — Including a small number of input-output examples in the prompt to demonstrate the desired response format or reasoning style.
* '''Zero-shot prompting''' — Asking the model to perform a task without providing any examples, relying solely on its pre-trained knowledge.
* '''Chain-of-thought (CoT)''' — A prompting technique that asks the model to reason step by step before giving a final answer, improving accuracy on complex tasks.
* '''Self-consistency''' — Running the same prompt multiple times with high temperature, then taking a majority vote over the outputs to improve reliability.
* '''Role prompting''' — Assigning the model a persona or expert role ("You are a senior security engineer…") to anchor its reasoning style and vocabulary.
* '''Output format instruction''' — Explicitly specifying the structure of the desired output (JSON, markdown table, numbered list, etc.).
* '''Temperature''' — A sampling parameter controlling response randomness. Lower (0–0.3) for factual/deterministic tasks; higher (0.7–1.0) for creative tasks.
* '''Context window''' — The total number of tokens the model can process; prompts + conversation history must fit within this limit.
* '''Prompt injection''' — An attack where malicious input in the prompt attempts to override the model's original instructions.
* '''Hallucination''' — The model generating plausible-sounding but factually incorrect content; prompt techniques can mitigate but not fully eliminate this.
* '''Delimiters''' — Characters or tags (e.g., ```triple backticks```, <tags>, ---) used to clearly separate sections of a prompt.
* '''ReAct''' — A prompting framework combining reasoning traces ("Thought:") and tool calls ("Action:") for agent-like behavior.
* '''Constitutional prompting''' — Including explicit rules or principles in the prompt to constrain the model's behavior.
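The effect of temperature can be made concrete with a toy softmax calculation (pure Python, no model required; the three logits are invented for illustration):

<syntaxhighlight lang="python">
import math

def softmax_with_temperature(logits, temperature):
    """Rescale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                       # hypothetical next-token scores
low = softmax_with_temperature(logits, 0.2)    # sharpens: top token dominates
high = softmax_with_temperature(logits, 1.5)   # flattens: more randomness
</syntaxhighlight>

Lower temperatures concentrate probability mass on the highest-scoring token, which is why the 0–0.3 range suits deterministic tasks (temperature 0 is conventionally treated as greedy decoding, since dividing by zero is undefined).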


== Understanding ==
LLMs are next-token predictors — they complete text in a manner consistent with their training distribution. A prompt is not a command but a '''context that constrains the probability distribution''' over possible continuations. When you write a clear, structured prompt, you are steering the model toward the region of its output space that contains useful, accurate responses.


'''Why prompts matter so much''': The model's behavior is entirely determined by its weights plus its input context. Since you can't change the weights at inference time (without fine-tuning), the prompt is your only lever. Small changes — adding "think step by step," restructuring information, or providing a clear role — can swing response quality dramatically.


'''Chain-of-thought''' works because LLMs trained on vast amounts of human-written text have learned that reasoning traces precede correct conclusions in textbooks, solutions, and technical writing. By prompting "let's think step by step," you nudge the model into the portion of its output distribution where reasoning traces are followed by correct answers.


'''The anatomy of an effective prompt''':
<syntaxhighlight lang="text">
[System/Role]      → Who the model is and what constraints it operates under
[Context/Input]    → Background information, document, or data relevant to the task
[Task/Instruction] → What exactly to do, stated clearly and unambiguously
[Format]           → What the output should look like (JSON? Numbered list? Table?)
[Examples]         → (Optional) 1–5 input-output demonstrations
[Output Cue]       → A partial beginning of the expected response to prime generation
</syntaxhighlight>


The more precisely each of these components is specified, the less the model must infer — and inference is where errors enter.
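As a sketch, the anatomy above can be turned into a simple template assembler (the section labels and the `build_prompt` helper are illustrative, not a standard API):

<syntaxhighlight lang="python">
def build_prompt(role, context, task, output_format, examples=None, output_cue=""):
    """Assemble the prompt sections in order, separated by blank lines."""
    parts = [
        f"# Role\n{role}",
        f"# Context\n{context}",
        f"# Task\n{task}",
        f"# Format\n{output_format}",
    ]
    if examples:
        shots = "\n\n".join(f"Input: {inp}\nOutput: {out}" for inp, out in examples)
        parts.append(f"# Examples\n{shots}")
    if output_cue:
        parts.append(output_cue)
    return "\n\n".join(parts)

prompt = build_prompt(
    role="You are a senior tax attorney.",
    context="[clause text]",
    task="Summarize the clause in plain English.",
    output_format="Three bullet points, each under 20 words.",
    examples=[("[example clause]", "[example summary]")],
    output_cue="Summary:",
)
</syntaxhighlight>

Keeping assembly in one function makes it easy to omit optional sections per task while preserving a consistent ordering.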


== Applying ==
'''Prompt templates for common use cases:'''
 
; Zero-shot classification
<syntaxhighlight lang="python">
prompt = """Classify the sentiment of the following customer review.
Return ONLY one of: POSITIVE, NEGATIVE, NEUTRAL.
 
Review: "{review_text}"
 
Sentiment:"""
</syntaxhighlight>
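Because the model may still return extra words despite the "ONLY one of" instruction, callers typically validate the completion against the allowed label set; a minimal sketch:

<syntaxhighlight lang="python">
ALLOWED = {"POSITIVE", "NEGATIVE", "NEUTRAL"}

def parse_sentiment(completion):
    """Normalize a completion to an allowed label; None signals a retry/fallback."""
    stripped = completion.strip()
    if not stripped:
        return None
    label = stripped.split()[0].strip(".,:!\"'").upper()
    return label if label in ALLOWED else None
</syntaxhighlight>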
 
; Chain-of-thought for math/reasoning
<syntaxhighlight lang="python">
prompt = """Solve this problem step by step. Show your reasoning clearly.
After working through it, state your final answer on a new line beginning with "Answer:".
 
Problem: A train travels at 60 mph for 2 hours, then at 90 mph for 1.5 hours.
What is the total distance traveled?"""
# Model will reason: "60×2=120, 90×1.5=135, total=255 miles" → Answer: 255 miles
</syntaxhighlight>
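The "Answer:" convention makes the final result machine-extractable even when the reasoning trace varies; a sketch (the sample completion below is invented for illustration):

<syntaxhighlight lang="python">
import re

def extract_answer(completion):
    """Return the text after the last 'Answer:' line, or None if the marker is missing."""
    matches = re.findall(r"^Answer:\s*(.+)$", completion, flags=re.MULTILINE)
    return matches[-1].strip() if matches else None

reasoning = "60 mph * 2 h = 120 miles\n90 mph * 1.5 h = 135 miles\n120 + 135 = 255\nAnswer: 255 miles"
</syntaxhighlight>

Taking the last match guards against the model echoing the word "Answer:" earlier in its reasoning.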
 
; Structured output (JSON)
<syntaxhighlight lang="python">
prompt = """Extract the following information from the job posting below.
Return ONLY valid JSON with exactly these fields:
{
  "job_title": string,
  "company": string,
  "location": string,
  "salary_range": string or null,
  "required_skills": [list of strings],
  "experience_years": number or null
}


Job Posting:
---
{job_posting_text}
---

JSON:"""
</syntaxhighlight>
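Models sometimes wrap JSON in prose despite the "ONLY valid JSON" instruction, so production callers often parse defensively; one possible sketch:

<syntaxhighlight lang="python">
import json
import re

def parse_json_output(raw):
    """Try strict parsing first; fall back to the first {...} span in the text."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                return None
        return None
</syntaxhighlight>

Returning None rather than raising lets the caller decide whether to re-prompt or escalate.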


; Few-shot prompting
<syntaxhighlight lang="python">
prompt = """Convert informal English to formal business English.

Informal: "Hey, just wanted to check in on that report thing"
Formal: "I am writing to inquire about the status of the report."

Informal: "Can you send that over ASAP?"
Formal: "Could you please expedite the delivery of the requested document?"

Informal: "{user_input}"
Formal:"""
</syntaxhighlight>


; Key prompt engineering techniques reference
: '''Chain-of-thought''' → "Think step by step" / "Let's reason through this"
: '''Self-consistency''' → Sample 5–10 times at temp=0.7, take majority answer
: '''Tree of Thoughts''' → Explore multiple reasoning branches, evaluate, backtrack
: '''Least-to-most''' → Break complex task into sub-tasks, solve sequentially
: '''Generated Knowledge''' → Ask model to generate relevant facts first, then answer
: '''Directional Stimulus''' → Include a hint or keyword that steers output direction
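The self-consistency entry above reduces to a majority count over sampled answers; a minimal sketch (the sampled strings are invented):

<syntaxhighlight lang="python">
from collections import Counter

def majority_answer(answers):
    """Return the most frequent non-empty answer, or None if there are none."""
    counts = Counter(a.strip() for a in answers if a and a.strip())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

samples = ["255 miles", "255 miles", "250 miles", "255 miles", "255 miles"]
</syntaxhighlight>

In practice each sample would come from a separate model call at elevated temperature; the vote only helps when errors are uncorrelated across samples.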


== Analyzing ==
{| class="wikitable"
|+ Prompting Technique Comparison
! Technique !! Best For !! Limitation
|-
| Zero-shot || Simple, well-defined tasks || Fails on novel or complex reasoning
|-
| Few-shot || Format enforcement, stylistic tasks || Uses context tokens; examples must be high quality
|-
| Chain-of-thought || Math, logic, multi-step reasoning || Longer responses; still can reason incorrectly
|-
| Self-consistency || High-stakes factual questions || Expensive (multiple API calls)
|-
| Role prompting || Tone, domain vocabulary, expertise level || Model may not fully "become" the role
|-
| Tree of Thoughts || Open-ended problems requiring search || Very expensive; complex to implement
|}


'''Common failure modes:'''
* '''Ambiguous instructions''' — "Summarize this" without specifying length, audience, or focus produces wildly variable outputs. Be explicit.
* '''Prompt injection''' — If user input is inserted into a prompt without sanitization, attackers can override system instructions ("Ignore all previous instructions and…").
* '''Over-constraining''' — Too many rules in the system prompt create conflicts; the model satisfies some at the expense of others. Prioritize ruthlessly.
* '''Sycophancy through examples''' — If few-shot examples are biased toward one answer type, the model will mirror that bias rather than reasoning independently.
* '''Token overflow''' — Stuffing too much context causes the model to lose track of early instructions (the "lost in the middle" problem). Put critical instructions at the beginning and end of the prompt.
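A common partial mitigation for prompt injection combines delimiters with a heuristic screen on the untrusted text. The sketch below is illustrative only, and is emphatically not a complete defense:

<syntaxhighlight lang="python">
import re

SUSPICIOUS = re.compile(r"ignore (all )?(previous|prior|above) instructions", re.IGNORECASE)

def wrap_untrusted(user_text):
    """Fence untrusted input in tags that the system prompt declares to be pure data."""
    cleaned = user_text.replace("</user_input>", "")  # block tag break-out attempts
    return f"<user_input>\n{cleaned}\n</user_input>"

def looks_like_injection(user_text):
    """Cheap heuristic screen; real systems layer several defenses."""
    return bool(SUSPICIOUS.search(user_text))
</syntaxhighlight>

Pattern matching catches only the crudest attacks; layered defenses (privilege separation, output validation, human review for sensitive actions) are still required.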


== Evaluating ==
Expert prompt engineers evaluate prompts systematically, not by feel:


'''Regression test suites''': A set of 50–200 test prompts with known expected outputs. Every prompt change is run against this suite before deployment. Track pass rate, not just spot-check a few examples.
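One possible shape for such a suite, with the model call stubbed out (`fake_model` is a toy stand-in; any real client call would replace it):

<syntaxhighlight lang="python">
def run_regression_suite(generate, cases, threshold=0.95):
    """Run every (prompt, check) pair through the model; return pass rate and gate result."""
    passed = sum(1 for prompt, check in cases if check(generate(prompt)))
    rate = passed / len(cases)
    return rate, rate >= threshold

# Toy stand-in for a real model call (always answers POSITIVE):
def fake_model(prompt):
    return "POSITIVE"

cases = [
    ("Classify: 'Great product!'", lambda out: out == "POSITIVE"),
    ("Classify: 'Terrible, broke in a day.'", lambda out: out == "NEGATIVE"),
]
rate, ok = run_regression_suite(fake_model, cases)
</syntaxhighlight>

Because model outputs are stochastic, real suites often run each case several times and gate on aggregate pass rate rather than a single run.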


'''Prompt sensitivity analysis''': Vary minor aspects of the prompt (different wording, different delimiter style, different example order) and measure output variance. Robust prompts should produce similar quality across these variations.


'''Adversarial testing''': Deliberately try to break the prompt — inject contradictory instructions, provide edge-case inputs, attempt prompt injection. Identify and patch vulnerabilities before deployment.


'''A/B testing in production''': Compare two prompt variants by routing a random fraction of traffic to each, measuring downstream business metrics (task success rate, user satisfaction, hallucination rate).
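Whether a difference in pass rate between two variants is statistically meaningful can be checked with a standard two-proportion z-test; the counts below are hypothetical:

<syntaxhighlight lang="python">
import math

def two_proportion_z(pass_a, n_a, pass_b, n_b):
    """z statistic comparing the pass rates of prompt variants A and B."""
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

z = two_proportion_z(430, 500, 400, 500)  # 86% vs 80% pass rate; z ≈ 2.53
# |z| > 1.96 → the difference is significant at the 5% level (two-sided)
</syntaxhighlight>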


Expert practitioners maintain a '''prompt changelog''' — every production prompt change is documented with before/after performance metrics, the rationale for the change, and rollback instructions.


== Creating ==
Designing a robust prompt engineering workflow:
 
'''1. Prompt development methodology'''
<syntaxhighlight lang="text">
Define task precisely: what input → what output?
    ↓
Write initial zero-shot prompt
    ↓
Evaluate on 20+ diverse examples
    ↓
Identify failure modes → add constraints, examples, CoT
    ↓
[Iterate: specify → test → analyze → refine]
    ↓
Adversarial testing: injection, edge cases
    ↓
Regression suite: 100+ examples with expected outputs
    ↓
A/B test in production
    ↓
Monitor and maintain
</syntaxhighlight>
 
'''2. System prompt template for a production assistant'''
<syntaxhighlight lang="text">
# Role
You are [NAME], a [DOMAIN] expert assistant for [COMPANY/APP].
 
# Capabilities
You can: [list what the model SHOULD do]


# Constraints
You must NOT: [list prohibited behaviors]
If asked about [OFF-TOPIC], respond: "[standard deflection]"


# Output Format
Always respond in [FORMAT]. Structure your response as:
1. [FIELD 1]
2. [FIELD 2]


# Tone
[Formal/Casual/Technical]. Avoid [specific patterns to avoid].


# Grounding
Base your responses only on [CONTEXT SOURCES]. If unsure, say "I don't know."
</syntaxhighlight>


'''3. Meta-prompting for prompt generation'''
* Ask a capable model to generate prompt variants for your task
* Use the model to critique and improve your draft prompt
* Use "prompt optimizers" (DSPy, PromptBreeder) that automatically search the prompt space


'''4. Prompt management infrastructure'''
* Store prompts in version control (Git) alongside code
* Use prompt management platforms (LangSmith, Promptfoo, Langfuse)
* Keep prompts in configuration files, not hardcoded in source
* Implement prompt A/B testing with statistical significance tracking
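A minimal sketch of prompts-as-configuration (the config schema and the `get_prompt` helper are illustrative, not drawn from any particular platform):

<syntaxhighlight lang="python">
import json

# In practice this JSON would live in a versioned config file, not inline.
PROMPT_CONFIG = json.loads("""
{
  "summarizer": {
    "version": "2.3.0",
    "template": "Summarize the following text in {n_bullets} bullet points:\\n\\n{text}"
  }
}
""")

def get_prompt(name, **slots):
    """Look up a named prompt and fill its slots; returns (version, prompt)."""
    entry = PROMPT_CONFIG[name]
    return entry["version"], entry["template"].format(**slots)

version, prompt = get_prompt("summarizer", n_bullets=3, text="Quarterly revenue rose.")
</syntaxhighlight>

Logging the returned version alongside each completion is what makes rollbacks and before/after comparisons tractable.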


[[Category:Artificial Intelligence]]
[[Category:Machine Learning]]
[[Category:Large Language Models]]
[[Category:Productivity]]
[[Category:Prompt Engineering]]

Revision as of 06:27, 23 April 2026
