Prompt Engineering Patterns That Actually Work - Beyond the Hype

7 March, 2026 · AI

Last year I tried every prompt engineering trick the internet had to offer. "Act as a 10x developer." "Take a deep breath." "I'll tip you $200 if you get this right." Most of them did nothing measurable. A few actively made outputs worse. But buried in the noise, I found a handful of patterns that consistently improved results — not because they're clever hacks, but because they align with how language models actually process text.

This article covers the patterns I use daily. Each one has a clear mechanism, a concrete example, and an honest assessment of when it helps and when it doesn't.

Role Framing - Set the Baseline

The simplest pattern: tell the model what perspective to adopt.

You are a senior backend developer specialising in PHP 8.4 and Symfony 8.

This works because it shifts the probability distribution toward tokens associated with that expertise. A model prompted as a "senior PHP developer" is more likely to use match over switch, readonly class over mutable properties, and modern language features over legacy patterns. Without the frame, the model averages across all of its training data — which includes a lot of beginner-level code.

When it helps: domain-specific tasks where the default output quality isn't good enough.

When it doesn't: tasks where the model already performs well without framing. Adding "you are an expert" to a simple string manipulation task adds tokens without changing behaviour.

The key insight from my experience with custom instructions is that role framing should be specific, not aspirational. "Senior PHP developer" works. "The world's best programmer who never makes mistakes" doesn't — it's unfalsifiable and gives the model nothing concrete to anchor on.

Structured Output - Tell the Model What Shape to Return

Unstructured prompts produce unstructured responses. If you need specific data, define the shape.

Analyse this error log and return:
- root_cause: one sentence
- affected_components: list of service names
- severity: critical | high | medium | low
- suggested_fix: concrete steps, not generic advice

This pattern works because it constrains the output space. Instead of generating free-form text and hoping the relevant information appears somewhere in it, you tell the model exactly what fields to produce. The model treats your schema as a template and fills it in.

Stronger variant — JSON mode:

Return your analysis as JSON matching this schema:
{
  "root_cause": "string",
  "affected_components": ["string"],
  "severity": "critical | high | medium | low",
  "suggested_fix": "string"
}

JSON output is especially useful when you're feeding the model's response into another system. Most API providers now support a response_format parameter that guarantees valid JSON — use it when available instead of relying on the prompt alone.
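Even with JSON mode enabled, it's worth validating the response before passing it downstream: the output can be valid JSON but still violate your schema. A minimal sketch in Python — the field names and severity values mirror the prompt above; the function name is my own, not from any library:

```python
import json

ALLOWED_SEVERITIES = {"critical", "high", "medium", "low"}
REQUIRED_FIELDS = {"root_cause", "affected_components", "severity", "suggested_fix"}

def parse_analysis(raw: str) -> dict:
    """Parse a model response and check it matches the expected schema."""
    data = json.loads(raw)  # raises on malformed JSON
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["severity"] not in ALLOWED_SEVERITIES:
        raise ValueError(f"invalid severity: {data['severity']}")
    if not isinstance(data["affected_components"], list):
        raise ValueError("affected_components must be a list")
    return data
```

The enum and list checks catch the cases JSON mode alone can't: structurally valid output that isn't valid data.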

When it helps: any task where you need specific, extractable information.

When it doesn't: creative tasks where rigid structure would limit the model's ability to explore.

Few-Shot Examples - Show, Don't Tell

Instead of describing what you want, show the model examples of correct input-output pairs.

Convert these function names to snake_case:

getUserById -> get_user_by_id
parseXMLResponse -> parse_xml_response
HTMLToMarkdown -> html_to_markdown

Now convert: fetchAllActiveUsers

Few-shot examples work because they define the pattern implicitly. The model recognises the transformation rule from the examples rather than parsing a verbal description of it. This is often more reliable than explaining the rule — especially for edge cases. In the example above, the HTMLToMarkdown case shows how to handle consecutive capitals, which would be hard to describe precisely in words.
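For reference, the rule those three examples encode can be written down deterministically. A Python sketch of the same transformation, including the consecutive-capitals case:

```python
import re

def to_snake_case(name: str) -> str:
    """Insert an underscore at each lowercase-to-uppercase boundary and
    before the last capital of an acronym run, then lowercase everything."""
    with_breaks = re.sub(
        r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", "_", name
    )
    return with_breaks.lower()

to_snake_case("getUserById")       # -> "get_user_by_id"
to_snake_case("parseXMLResponse")  # -> "parse_xml_response"
to_snake_case("HTMLToMarkdown")    # -> "html_to_markdown"
```

The point of the few-shot pattern is that you never have to derive this rule explicitly — the examples carry it implicitly, and the model infers it.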

How many examples? Usually 2-3 is enough. More than 5 rarely improves quality and wastes tokens. The examples should cover the typical case and one edge case.

When it helps: formatting tasks, data transformations, classification with specific categories, any task with clear input-output pairs.

When it doesn't: tasks where examples are hard to construct or where the task is too varied for a few examples to cover.

Chain-of-Thought - Make Reasoning Explicit

Ask the model to reason through the problem step by step before giving a final answer.

Review this database migration for potential issues.
Think through each change step by step:
1. What does this migration modify?
2. Is it reversible?
3. Could it lock tables on large datasets?
4. Are there data integrity risks?

Then give your final assessment.

Chain-of-thought works because it forces the model to generate intermediate reasoning tokens that condition the final output. Without it, the model jumps directly to an answer, which for complex problems often means skipping edge cases. With explicit steps, each reasoning step feeds into the next, building toward a more thorough conclusion.

Zero-shot chain-of-thought — simply adding "think step by step" or "reason through this carefully" — works surprisingly well for mathematical and logical problems. For domain-specific analysis, giving explicit steps (as above) is more reliable.
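If you reuse the same checklist across many reviews, the prompt can be templated. A small sketch — the helper name is my own, not from any library:

```python
def build_cot_prompt(task: str, steps: list[str]) -> str:
    """Compose a chain-of-thought prompt: the task, numbered reasoning
    steps, then a request for a final assessment."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, 1))
    return (
        f"{task}\n"
        f"Think through each point step by step:\n"
        f"{numbered}\n\n"
        f"Then give your final assessment."
    )

prompt = build_cot_prompt(
    "Review this database migration for potential issues.",
    [
        "What does this migration modify?",
        "Is it reversible?",
        "Could it lock tables on large datasets?",
        "Are there data integrity risks?",
    ],
)
```

Keeping the step list as data rather than hard-coded text makes it easy to version and reuse the same reasoning checklist across a team.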

When it helps: complex reasoning, debugging, code review, architecture decisions — anything with multiple factors to weigh.

When it doesn't: simple factual questions or straightforward tasks. Asking the model to "think step by step about what 2 + 2 equals" adds latency without improving accuracy.

Constraint-Based Prompting - Define the Boundaries

Instead of telling the model what to do, tell it what not to do and what limits to respect.

Refactor this function. Constraints:
- Do not change the public interface
- Do not add new dependencies
- Keep the function under 20 lines
- Preserve all existing test assertions

Constraints work because they narrow the solution space. The model has many valid ways to refactor a function. Without constraints, it might pick the most dramatic refactoring — renaming parameters, extracting classes, introducing abstractions. Constraints keep the output focused and predictable.

This is the same principle behind lean custom instructions: telling the model what to avoid is often more effective than telling it what to do, because the set of things you don't want is usually smaller and more precise than the set of things you do want.

When it helps: any task with known requirements that the model might otherwise violate.

When it doesn't: exploratory tasks where you want the model to surprise you.

System vs User Message Separation

When using the API directly, the separation between system and user messages matters more than most people realise.

System: You are a code reviewer. Review for security
        vulnerabilities only. Ignore style, naming, and
        architecture. Output a numbered list.

User:   [paste code here]

The system message sets persistent context that the model treats as higher-priority instructions. The user message is the actual task. Mixing them together — putting instructions and code in the same message — reduces the model's ability to distinguish between what to do and what to do it with.

If you've built any project with local models, like the feedback analyser using Ollama, you've seen this in practice: the system prompt defines the model's role and constraints, and the user message provides the data. Keeping them separate improves consistency across different inputs.
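In OpenAI-style chat APIs, this separation is just the messages list. A sketch assuming that message format (roles "system" and "user"), with the actual request call omitted:

```python
SYSTEM_PROMPT = (
    "You are a code reviewer. Review for security vulnerabilities only. "
    "Ignore style, naming, and architecture. Output a numbered list."
)

def build_messages(code: str) -> list[dict]:
    """Pair the fixed system instructions with one piece of input code.
    The system message stays constant; only the user content varies."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": code},
    ]

# The same instructions apply to every input in a batch:
batches = [build_messages(code) for code in ("snippet_a", "snippet_b")]
```

Because the system message never changes between inputs, the model's behaviour stays consistent across the batch — exactly the property the pattern is for.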

When it helps: API usage, any scenario where the same instructions apply to multiple inputs.

When it doesn't: single-turn conversational use where the distinction doesn't apply.

Iterative Refinement - Build Up, Don't Dump

For complex outputs, don't try to get everything right in one prompt. Break the task into stages.

Stage 1: "List the 5 main sections this article should cover."
Stage 2: "For each section, write a one-sentence summary of the key point."
Stage 3: "Now write the full article following this outline."

Iterative refinement works because it reduces the cognitive load per step. A model asked to write a 2,000-word article from scratch has to simultaneously plan structure, generate content, maintain consistency, and follow style guidelines. Splitting the task means each step is simpler and the output of each step constrains the next.

This is also how AI coding agents operate internally. When Claude Code handles a complex task, it doesn't generate all the code at once — it reads files, analyses the structure, plans the approach, and implements incrementally. You can apply the same pattern manually.
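The three stages above can be chained programmatically. A sketch where `chat` is a placeholder for whatever API call you use, injected so each stage's output feeds the next prompt:

```python
from typing import Callable

def write_article(topic: str, chat: Callable[[str], str]) -> str:
    """Run the three-stage pipeline: outline, per-section summaries,
    then the full draft conditioned on both earlier outputs."""
    outline = chat(f"List the 5 main sections an article on {topic} should cover.")
    summaries = chat(
        "For each section, write a one-sentence summary of the key point.\n\n"
        f"Sections:\n{outline}"
    )
    return chat(
        "Now write the full article following this outline.\n\n"
        f"Outline:\n{outline}\n\nKey points:\n{summaries}"
    )
```

Passing `chat` in as a parameter keeps the pipeline testable: you can verify the staging logic with a stub before spending tokens on real calls.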

When it helps: long-form content, complex implementations, any task where quality degrades when you ask for too much at once.

When it doesn't: simple tasks where the overhead of multiple steps isn't justified.

Negative Examples - Show What's Wrong

Sometimes the fastest way to get what you want is to show what you don't want.

Write a commit message for this diff.

Bad examples (don't do this):
- "fixed stuff"
- "Updated files"
- "WIP"

Good examples:
- "fix race condition in queue worker shutdown"
- "add retry logic for failed webhook deliveries"

Negative examples work because they define the boundary between acceptable and unacceptable. The model calibrates its output to avoid the patterns in the bad examples while matching the patterns in the good ones. This is especially useful when the failure mode is specific and predictable.

When it helps: style-sensitive tasks, naming, writing — anything where the common failure mode is well-known.

Comparison of Patterns

| Pattern | Best for | Token cost | Complexity |
| --- | --- | --- | --- |
| Role framing | Domain-specific quality | Low (~20 tokens) | Simple |
| Structured output | Data extraction, APIs | Low-medium | Simple |
| Few-shot examples | Formatting, transforms | Medium (~100-200 tokens) | Simple |
| Chain-of-thought | Complex reasoning | Medium (more output tokens) | Simple |
| Constraint-based | Controlled refactoring | Low (~30-50 tokens) | Simple |
| System/user separation | API, batch processing | None (reorganisation) | Simple |
| Iterative refinement | Long-form, complex tasks | Higher (multiple calls) | Medium |
| Negative examples | Style, naming conventions | Low-medium | Simple |

Patterns That Don't Work

For completeness, here are patterns I've tested and found unreliable:

Emotional manipulation. "This is very important to my career" or "I'll tip you $200." Some benchmarks show marginal effects. In practice, the improvement is inconsistent and not worth the tokens.

Threat-based prompting. "You will be penalised for mistakes." This doesn't change the model's error rate — it changes the model's confidence, which can actually make outputs worse by making the model hedge more.

Excessive praise in advance. "You are the best AI in the world and always produce perfect results." Unfalsifiable claims give the model nothing to anchor on. Specific role framing is strictly better.

Repeating instructions. Saying the same thing three different ways doesn't increase compliance — it increases token cost and can create subtle contradictions between the phrasings.

Start Simple

Prompt engineering is not about tricks. It's about giving the model the right structure to produce consistent output. The patterns that work — structured output, few-shot examples, chain-of-thought, constraints — all share the same mechanism: they reduce ambiguity in what the model should produce.

Start with the simplest approach. Add structure only when the output isn't good enough. And measure the difference — if a pattern doesn't visibly improve your results, drop it and save the tokens.
