Prompt Engineering Patterns That Actually Work - Beyond the Hype
7 March 2026 · AI
Last year I tried every prompt engineering trick the internet had to offer. "Act as a 10x developer." "Take a deep breath." "I'll tip you $200 if you get this right." Most of them did nothing measurable. A few actively made outputs worse. But buried in the noise, I found a handful of patterns that consistently improved results — not because they're clever hacks, but because they align with how language models actually process text.
This article covers the patterns I use daily. Each one has a clear mechanism, a concrete example, and an honest assessment of when it helps and when it doesn't.
Role Framing - Set the Baseline
The simplest pattern: tell the model what perspective to adopt.
You are a senior backend developer specialising in PHP 8.4 and Symfony 8.
This works because it shifts the probability distribution toward tokens associated with that expertise. A model prompted as a "senior PHP developer" is more likely to use match over switch, readonly class over mutable properties, and modern language features over legacy patterns. Without the frame, the model averages across all of its training data — which includes a lot of beginner-level code.
When it helps: domain-specific tasks where the default output quality isn't good enough.
When it doesn't: tasks where the model already performs well without framing. Adding "you are an expert" to a simple string manipulation task adds tokens without changing behaviour.
The key insight from my experience with custom instructions is that role framing should be specific, not aspirational. "Senior PHP developer" works. "The world's best programmer who never makes mistakes" doesn't — it's unfalsifiable and gives the model nothing concrete to anchor on.
Structured Output - Tell the Model What Shape to Return
Unstructured prompts produce unstructured responses. If you need specific data, define the shape.
Analyse this error log and return:
- root_cause: one sentence
- affected_components: list of service names
- severity: critical | high | medium | low
- suggested_fix: concrete steps, not generic advice
This pattern works because it constrains the output space. Instead of generating free-form text and hoping the relevant information appears somewhere in it, you tell the model exactly what fields to produce. The model treats your schema as a template and fills it in.
Stronger variant — JSON mode:
Return your analysis as JSON matching this schema:
{
"root_cause": "string",
"affected_components": ["string"],
"severity": "critical | high | medium | low",
"suggested_fix": "string"
}
JSON output is especially useful when you're feeding the model's response into another system. Most API providers now support a response_format parameter that guarantees valid JSON — use it when available instead of relying on the prompt alone.
When it helps: any task where you need specific, extractable information.
When it doesn't: creative tasks where rigid structure would limit the model's ability to explore.
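When the JSON feeds another system, it's worth validating the reply before trusting it, because even with JSON mode the model can omit fields or invent enum values. Here's a minimal sketch of that check for the error-log schema above; the function name `validate_analysis` is mine, not from any library:

```python
import json

REQUIRED_FIELDS = {"root_cause", "affected_components", "severity", "suggested_fix"}
SEVERITIES = {"critical", "high", "medium", "low"}

def validate_analysis(raw: str) -> dict:
    """Parse a model reply and check it against the error-log schema."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["severity"] not in SEVERITIES:
        raise ValueError(f"invalid severity: {data['severity']!r}")
    if not isinstance(data["affected_components"], list):
        raise ValueError("affected_components must be a list")
    return data
```

If validation fails, you can feed the error message back to the model and ask it to correct its output — a retry loop that converges quickly in practice.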
Few-Shot Examples - Show, Don't Tell
Instead of describing what you want, show the model examples of correct input-output pairs.
Convert these function names to snake_case:
getUserById -> get_user_by_id
parseXMLResponse -> parse_xml_response
HTMLToMarkdown -> html_to_markdown
Now convert: fetchAllActiveUsers
Few-shot examples work because they define the pattern implicitly. The model recognises the transformation rule from the examples rather than parsing a verbal description of it. This is often more reliable than explaining the rule — especially for edge cases. In the example above, the HTMLToMarkdown case shows how to handle consecutive capitals, which would be hard to describe precisely in words.
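A useful sanity check when building few-shot sets is a reference implementation of the rule, so your examples are at least internally consistent. A sketch of the snake_case transformation above, using the common two-regex approach (my implementation, not anything the model runs):

```python
import re

def to_snake_case(name: str) -> str:
    """Convert camelCase/PascalCase to snake_case, handling acronym runs."""
    # Split before a capital that starts a new lowercase word,
    # e.g. "HTMLTo" -> "HTML_To"
    s = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Split between a lowercase letter or digit and a capital,
    # e.g. "parseXML" -> "parse_XML"
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    return s.lower()
```

Run your few-shot examples through it before putting them in the prompt: one inconsistent pair teaches the model the wrong rule.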
How many examples? Usually 2-3 is enough. More than 5 rarely improves quality and wastes tokens. The examples should cover the typical case and one edge case.
When it helps: formatting tasks, data transformations, classification with specific categories, any task with clear input-output pairs.
When it doesn't: tasks where examples are hard to construct or where the task is too varied for a few examples to cover.
Chain-of-Thought - Make Reasoning Explicit
Ask the model to reason through the problem step by step before giving a final answer.
Review this database migration for potential issues.
Think through each change step by step:
1. What does this migration modify?
2. Is it reversible?
3. Could it lock tables on large datasets?
4. Are there data integrity risks?
Then give your final assessment.
Chain-of-thought works because it forces the model to generate intermediate reasoning tokens that condition the final output. Without it, the model jumps directly to an answer, which for complex problems often means skipping edge cases. With explicit steps, each reasoning step feeds into the next, building toward a more thorough conclusion.
Zero-shot chain-of-thought — simply adding "think step by step" or "reason through this carefully" — works surprisingly well for mathematical and logical problems. For domain-specific analysis, giving explicit steps (as above) is more reliable.
When it helps: complex reasoning, debugging, code review, architecture decisions — anything with multiple factors to weigh.
When it doesn't: simple factual questions or straightforward tasks. Asking the model to "think step by step about what 2 + 2 equals" adds latency without improving accuracy.
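If you use explicit-step prompts across many tasks, it's worth templating them rather than retyping the scaffolding. A small helper along these lines (the name `cot_prompt` and the exact wording are my choices):

```python
def cot_prompt(task: str, steps: list[str]) -> str:
    """Compose a chain-of-thought prompt with explicit, numbered steps."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, 1))
    return (
        f"{task}\n"
        "Think through the problem step by step:\n"
        f"{numbered}\n"
        "Then give your final assessment."
    )
```

This keeps the reasoning scaffold consistent while the domain-specific steps vary per task.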
Constraint-Based Prompting - Define the Boundaries
Instead of telling the model what to do, tell it what not to do and what limits to respect.
Refactor this function. Constraints:
- Do not change the public interface
- Do not add new dependencies
- Keep the function under 20 lines
- Preserve all existing test assertions
Constraints work because they narrow the solution space. The model has many valid ways to refactor a function. Without constraints, it might pick the most dramatic refactoring — renaming parameters, extracting classes, introducing abstractions. Constraints keep the output focused and predictable.
This is the same principle behind lean custom instructions: telling the model what to avoid is often more effective than telling it what to do, because the set of things you don't want is usually smaller and more precise than the set of things you do want.
When it helps: any task with known requirements that the model might otherwise violate.
When it doesn't: exploratory tasks where you want the model to surprise you.
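Some constraints are mechanically checkable, which means you don't have to take the model's word for compliance. A sketch for two of the constraints above — unchanged public interface and the line budget — assuming the model returns a single Python function (the helper name and the single-function assumption are mine):

```python
import ast

def check_constraints(original_src: str, refactored_src: str,
                      max_lines: int = 20) -> list[str]:
    """Check a refactored function against the mechanical constraints:
    same name and parameters as the original, and within the line budget."""
    violations = []
    orig_fn = ast.parse(original_src).body[0]
    new_fn = ast.parse(refactored_src).body[0]
    # ast.dump gives a comparable textual form of the parameter list
    if new_fn.name != orig_fn.name or ast.dump(new_fn.args) != ast.dump(orig_fn.args):
        violations.append("public interface changed")
    if len(refactored_src.strip().splitlines()) > max_lines:
        violations.append(f"function exceeds {max_lines} lines")
    return violations
```

"No new dependencies" could be checked the same way by comparing `Import` nodes; "preserve test assertions" you verify by running the tests.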
System vs User Message Separation
When using the API directly, the separation between system and user messages matters more than most people realise.
System: You are a code reviewer. Review for security
vulnerabilities only. Ignore style, naming, and
architecture. Output a numbered list.
User: [paste code here]
The system message sets persistent context that the model treats as higher-priority instructions. The user message is the actual task. Mixing them together — putting instructions and code in the same message — reduces the model's ability to distinguish between what to do and what to do it with.
If you've built any project with local models, like the feedback analyser using Ollama, you've seen this in practice: the system prompt defines the model's role and constraints, and the user message provides the data. Keeping them separate improves consistency across different inputs.
When it helps: API usage, any scenario where the same instructions apply to multiple inputs.
When it doesn't: single-turn conversational use where the distinction doesn't apply.
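In code, the separation is just two entries in the messages list — the point is that the system prompt is a constant and only the user content varies per call. A sketch of the security-review request above, following the common chat-completions payload shape (the model name is a placeholder):

```python
REVIEW_SYSTEM_PROMPT = (
    "You are a code reviewer. Review for security vulnerabilities only. "
    "Ignore style, naming, and architecture. Output a numbered list."
)

def build_review_request(code: str) -> dict:
    """Build a chat request that keeps instructions (system) separate
    from the data being operated on (user)."""
    return {
        "model": "my-review-model",  # placeholder, not a real model name
        "messages": [
            {"role": "system", "content": REVIEW_SYSTEM_PROMPT},
            {"role": "user", "content": code},
        ],
    }
```

Because the system prompt never changes, results are comparable across inputs — useful when you're batch-reviewing many files.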
Iterative Refinement - Build Up, Don't Dump
For complex outputs, don't try to get everything right in one prompt. Break the task into stages.
Stage 1: "List the 5 main sections this article should cover."
Stage 2: "For each section, write a one-sentence summary of the key point."
Stage 3: "Now write the full article following this outline."
Iterative refinement works because it reduces the cognitive load per step. A model asked to write a 2,000-word article from scratch has to simultaneously plan structure, generate content, maintain consistency, and follow style guidelines. Splitting the task means each step is simpler and the output of each step constrains the next.
This is also how AI coding agents operate internally. When Claude Code handles a complex task, it doesn't generate all the code at once — it reads files, analyses the structure, plans the approach, and implements incrementally. You can apply the same pattern manually.
When it helps: long-form content, complex implementations, any task where quality degrades when you ask for too much at once.
When it doesn't: simple tasks where the overhead of multiple steps isn't justified.
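The three-stage flow above can be sketched as a small pipeline where each stage's output is embedded in the next prompt. Here `model` is any callable that takes a prompt and returns text — a real API client in practice, a stub in the test:

```python
from typing import Callable

def write_article(topic: str, model: Callable[[str], str]) -> str:
    """Run the three-stage flow: outline -> summaries -> full draft.
    Each stage's output constrains the prompt for the next stage."""
    outline = model(
        f"List the 5 main sections an article on {topic} should cover."
    )
    summaries = model(
        "For each section, write a one-sentence summary of the key point.\n"
        f"Sections:\n{outline}"
    )
    return model(
        f"Now write the full article on {topic} following this outline:\n"
        f"{outline}\nKey points:\n{summaries}"
    )
```

Passing the model in as a callable also makes the pipeline trivial to test with a stub, without any API calls.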
Negative Examples - Show What's Wrong
Sometimes the fastest way to get what you want is to show what you don't want.
Write a commit message for this diff.
Bad examples (don't do this):
- "fixed stuff"
- "Updated files"
- "WIP"
Good examples:
- "fix race condition in queue worker shutdown"
- "add retry logic for failed webhook deliveries"
Negative examples work because they define the boundary between acceptable and unacceptable. The model calibrates its output to avoid the patterns in the bad examples while matching the patterns in the good ones. This is especially useful when the failure mode is specific and predictable.
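The same boundary can double as a lightweight lint on the model's output before you accept it. A sketch using the bad examples above as reject patterns (illustrative, not exhaustive):

```python
import re

# Patterns drawn from the bad examples; illustrative, not exhaustive
BAD_PATTERNS = [
    r"^fixed stuff$",
    r"^updated files$",
    r"^wip\b",
]

def is_acceptable_commit_message(message: str) -> bool:
    """Reject known-bad commit message patterns and require some substance."""
    msg = message.strip().lower()
    if any(re.match(p, msg) for p in BAD_PATTERNS):
        return False
    return len(msg.split()) >= 3  # a verb plus some context, at minimum
```

If the check fails, re-prompt with the rejected message included as a fresh negative example — the model rarely repeats a mistake it has just been shown.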
When it helps: style-sensitive tasks, naming, writing — anything where the common failure mode is well-known.
When it doesn't: tasks where the failure modes are too varied or unpredictable to enumerate in advance.
Comparison of Patterns
| Pattern | Best for | Token cost | Complexity |
|---|---|---|---|
| Role framing | Domain-specific quality | Low (~20 tokens) | Simple |
| Structured output | Data extraction, APIs | Low-medium | Simple |
| Few-shot examples | Formatting, transforms | Medium (~100-200 tokens) | Simple |
| Chain-of-thought | Complex reasoning | Medium (more output tokens) | Simple |
| Constraint-based | Controlled refactoring | Low (~30-50 tokens) | Simple |
| System/user separation | API, batch processing | None (reorganisation) | Simple |
| Iterative refinement | Long-form, complex tasks | Higher (multiple calls) | Medium |
| Negative examples | Style, naming conventions | Low-medium | Simple |
Patterns That Don't Work
For completeness, here are patterns I've tested and found unreliable:
Emotional manipulation. "This is very important to my career" or "I'll tip you $200." Some benchmarks show marginal effects. In practice, the improvement is inconsistent and not worth the tokens.
Threat-based prompting. "You will be penalised for mistakes." This doesn't change the model's error rate — it changes the model's confidence, which can actually make outputs worse by making the model hedge more.
Excessive praise in advance. "You are the best AI in the world and always produce perfect results." Unfalsifiable claims give the model nothing to anchor on. Specific role framing is strictly better.
Repeating instructions. Saying the same thing three different ways doesn't increase compliance — it increases token cost and can create subtle contradictions between the phrasings.
Start Simple
Prompt engineering is not about tricks. It's about giving the model the right structure to produce consistent output. The patterns that work — structured output, few-shot examples, chain-of-thought, constraints — all share the same mechanism: they reduce ambiguity in what the model should produce.
Start with the simplest approach. Add structure only when the output isn't good enough. And measure the difference — if a pattern doesn't visibly improve your results, drop it and save the tokens.