Custom Instructions for AI Assistants: How to Write Them Without Wasting Money

3 March, 2026

Custom instructions are one of the most powerful features in modern AI assistants — and one of the most misused. Most developers write them once, pile on rules, and never think about the cost. This article explains what actually happens when your instruction runs, why poorly written instructions silently drain your budget, and how to write lean, effective ones.

What Happens on Every Single Request

When you send a message to Claude or ChatGPT, the API doesn't just receive your message. It receives a complete package assembled fresh every time:

[System prompt / custom instructions]  ← your rules, every time
[Conversation history]                 ← all previous messages
[Your new message]                     ← what you just typed

This is the context window — and you pay for every token in it on every request. Not once when you set up the instructions. Every. Single. Request.

This has a direct consequence: a 1,000-token system prompt on a 100-message conversation costs you 100,000 tokens for the instructions alone — before any actual work gets done.
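The arithmetic above can be sketched directly. This is a minimal illustration of the multiplication, nothing more — the numbers are the ones from the example, not measurements:

```python
# Rough cost of re-sending a fixed system prompt over a conversation.
# The system prompt rides along on EVERY request, so total instruction
# tokens billed = prompt size * number of requests.

def instruction_overhead(system_tokens: int, num_requests: int) -> int:
    """Total instruction tokens billed across a conversation."""
    return system_tokens * num_requests

# A 1,000-token prompt over a 100-message conversation:
print(instruction_overhead(1_000, 100))  # 100000 tokens for instructions alone
```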

The Token Timeline of a Single Request

Here's what happens step by step when you submit a message with custom instructions:

Step 1 — Tokenization. Your instructions are split into tokens. A token is roughly 3-4 characters of English text: "Use TypeScript strict mode" is about 6 tokens, and a 500-word instruction block is roughly 700 tokens.

Step 2 — Context assembly. The model receives: [instructions] + [history] + [new message]. All of it, in full.

Step 3 — Attention over the full context. The transformer processes every token against every other token. More tokens = more computation = higher latency and higher cost. This is why a 2,000-token system prompt noticeably slows down response time compared to a 200-token one.

Step 4 — Response generation. Output tokens are generated one by one. You pay for these too, but they're typically a smaller portion of the total.

Step 5 — No memory between sessions. The model doesn't remember your instructions from last time. Everything is re-sent and re-processed from scratch.
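The rule of thumb from Step 1 is easy to turn into a back-of-the-envelope estimator. This is only the ~4-characters-per-token heuristic, not a real tokenizer — for exact counts you would use the provider's own tokenizer (e.g. tiktoken for OpenAI models):

```python
# Quick-and-dirty token estimate using the ~4 characters/token rule of
# thumb for English text. An approximation only; real tokenizers differ.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

# "Use TypeScript strict mode" is 26 characters:
print(estimate_tokens("Use TypeScript strict mode"))  # 6
```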

Bloated vs Lean: A Real Comparison

Let's compare two instruction sets for a PHP developer working on a Symfony project.

Bloated instructions (~820 tokens)

You are an experienced PHP developer with 15 years of experience.
You know Symfony very well, including all its components.
You should always use PHP 8.x features when possible.
Use readonly properties for immutable data.
Use match expressions instead of switch statements.
Use named arguments for clarity.
Use first-class callables instead of anonymous functions.
Use array_find() instead of reset(array_filter()).
Always add strict types declaration at the top of every file.
Follow PSR-12 coding standards.
Use SOLID principles.
Follow DDD principles.
Use clean architecture patterns.
Always use dependency injection.
Never use static methods except for named constructors.
Write meaningful variable names.
Write meaningful method names.
Keep methods short and focused.
Keep classes small and focused.
Write tests for all business logic.
Use PHPUnit for testing.
Use data providers for parameterized tests.
Mock external dependencies in tests.
Use Symfony's testing utilities.
Always validate user input at the system boundary.
Never trust external data.
Use Symfony's validation component.
Return early from methods to reduce nesting.
Use exceptions for exceptional cases only.
Log errors appropriately.
Use Symfony's logger service.
...and 15 more rules

Cost per 1,000 requests: ~820,000 input tokens

Lean instructions (~180 tokens)

Senior PHP/Symfony developer. PHP 8.4, Symfony 8.

Always use: static fn, first-class callables, readonly class,
typed constants, array_find(), match over switch, named arguments.
declare(strict_types=1) always. PSR-12.

Verify only when genuinely uncertain. Skip for routine edits.
Prefer one broad search over multiple narrow ones.

Cost per 1,000 requests: ~180,000 input tokens

That's 4.5× cheaper — for functionally equivalent guidance. The lean version drops obvious things ("write meaningful names", "use dependency injection") that any senior developer prompt already implies, and keeps only the specifics the model wouldn't know without being told.
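Translating that token gap into money is a one-liner. The price below is a placeholder assumption for illustration, not any provider's real rate — plug in your actual input price:

```python
# Dollar cost of the instruction overhead alone, under an ASSUMED price.
# $3.00 per million input tokens is a placeholder, not a real API rate.

PRICE_PER_MTOK = 3.00  # USD per 1M input tokens (assumption)

def instruction_cost(system_tokens: int, requests: int) -> float:
    return system_tokens * requests * PRICE_PER_MTOK / 1_000_000

bloated = instruction_cost(820, 1_000)  # ~$2.46 per 1,000 requests
lean = instruction_cost(180, 1_000)     # ~$0.54 per 1,000 requests
print(f"bloated ${bloated:.2f} vs lean ${lean:.2f}")
```

The absolute numbers look small per conversation; multiplied across a team and a year of coding sessions, they stop being small.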

Why Redundant Rules Are Actively Harmful

It's not just about cost. Verbose instructions cause three concrete problems:

1. Rule dilution. The more rules you add, the less attention weight any single rule gets. A 50-rule instruction set means each rule competes for the model's attention. A 10-rule set means each rule is clearly weighted. Counterintuitively, more rules = worse compliance.

2. Conflicting signals. Long instructions often contain subtle contradictions. "Always add comments" + "keep code self-documenting" = ambiguity. The model has to resolve conflicts, which introduces inconsistency.

3. Latency. Processing 800 extra tokens adds ~200-400ms to time-to-first-token on frontier models. On a long coding session, this compounds across dozens of requests.

These aren't just theoretical concerns. In February 2026, ETH Zurich published a study that tested context files on real coding tasks — and the results confirmed exactly this: more instructions led to lower success rates, higher costs, and broader but less targeted exploration.

What Prompt Caching Changes

Anthropic's prompt caching (available on Claude) reduces the cost of a cached system prompt to ~10% of the original price. OpenAI has a similar feature.

This means: if your instructions don't change between requests, the API can reuse the cached computation instead of reprocessing from scratch.

How to take advantage of it:

  • Keep your system prompt stable. Don't dynamically modify instructions based on the conversation.
  • Put instructions at the very beginning of the context, before any variable content.
  • Know your provider's mechanism. With OpenAI, caching kicks in automatically once a sufficiently long prefix repeats. With Anthropic, you opt in by marking the block to cache with a cache_control breakpoint in the request.

With caching enabled, the cost difference between bloated and lean instructions shrinks — but latency doesn't improve. The model still has to attend over all tokens, cached or not. Lean instructions are still faster.
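Here is a sketch of what the Anthropic opt-in looks like as a request body. The field layout follows the documented prompt-caching shape (a cache_control marker on the system block), but the model id and instruction text are placeholders — check the current API docs before relying on it:

```python
# Sketch of a Messages API request body with a cache breakpoint on the
# system prompt. Stable prefix first, marked cacheable; variable content
# (the conversation) after it. No network call is made here.

SYSTEM_PROMPT = "Senior PHP/Symfony developer. PHP 8.4, Symfony 8."

def build_request(user_message: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # cache breakpoint
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("Refactor this controller.")
print(req["system"][0]["cache_control"])  # {'type': 'ephemeral'}
```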

What Actually Belongs in Custom Instructions

A useful rule: only include things the model can't infer from context.

Include:

  • Specific framework versions or unusual configurations: Symfony 8, PHP 8.4 (models default to older versions)
  • Rules that contradict common patterns: no PHPDoc blocks, never use static factories
  • Project-specific conventions: specific naming conventions, file structure rules
  • Workflow constraints: when to verify, when to skip, how to handle ambiguity

Don't include:

  • Generic best practices the model already follows: "write clean code", "use meaningful names"
  • Things implied by seniority: "prefer composition over inheritance", "don't repeat yourself"
  • Framework defaults: "use dependency injection in Symfony" — it's the only way to wire services
  • Obvious constraints: "don't introduce security vulnerabilities"

The Hidden Cost: Instruction-Triggered Behaviour

Some instructions don't just add tokens — they change behaviour in ways that multiply requests.

Consider:

After every change, verify with a full test suite run.
Review the codebase before every implementation.
Update a lessons file after every correction.

These instructions don't just cost their own tokens. Each one pushes the model to:

  • Run a command (1 tool call)
  • Read the output (1 tool call)
  • Possibly read additional files (N tool calls)
  • Write a file (1 tool call)

A single "after every change" instruction can turn a 2-tool-call task into an 8-tool-call task. Tool calls are the most expensive part of an agentic session — each one is a full round-trip with the entire context.
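The compounding is easy to see with rough numbers. Each tool call re-sends the entire context, which has grown with every previous step — the figures below (a 5,000-token context, ~800 tokens added per step) are illustrative assumptions:

```python
# Total input tokens over an agentic session: call k re-sends the base
# context plus everything accumulated in the k previous steps.

def session_input_tokens(base_context: int, tokens_per_step: int,
                         tool_calls: int) -> int:
    return sum(base_context + k * tokens_per_step for k in range(tool_calls))

# The same task at 2 tool calls vs inflated to 8 by an
# "after every change" rule:
print(session_input_tokens(5_000, 800, 2))  # 10800
print(session_input_tokens(5_000, 800, 8))  # 62400
```

The 8-call version doesn't cost 4× the 2-call version — it costs nearly 6×, because later calls carry a heavier context.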

Better pattern: use conditional rules.

# Instead of:
After every change, verify with tests.

# Write:
Run tests only when there's genuine uncertainty:
new routing, complex DI wiring, reported errors.
Skip for: template edits, config values, copy changes.

This cuts tool calls by 60-80% on typical coding sessions without reducing quality on the cases where verification actually matters.
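The conditional rule above amounts to a simple gate, which can be sketched as code. The change categories mirror the examples in the rule and are illustrative, not an exhaustive policy:

```python
# A verification gate: run tests only for changes with genuine uncertainty,
# skip them for routine edits, and default to verifying when unsure.

HIGH_RISK = {"new_routing", "di_wiring", "reported_error"}
LOW_RISK = {"template_edit", "config_value", "copy_change"}

def should_run_tests(change_type: str) -> bool:
    if change_type in HIGH_RISK:
        return True
    if change_type in LOW_RISK:
        return False
    return True  # unknown change types: verify rather than guess

print(should_run_tests("di_wiring"))    # True
print(should_run_tests("copy_change"))  # False
```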

Practical Checklist

Before finalizing your custom instructions, go through each rule and ask:

  • Would a senior developer in this stack do this by default? If yes, remove it.
  • Is this rule specific to my project, or is it generic advice? Generic → remove.
  • Does following this rule trigger extra tool calls? If yes, make it conditional.
  • Am I repeating the same idea in different words? Merge or remove.
  • Could I test whether the model follows this without the instruction? If yes, remove it and see what happens.

A good set of custom instructions should feel surprisingly short. If you can't summarise it in a paragraph, it's probably doing more harm than good.

Summary

                          Bloated instructions   Lean instructions
Tokens per request        ~820                   ~180
Cost per 1,000 requests   4.5× higher            baseline
Response latency          +200-400ms             baseline
Rule compliance           lower                  higher
Tool calls triggered      more                   fewer

Custom instructions are a multiplier — they affect every single interaction. Writing them carefully once saves thousands of tokens and shaves noticeable latency off every response across the lifetime of a project. Keep them short, specific, and conditional.
