URL Encoding Explained: Percent-Encoding, Reserved Characters, and Common Mistakes

8 March, 2026 Updated: 2 March, 2026 Security

URLs are the universal addressing system of the web, but they were designed in an era when the internet was ASCII-only and the set of characters with special meaning in a URL was small. Today, developers constantly need to include arbitrary data — user names with spaces, search queries with ampersands, file paths with slashes, passwords with special characters — inside URL components. URL encoding (formally called percent-encoding) is the mechanism that makes this safe and unambiguous. This guide covers the full technical picture: the RFC 3986 rules, reserved versus unreserved characters, the difference between query encoding and form encoding, the double encoding vulnerability, and the encoding functions available in PHP, Python, and JavaScript. You can experiment with the URL encoder/decoder as you read.

What Is URL Encoding

Percent-encoding is a mechanism for representing arbitrary data in a URI (Uniform Resource Identifier) using only the ASCII characters that are safe to transmit across all systems. The name comes from the encoding format itself: each unsafe byte is represented as a percent sign % followed by two hexadecimal digits representing the byte's value.

Why It Exists

URLs were originally designed to carry a limited set of ASCII characters. The constraints come from several directions:

Protocol safety: Older protocols and proxies sometimes strip or mangle certain bytes (control characters, bytes above 127, etc.)
Delimiter ambiguity: Characters like ?, &, =, /, and # have specific structural meaning in a URL. If you want to include a literal & in a query parameter value, you must encode it so parsers do not interpret it as a separator.
Non-ASCII characters: Unicode characters (e.g., Cyrillic, Chinese, emoji) must be encoded as their UTF-8 byte sequences, with each byte percent-encoded.

The Encoding Mechanism

The process is straightforward:

Take the character you want to encode.
Express it as its UTF-8 byte value (for ASCII, this is the same as the ASCII code).
Write % followed by the two uppercase hexadecimal digits for that byte.

Examples:

Character	UTF-8 Byte(s)	Percent-Encoded
Space	0x20	`%20`
`&`	0x26	`%26`
`=`	0x3D	`%3D`
`#`	0x23	`%23`
`@`	0x40	`%40`
`/`	0x2F	`%2F`
`€`	0xE2 0x82 0xAC	`%E2%82%AC`
`я`	0xD1 0x8F	`%D1%8F`

Percent-encoding is case-insensitive: %2f and %2F are equivalent, but RFC 3986 recommends uppercase.
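The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for the standard library: the helper name `percent_encode` is hypothetical, and it assumes the only characters left untouched are ASCII letters, digits, and `-` `.` `_` `~`.

```python
# Characters this sketch leaves as-is (letters, digits, and - . _ ~).
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Percent-encode every UTF-8 byte of `text` that is not in UNRESERVED."""
    out = []
    for byte in text.encode("utf-8"):   # step 2: express the text as UTF-8 bytes
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")  # step 3: % plus two uppercase hex digits
    return "".join(out)


print(percent_encode(" "))   # %20
print(percent_encode("€"))   # %E2%82%AC
print(percent_encode("я"))   # %D1%8F
```

In real code, `urllib.parse.quote` does the same job and handles more edge cases.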

Percent-Encoding Rules (RFC 3986)

The current authoritative specification for URIs is RFC 3986, published in January 2005. It obsoletes the earlier RFC 2396 (1998) and RFC 2732. Understanding the distinction matters because older software and documentation may reference earlier behaviour, which differs in subtle ways. For example, RFC 2396 treated ! * ' ( ) as unreserved "mark" characters, whereas RFC 3986 moves them into the reserved sub-delims set. The even older RFC 1738 listed the ~ tilde as unsafe, whereas RFC 3986 designates it as unreserved.

The Core Rule

Every octet that does not belong to the unreserved character set and is not being used as a reserved delimiter must be percent-encoded when placed in a URI.

Formally, from RFC 3986:

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG

Any character not in the unreserved set must be percent-encoded unless it is a reserved character being used in its reserved capacity (as a delimiter).

Reserved vs Unreserved Characters

This distinction is the most important concept in URL encoding.

Unreserved Characters - Never Encode These

Unreserved characters are safe to use anywhere in a URI without encoding. They carry no special structural meaning. The RFC 3986 unreserved set is:

ALPHA / DIGIT / "-" / "." / "_" / "~"

That is: uppercase and lowercase letters A-Z and a-z, digits 0-9, hyphen, period, underscore, and tilde.

Reserved Characters - Context-Dependent

Reserved characters have syntactic meaning in URIs. They should only be percent-encoded when they appear in a position where their special meaning must be suppressed - typically inside parameter values.

:  /  ?  #  [  ]  @         (gen-delims)
!  $  &  '  (  )  *  +  ,  ;  =   (sub-delims)
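As a rough illustration of the three character classes, the following Python sketch sorts a single character into its RFC 3986 category. The function name `classify` and the label strings are hypothetical, chosen only for this example.

```python
import string

UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")


def classify(ch: str) -> str:
    """Return the RFC 3986 class of a single character."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in GEN_DELIMS:
        return "gen-delim"
    if ch in SUB_DELIMS:
        return "sub-delim"
    # Everything else (space, %, non-ASCII, ...) must be percent-encoded.
    return "other"


for ch in "~&# ":
    print(repr(ch), classify(ch))
```

A character in the "other" class is always encoded. A reserved character is encoded only when it appears as data rather than as a delimiter.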

Character Encoding Reference Table

Character	Unreserved?	RFC 3986 `rawurlencode`	Form encoding `urlencode`	`encodeURI` (JS)	`encodeURIComponent` (JS)
`A`-`Z`, `a`-`z`, `0`-`9`	Yes	unchanged	unchanged	unchanged	unchanged
`-` `.` `_`	Yes	unchanged	unchanged	unchanged	unchanged
`~`	Yes	unchanged	`%7E`	unchanged	unchanged
Space	No	`%20`	`+`	`%20`	`%20`
`&`	Reserved	`%26`	`%26`	unchanged	`%26`
`=`	Reserved	`%3D`	`%3D`	unchanged	`%3D`
`+`	Reserved	`%2B`	`%2B`	unchanged	`%2B`
`#`	Reserved	`%23`	`%23`	unchanged	`%23`
`/`	Reserved	`%2F`	`%2F`	unchanged	`%2F`
`?`	Reserved	`%3F`	`%3F`	unchanged	`%3F`
`@`	Reserved	`%40`	`%40`	unchanged	`%40`
`[` `]`	Reserved	`%5B` `%5D`	`%5B` `%5D`	`%5B` `%5D`	`%5B` `%5D`
`!` `'` `(` `)` `*`	Sub-delims	encoded	encoded	unchanged	unchanged
`$` `,` `;`	Sub-delims	encoded	encoded	unchanged	encoded

Query String Encoding

The query string is the part of a URL after the ? character. It typically carries key-value pairs separated by &, with keys and values separated by =.

https://example.com/search?q=hello+world&lang=en&page=2
                           ^             ^        ^
                           key=value     key=val  key=val

Encoding Rules in Query Strings

When constructing query strings programmatically:

The ? delimiter is not part of the query value - it is the separator between path and query
Within parameter values, encode & as %26 and = as %3D; otherwise the parser treats them as delimiters
Encode # as %23; an unencoded # is treated as the start of the fragment
Keys should also be encoded if they contain special characters

# Correct: encoding the & in a parameter value
https://example.com/page?content=cats+%26+dogs&lang=en

# Wrong: the parser sees three parameters, not two
https://example.com/page?content=cats+&+dogs&lang=en
# Parsed as: content="cats ", (orphan key "+dogs"), lang="en"
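To see how a parser reads the two query strings above, one option is Python's `urllib.parse.parse_qsl`. The `keep_blank_values=True` flag is set here only so that the orphan key stays visible; by default the parser silently drops it.

```python
from urllib.parse import parse_qsl

good = parse_qsl("content=cats+%26+dogs&lang=en", keep_blank_values=True)
bad = parse_qsl("content=cats+&+dogs&lang=en", keep_blank_values=True)

print(good)  # [('content', 'cats & dogs'), ('lang', 'en')]
print(bad)   # [('content', 'cats '), (' dogs', ''), ('lang', 'en')]
```

The unencoded `&` splits the value in two, and the second half turns into a key with an empty value.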

Path Encoding

The path component of a URL uses / as the segment delimiter. This creates a special complication: if you need a literal / inside a path segment (for example, a filename containing a slash), you must encode it as %2F.

/files/reports%2F2026/summary.pdf
# "reports/2026" is a single path segment containing a literal slash
# as opposed to:
/files/reports/2026/summary.pdf
# which has four path segments: "files", "reports", "2026", "summary.pdf"
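The order of operations matters when reading such a path: split on literal slashes first, then decode each segment. A minimal Python sketch follows, with a hypothetical helper name `path_segments`; it drops empty segments for simplicity.

```python
from urllib.parse import unquote


def path_segments(path: str) -> list[str]:
    """Split on literal '/' first, then percent-decode each segment."""
    return [unquote(segment) for segment in path.split("/") if segment]


print(path_segments("/files/reports%2F2026/summary.pdf"))
# ['files', 'reports/2026', 'summary.pdf']
print(path_segments("/files/reports/2026/summary.pdf"))
# ['files', 'reports', '2026', 'summary.pdf']
```

Decoding the whole path before splitting would make the two inputs indistinguishable.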

Security Implication: %2F Normalisation

Web servers, reverse proxies, and frameworks differ in how they treat %2F: some decode it to / before passing the path to the application, some leave it encoded, and some reject the request. This inconsistency has been exploited in directory traversal attacks. An attacker might encode ../ as ..%2F to bypass path validation that looks for literal ../ sequences; if a later layer decodes it, the request traverses up the directory tree.

Always decode and normalise paths before performing security checks on them. Never validate the raw, still-encoded URL.
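One way to apply this rule is to decode once, normalise, and only then check that the result is still inside the intended directory. The sketch below assumes POSIX-style paths; the base directory `/var/www/files` and the function name `resolve_safely` are hypothetical. A production check would also need to consider symlinks, backslashes, and platform-specific quirks.

```python
import posixpath
from typing import Optional
from urllib.parse import unquote

BASE_DIR = "/var/www/files"  # hypothetical directory that requests may read from


def resolve_safely(raw_path: str) -> Optional[str]:
    """Decode once, normalise, then confirm the path stays under BASE_DIR."""
    decoded = unquote(raw_path)  # the single decode step
    candidate = posixpath.normpath(posixpath.join(BASE_DIR, decoded.lstrip("/")))
    if candidate == BASE_DIR or candidate.startswith(BASE_DIR + "/"):
        return candidate
    return None  # the path escaped the base directory


print(resolve_safely("reports/summary.pdf"))    # /var/www/files/reports/summary.pdf
print(resolve_safely("..%2F..%2Fetc/passwd"))   # None
```

The check runs on the decoded, normalised form, so an encoded `..%2F` is treated exactly like a literal `../`.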

Form Encoding: application/x-www-form-urlencoded

When an HTML form is submitted with method="POST" and the default enctype, the browser encodes the form data using the application/x-www-form-urlencoded format. This format predates RFC 3986 and differs from it in one important way: spaces are encoded as + instead of %20.

# RFC 3986 / rawurlencode style (correct for URIs):
name=John%20Doe&city=New%20York

# application/x-www-form-urlencoded style (HTML form POST):
name=John+Doe&city=New+York

The + convention comes from early HTML and HTTP specifications. It applies only to the body of a POST request or the query string when generated by an HTML form. It does not apply to path segments or other URI components.
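The practical consequence shows up on the decoding side. In Python, `unquote_plus` applies the form convention while `unquote` does not, so the same input decodes differently depending on which one the receiver uses:

```python
from urllib.parse import unquote, unquote_plus

print(unquote_plus("John+Doe"))  # John Doe   (form decoding: + means space)
print(unquote("John+Doe"))       # John+Doe   (plain percent-decoding keeps +)
print(unquote("John%20Doe"))     # John Doe
print(unquote_plus("1%2B1"))     # 1+1        (a literal plus must be sent as %2B)
```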

Newline Encoding in Forms

Form encoding also encodes newlines as %0D%0A (carriage return + line feed), not just %0A. Browsers normalise line breaks in form values to CRLF, the line ending used by Internet text protocols such as HTTP and MIME, before encoding them.

# A textarea value "line1\nline2" becomes:
line1%0D%0Aline2
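Python's `quote_plus` does not normalise line endings on its own, so a script that wants to mimic a browser's form submission has to do that step itself. A minimal sketch, with a hypothetical helper name `form_encode_value`:

```python
import re
from urllib.parse import quote_plus


def form_encode_value(value: str) -> str:
    """Normalise any line ending to CRLF, then apply form encoding."""
    normalised = re.sub(r"\r\n|\r|\n", "\r\n", value)
    return quote_plus(normalised)


print(form_encode_value("line1\nline2"))    # line1%0D%0Aline2
print(form_encode_value("line1\r\nline2"))  # line1%0D%0Aline2
```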

Double Encoding

Double encoding is both an accidental bug and an intentional attack technique.

How It Happens

If you percent-encode a string that is already percent-encoded, the % characters themselves get encoded as %25, producing double-encoded output:

Original:           hello world
Encoded once:       hello%20world
Encoded twice:      hello%2520world
                         ^^^
                         %25 is the encoding of %

When the doubly-encoded string is decoded once, you get hello%20world (still encoded). A second decode yields the original hello world. This two-step decode can be exploited if different layers of a system decode the URL at different stages.
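The round trip is easy to reproduce with Python's `quote` and `unquote`:

```python
from urllib.parse import quote, unquote

once = quote("hello world")
twice = quote(once)  # the % from the first pass is itself encoded as %25

print(once)                      # hello%20world
print(twice)                     # hello%2520world
print(unquote(twice))            # hello%20world  (one decode is not enough)
print(unquote(unquote(twice)))   # hello world
```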

CVE-2001-0333: IIS Double Decode Vulnerability

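One of the most notorious examples is CVE-2001-0333, a critical vulnerability in Microsoft IIS 5.0 and earlier. IIS decoded the request path, ran its security checks, and then decoded the path a second time. An attacker could encode ../ (used for directory traversal) as ..%2F, and then encode the % as %25, producing ..%252F. Real-world exploits typically used the backslash form, ..%255c, because Windows also accepts \ as a path separator.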

Pass 1 decode: ..%252F becomes ..%2F
Pass 2 decode: ..%2F becomes ../

This allowed attackers to traverse outside the web root directory and read or execute arbitrary files on the server - including system files and scripts outside the webroot. The fix was to decode only once and reject paths containing ../ sequences after decoding.

# Attack payload:
GET /scripts/..%255c..%255cwinnt/system32/cmd.exe?/c+dir HTTP/1.0

# After double-decode on vulnerable IIS (%255c becomes %5c, then \):
GET /scripts/..\..\winnt/system32/cmd.exe?/c+dir

The lesson: always decode once and validate after decoding, never before.
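The flaw can be simulated in a few lines of Python. This is a simplified model of the check-between-two-decodes pattern, not the actual IIS logic, and all function names are hypothetical. The safer variant uses one deliberately strict policy: it rejects any input that would still change if decoded again.

```python
from urllib.parse import unquote


def has_traversal(path: str) -> bool:
    return "../" in path or "..\\" in path


def vulnerable_handler(raw: str) -> str:
    checked = unquote(raw)       # pass 1
    if has_traversal(checked):   # the check sees ..%5c and finds nothing
        raise ValueError("traversal rejected")
    return unquote(checked)      # pass 2 runs after the check


def safer_handler(raw: str) -> str:
    decoded = unquote(raw)       # the only decode
    if has_traversal(decoded) or unquote(decoded) != decoded:
        raise ValueError("rejected")
    return decoded


payload = "/scripts/..%255c..%255cwinnt/system32/cmd.exe"
print(vulnerable_handler(payload))  # /scripts/..\..\winnt/system32/cmd.exe

try:
    safer_handler(payload)
except ValueError as exc:
    print("blocked:", exc)          # blocked: rejected
```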

Code Examples

PHP

PHP provides two functions for percent-encoding, serving different purposes:

<?php

declare(strict_types=1);

// urlencode() - application/x-www-form-urlencoded
// Spaces become +, not %20
// Use for HTML form data and query string values in PHP web forms
$query = urlencode('hello world & more');
// Result: "hello+world+%26+more"

// rawurlencode() - RFC 3986 compliant
// Spaces become %20
// Use for path segments and API query parameters
$path = rawurlencode('reports/2026');
// Result: "reports%2F2026"

$slug = rawurlencode('hello world');
// Result: "hello%20world"

// http_build_query() - builds a query string from an array
// Uses urlencode() internally (+ for spaces)
$params = [
    'q'    => 'cats & dogs',
    'lang' => 'en',
    'page' => 2,
];
$queryString = http_build_query($params);
// Result: "q=cats+%26+dogs&lang=en&page=2"

// For RFC 3986 compliant output, use the enc_type parameter:
$queryStringRfc = http_build_query($params, '', '&', PHP_QUERY_RFC3986);
// Result: "q=cats%20%26%20dogs&lang=en&page=2"

// Decoding:
$decoded = urldecode('hello+world');   // "hello world"
$decoded = rawurldecode('hello%20world'); // "hello world"

Python

from urllib.parse import quote, quote_plus, urlencode, unquote, unquote_plus

# quote() - RFC 3986 style percent-encoding (spaces become %20)
# The safe parameter defaults to '/', preserving slashes in paths;
# pass safe='' to get the same behaviour as rawurlencode in PHP
encoded = quote('hello world & more')
# Result: 'hello%20world%20%26%20more'

# Encode path segments (no safe characters)
segment = quote('reports/2026', safe='')
# Result: 'reports%2F2026'

# quote_plus() - application/x-www-form-urlencoded (like urlencode in PHP)
# Spaces become +
encoded_form = quote_plus('hello world & more')
# Result: 'hello+world+%26+more'

# urlencode() - builds a query string from a dict
params = {
    'q': 'cats & dogs',
    'lang': 'en',
    'page': 2,
}
query_string = urlencode(params)
# Result: 'q=cats+%26+dogs&lang=en&page=2'

# RFC 3986 compliant query string:
query_string_rfc = urlencode(params, quote_via=quote)
# Result: 'q=cats%20%26%20dogs&lang=en&page=2'

# Decoding:
decoded = unquote('hello%20world')      # "hello world"
decoded_plus = unquote_plus('hello+world')  # "hello world"

JavaScript

JavaScript provides two encoding functions with importantly different behaviour:

// encodeURI() - encodes a COMPLETE URI
// Leaves reserved characters intact (they may be needed as delimiters)
// Also leaves: A-Z a-z 0-9 - _ . ! ~ * ' ( )
const full = encodeURI('https://example.com/search?q=hello world&lang=en');
// Result: 'https://example.com/search?q=hello%20world&lang=en'
// Note: & and = are NOT encoded (they are kept as delimiters)

// encodeURIComponent() - encodes a COMPONENT (value) within a URI
// Encodes everything except: A-Z a-z 0-9 - _ . ! ~ * ' ( )
// This is what you should use for individual parameter values
const value = encodeURIComponent('cats & dogs');
// Result: 'cats%20%26%20dogs'

const value2 = encodeURIComponent('hello world');
// Result: 'hello%20world'

// Correct way to build a query string in JS:
const params = {
    q: 'cats & dogs',
    lang: 'en',
    page: 2,
};
const queryString = Object.entries(params)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join('&');
// Result: 'q=cats%20%26%20dogs&lang=en&page=2'

// Modern alternative: URLSearchParams (handles encoding automatically)
const sp = new URLSearchParams(params);
const queryStringAlt = sp.toString();
// Result: 'q=cats+%26+dogs&lang=en&page=2'
// Note: URLSearchParams uses + for spaces (form encoding style)

// Decoding:
decodeURI('hello%20world');           // 'hello world'
decodeURIComponent('hello%20world');  // 'hello world'

Common Mistakes

1. Encoding the Full URL Instead of Its Components

The most frequent mistake is calling an encoding function on a complete, already-assembled URL. This encodes the delimiters (://, ?, &, =) that must remain literal.

// Wrong: encodes the delimiters
const brokenUrl = encodeURIComponent('https://example.com/search?q=hello world');
// Result: 'https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dhello%20world'
// This is a broken URL

// Correct: encode only the values
const query = encodeURIComponent('hello world');
const url = `https://example.com/search?q=${query}`;
// Result: 'https://example.com/search?q=hello%20world'
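The same component-first approach can be sketched in Python by encoding each path segment and query parameter separately and then assembling the URL with `urlunsplit`. The helper name `build_url` and its signature are hypothetical.

```python
from urllib.parse import quote, urlencode, urlunsplit


def build_url(host: str, segments: list[str], params: dict) -> str:
    """Encode each component on its own, then assemble the URL."""
    path = "/" + "/".join(quote(segment, safe="") for segment in segments)
    query = urlencode(params, quote_via=quote)  # %20 rather than + for spaces
    return urlunsplit(("https", host, path, query, ""))


print(build_url("example.com", ["search"], {"q": "hello world"}))
# https://example.com/search?q=hello%20world
print(build_url("example.com", ["files", "reports/2026"], {"q": "cats & dogs"}))
# https://example.com/files/reports%2F2026?q=cats%20%26%20dogs
```

The delimiters are added by the assembly step and never pass through an encoding function, so they cannot be encoded by accident.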

2. Confusing `+` and `%20` in APIs

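The +-for-space convention only applies to application/x-www-form-urlencoded data. Many REST APIs, and every URL component outside the query string, expect RFC 3986 encoding (%20 for space). Sending a + to an API that does not apply form-decoding means the server will receive a literal + character. %20 is decoded as a space by both kinds of parser, which makes it the safer default.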

<?php
// Wrong for REST APIs: uses + for spaces
$url = 'https://api.example.com/search?q=' . urlencode('hello world');
// Sends: ...?q=hello+world  (literal plus sign if server doesn't form-decode)

// Correct for REST APIs:
$url = 'https://api.example.com/search?q=' . rawurlencode('hello world');
// Sends: ...?q=hello%20world

3. Double Encoding in String Concatenation

When building URLs by concatenation over multiple steps, it is easy to accidentally encode an already-encoded string.

<?php
// Danger zone: encoding at two different layers
$value = rawurlencode('hello world');   // "hello%20world"
// ... value passed to another function that also encodes it:
$url = 'https://example.com/?q=' . rawurlencode($value);
// Result: ?q=hello%2520world  (double-encoded!)

// Solution: encode exactly once, as late as possible
$rawValue = 'hello world';
$url = 'https://example.com/?q=' . rawurlencode($rawValue);

4. Not Encoding Special Characters in Query Values

Forgetting to encode user-supplied data that contains &, =, #, or + in query parameter values leads to broken URLs and potential injection:

// User input: "cats & dogs"
const userInput = 'cats & dogs';

// Wrong: the & breaks the query string
const brokenUrl = `https://example.com/search?q=${userInput}&lang=en`;
// Parsed as: q="cats ", (orphan " dogs"), lang="en"

// Correct:
const url = `https://example.com/search?q=${encodeURIComponent(userInput)}&lang=en`;
// Parsed as: q="cats & dogs", lang="en"

5. Assuming Case-Insensitivity Is Universal

While percent-encoded triplets are officially case-insensitive (%2f equals %2F), some servers, proxies, and caches treat them as case-sensitive strings. Always normalise to uppercase hex digits (RFC 3986 recommendation) for maximum compatibility.
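A small normalisation step can be applied before comparing URLs or using them as cache keys. The sketch below uppercases the hex digits of every percent-encoded triplet and leaves the rest of the string untouched; the function name `uppercase_triplets` is hypothetical.

```python
import re

_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def uppercase_triplets(url: str) -> str:
    """Uppercase the hex digits in each %XX triplet, leaving other text as-is."""
    return _TRIPLET.sub(lambda match: match.group(0).upper(), url)


print(uppercase_triplets("/a%2fb%e2%82%ac"))  # /a%2Fb%E2%82%AC
```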

What to Internalise

URL encoding is a small but critical part of building correct, secure web applications. The key distinctions to internalise are: encode components, not complete URLs; use RFC 3986 (%20) for paths and API queries; use form encoding (+) only for HTML form submissions; and never validate a URL path before decoding and normalising it. Double encoding is both an accidental bug and a class of security vulnerability, so encode exactly once, as close to the transport layer as possible.
