URL Slugs: Rules, SEO Impact, and Transliteration
22 March, 2026 Web
A URL slug is the human-readable segment at the end of a URL path that identifies a specific resource. In https://example.com/blog/url-slug-best-practices, the slug is url-slug-best-practices. It is short, descriptive, and composed entirely of lowercase ASCII characters separated by hyphens. Getting slugs right matters for readability, SEO, and long-term URL stability. Getting them wrong — silently truncating Unicode, using underscores, or generating unstable identifiers — costs real traffic and creates maintenance debt. You can test the rules below against any title with the slug generator.
The Rules
A well-formed slug satisfies these invariants:
- Lowercase only. The path component of a URL is case-sensitive by spec (RFC 3986), but some servers and filesystems resolve paths case-insensitively. When that happens, two representations of the same resource (/Blog/Post and /blog/post) both resolve and create duplicate content issues. Normalising to lowercase eliminates the problem entirely.
- Hyphens as word separators, not underscores. Google's crawlers treat a hyphen as a word separator, meaning url-slug is indexed as two words: "url" and "slug". An underscore is treated as a word joiner: url_slug is indexed as a single token urlslug. This is documented in Google's URL structure guidelines and has been confirmed by John Mueller repeatedly. Use hyphens.
- Only [a-z0-9-] characters. Strip everything else: punctuation, special characters, emoji. Any character outside this set either needs percent-encoding (which harms readability) or causes inconsistent behaviour across systems.
- No leading or trailing hyphens. A slug like -my-post- is a technically valid path segment but looks broken. Always trim hyphens from both ends after processing.
- No consecutive hyphens. my--post is an artifact of the slugification process (typically from stripping punctuation that was surrounded by spaces). Collapse runs of hyphens to a single one.
The canonical regex that validates a correctly formed slug:
^[a-z0-9]+(?:-[a-z0-9]+)*$
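As a quick illustration, the pattern above can be wrapped in a small validator. This is a minimal sketch; the function name is_valid_slug and the sample inputs are made up for the example.

```python
import re

# The validation pattern quoted above, compiled once.
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def is_valid_slug(candidate: str) -> bool:
    """Return True when the candidate satisfies every invariant in the list."""
    # fullmatch is used because `$` alone would also accept a trailing newline.
    return SLUG_RE.fullmatch(candidate) is not None


# Hypothetical sample inputs: one valid slug, then one violation per rule.
for candidate in ["url-slug-best-practices", "My-Post", "my_post", "-my-post-", "my--post", ""]:
    print(f"{candidate!r}: {is_valid_slug(candidate)}")
```

Only the first sample passes; each of the others breaks exactly one of the rules listed above.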
SEO Specifics
Hyphen vs Underscore - the Google Preference
Google's documentation on URL structure explicitly recommends hyphens over underscores. The practical consequence: a post titled "Node.js Best Practices" slugged as nodejs_best_practices risks being indexed as the single token "nodejsbestpractices" rather than as the separate words "nodejs", "best", and "practices". The hyphenated version nodejs-best-practices is decomposed into individual words that match user queries.
Slug Stability and PageRank
A URL is an identity. When you change a slug - even to fix a typo - you create a new URL. The original URL has accumulated PageRank, inbound links, and cached positions in search indices. Without a 301 permanent redirect from the old slug to the new one, that equity is discarded. The practical rule: treat slugs as permanent the moment a page is indexed. Add the redirect if you must change a slug, but prefer not to change it at all.
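One way to keep old slugs working is to record every rename and resolve incoming slugs through that record before serving the page. The sketch below assumes a hypothetical in-memory table (REDIRECTS) and a made-up helper name (resolve_slug); a real site would typically keep the table in its database and issue the 301 at the HTTP layer.

```python
from __future__ import annotations

# Hypothetical redirect table: old slug -> the slug that replaced it.
REDIRECTS = {
    "url-slug-best-practises": "url-slug-best-practices",
    "url-slug-best-practices": "url-slug-rules",
}


def resolve_slug(slug: str, redirects: dict[str, str]) -> tuple[str, bool]:
    """Follow renames to the current slug; the flag says whether a 301 is needed."""
    seen = {slug}
    current = slug
    while current in redirects:
        current = redirects[current]
        if current in seen:
            # A rename chain that loops back on itself would redirect forever.
            raise ValueError(f"redirect loop involving {current!r}")
        seen.add(current)
    return current, current != slug


print(resolve_slug("url-slug-best-practises", REDIRECTS))
print(resolve_slug("url-slug-rules", REDIRECTS))
```

Resolving the whole chain up front lets the server answer with a single 301 to the final URL instead of bouncing the visitor through every intermediate slug.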
URL Length
Google does not publish a strict character limit for URLs, but their crawlers handle shorter URLs more reliably, and shorter URLs display better in search results. The practical guidance is to keep the path under roughly 75 characters. This means slug generation must truncate long titles - at a word boundary, not in the middle of a word.
Canonical URLs
When the same content is accessible under multiple URLs (with and without trailing slash, HTTP vs HTTPS, www vs non-www), use a canonical link element to tell search engines which URL is authoritative. Slug generation is upstream of this concern, but consistent slug rules prevent accidental duplicate URLs at the slug level.
Unicode Transliteration
A large share of web content is not ASCII. Blog posts, product names, and user-generated content arrive in Russian, Chinese, Arabic, German, and hundreds of other languages. Slugifying "Héllo Wörld" as an empty string or a string of percent-encoded bytes is wrong. The correct approach is transliteration: converting non-ASCII characters to their nearest ASCII equivalent before applying slug rules.
The Algorithm
- NFD normalisation. Unicode Normalisation Form D (Canonical Decomposition) decomposes precomposed characters into their base character plus combining mark(s). é (U+00E9, LATIN SMALL LETTER E WITH ACUTE) becomes e (U+0065) + ◌́ (U+0301, COMBINING ACUTE ACCENT). This separates the "letter" from the "decoration".
- Strip combining characters. Characters in the Unicode category Mn (Mark, Nonspacing) are the combining marks. Removing them converts é → e, ü → u, ñ → n, ç → c.
- Map remaining non-ASCII. NFD + strip handles Latin-script languages with diacritics, but not letters that have no canonical decomposition, such as ß, ø, and æ. For those, and for non-Latin scripts (Cyrillic, Greek, Chinese, Arabic, Hebrew, Japanese), a transliteration table is needed. The ICU (International Components for Unicode) library provides Any-Latin transliteration that covers most scripts. A word like Привет (Russian for "Hello") becomes Privet; 北京 becomes běi jīng, which after NFD stripping becomes bei jing.
- Apply slug rules. Lowercase, replace each run of characters outside [a-z0-9] with a hyphen, and trim hyphens from both ends.
The result: "Héllo Wörld" → NFD → "He\u0301llo Wo\u0308rld" → strip marks → "Hello World" → lowercase + replace spaces → "hello-world".
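The first two steps and the final slug rules can be sketched with Python's standard library alone. This is an illustration rather than a full transliterator: the helper names strip_diacritics and latin_slug are made up for the example, and the script-mapping step is deliberately left out.

```python
import re
import unicodedata


def strip_diacritics(text: str) -> str:
    """NFD-decompose the text, then drop nonspacing marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def latin_slug(text: str) -> str:
    """Apply the slug rules after diacritic stripping (no script mapping)."""
    lowered = strip_diacritics(text).lower()
    # Each run of characters outside [a-z0-9] becomes one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", lowered).strip("-")


print(latin_slug("Héllo Wörld"))           # hello-world
print(latin_slug("Crème brûlée, señor!"))  # creme-brulee-senor
```

Because there is no mapping step, a letter such as ß is simply replaced by a hyphen and Cyrillic input yields an empty string, which is why the third step matters for anything beyond accented Latin text.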
Implementation
PHP
PHP's intl extension (shipped with most distributions and recommended for Symfony applications) exposes ICU transliteration directly.
<?php
declare(strict_types=1);

function slugify(string $text, int $maxLength = 75): string
{
    // Transliterate any script to lowercase ASCII using ICU's Any-Latin; Latin-ASCII chain
    $text = transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $text);
    if ($text === false) {
        // ICU could not process the input; the caller should apply its empty-slug fallback
        return '';
    }
    // Replace each run of characters outside [a-z0-9] with a single hyphen
    // (the + quantifier also collapses consecutive hyphens)
    $text = (string) preg_replace('/[^a-z0-9]+/', '-', $text);
    // Trim hyphens from both ends
    $text = trim($text, '-');
    if ($text === '') {
        return '';
    }
    // Truncate at word boundary if too long
    if (strlen($text) > $maxLength) {
        $text = substr($text, 0, $maxLength);
        $lastHyphen = strrpos($text, '-');
        if ($lastHyphen !== false && $lastHyphen > $maxLength / 2) {
            $text = substr($text, 0, $lastHyphen);
        }
        $text = trim($text, '-');
    }
    return $text;
}

// Examples:
slugify('Héllo Wörld'); // "hello-world"
slugify('Привет мир'); // "privet-mir"
slugify('北京欢迎你'); // "bei-jing-huan-ying-ni"
slugify('PHP: The Right Way!'); // "php-the-right-way"
If the intl extension is unavailable, a fallback using iconv handles Latin-script diacritics:
<?php
declare(strict_types=1);

function slugifyFallback(string $text): string
{
    // Convert to ASCII using iconv transliteration (Latin scripts only)
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text);
    if ($ascii === false) {
        // Conversion failed; the caller should apply its empty-slug fallback
        return '';
    }
    $ascii = strtolower($ascii);
    // Replace each run of characters outside [a-z0-9] with a single hyphen
    $ascii = (string) preg_replace('/[^a-z0-9]+/', '-', $ascii);
    return trim($ascii, '-');
}
The iconv approach does not handle Cyrillic, Chinese, or Arabic, and its output for accented letters varies with the iconv implementation and the active locale. Prefer transliterator_transliterate for any site with non-Latin content.
JavaScript
JavaScript does not ship built-in transliteration for non-Latin scripts. NFD normalization and diacritic stripping are native; for full transliteration use the transliteration or slug npm packages.
// Native: handles Latin-script diacritics via NFD normalization
function slugify(text, maxLength = 75) {
  let slug = text
    .normalize('NFD')                 // decompose diacritics
    .replace(/[\u0300-\u036f]/g, '')  // strip combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')      // non-alphanumeric to hyphen
    .replace(/^-+|-+$/g, '');         // trim leading/trailing hyphens
  if (slug.length > maxLength) {
    slug = slug.slice(0, maxLength);
    const lastHyphen = slug.lastIndexOf('-');
    if (lastHyphen > maxLength / 2) {
      slug = slug.slice(0, lastHyphen);
    }
    slug = slug.replace(/^-+|-+$/g, '');
  }
  return slug;
}

// For non-Latin scripts, add the transliteration library:
// import { slugify } from 'transliteration';
// slugify('Привет мир') => 'privet-mir'

slugify('Héllo Wörld'); // "hello-world"
slugify('PHP: The Right Way!'); // "php-the-right-way"
Python
The third-party Unidecode package provides table-based transliteration for most scripts in a single call. It maps each character independently of context, so results are approximate: Han characters, for example, always receive Mandarin readings, even in Japanese text.
from unidecode import unidecode  # pip install unidecode
import re

def slugify(text: str, max_length: int = 75) -> str:
    # Transliterate any script to ASCII
    text = unidecode(text)
    text = text.lower()
    # Replace non-alphanumeric characters with hyphens
    text = re.sub(r'[^a-z0-9]+', '-', text)
    text = text.strip('-')
    if len(text) > max_length:
        text = text[:max_length]
        last_hyphen = text.rfind('-')
        if last_hyphen > max_length // 2:
            text = text[:last_hyphen]
        text = text.strip('-')
    return text

# Examples:
slugify('Héllo Wörld')  # 'hello-world'
slugify('Привет мир')  # 'privet-mir'
slugify('北京欢迎你')  # 'bei-jing-huan-ying-ni'
slugify('PHP: The Right Way!')  # 'php-the-right-way'
Edge Cases
All-Numeric Slugs
A slug like 12345 is valid by the character rules but ambiguous. Visitors and systems often cannot tell whether it is an ID or a meaningful path segment. Some frameworks route numeric segments to ID-based controllers rather than slug-based ones. If your titles can produce all-numeric slugs (e.g., a post titled "2026"), consider prepending a category prefix: post-2026 or year-2026.
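A minimal sketch of the prefixing idea, assuming a hypothetical helper name (disambiguate_numeric) and a default prefix of "post" that you would replace with whatever fits your content type:

```python
def disambiguate_numeric(slug: str, prefix: str = "post") -> str:
    """Prefix slugs that consist only of ASCII digits; leave others unchanged."""
    # str.isdigit() also accepts non-ASCII digits, so it is paired with isascii().
    if slug.isascii() and slug.isdigit():
        return f"{prefix}-{slug}"
    return slug


print(disambiguate_numeric("2026"))          # post-2026
print(disambiguate_numeric("2026-roadmap"))  # 2026-roadmap
```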
Empty Result After Stripping
A title composed entirely of emoji, special characters, or a script your transliterator does not handle can produce an empty string after slugification. Never silently use an empty slug. Fallback strategies in order of preference:
- Use a hash of the original title (first 8 hex characters of SHA-1 is enough for this purpose).
- Use a UUID v4.
- Raise a validation error and require a manually entered slug.
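The first and third strategies can be combined in a small guard that runs after slugification. This is a sketch under assumptions: the function name slug_or_fallback is hypothetical, and treating a blank title as a validation error is one possible policy, not the only one.

```python
import hashlib


def slug_or_fallback(slug: str, original_title: str) -> str:
    """Return the slug, or a short hash of the title when the slug is empty."""
    if slug:
        return slug
    if not original_title.strip():
        # Nothing to hash meaningfully: require a manually entered slug instead.
        raise ValueError("title is empty; a manually entered slug is required")
    digest = hashlib.sha1(original_title.encode("utf-8")).hexdigest()
    return digest[:8]  # first 8 hex characters, as suggested above


print(slug_or_fallback("hello-world", "Héllo Wörld"))  # hello-world
print(slug_or_fallback("", "🎉🎉🎉"))                   # 8 hex characters
```

Hashing the title keeps the fallback deterministic, so re-running slug generation for the same record yields the same URL.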
Reserved Words and System Paths
Do not allow slugs that conflict with system paths. Common conflicts: admin, api, static, assets, login, logout, register, feed, sitemap, robots. Maintain a blocklist and append a suffix when a generated slug matches a reserved word.
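A blocklist check can be as small as a set lookup. In this sketch the set contains only the conflicts named above, and both the helper name (avoid_reserved) and the default suffix are assumptions chosen for illustration.

```python
# Reserved path segments taken from the list above; extend for your own routes.
RESERVED_SLUGS = frozenset({
    "admin", "api", "static", "assets", "login",
    "logout", "register", "feed", "sitemap", "robots",
})


def avoid_reserved(slug: str, suffix: str = "post") -> str:
    """Append a suffix when the generated slug collides with a system path."""
    if slug in RESERVED_SLUGS:
        return f"{slug}-{suffix}"
    return slug


print(avoid_reserved("admin"))        # admin-post
print(avoid_reserved("admin-guide"))  # admin-guide
```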
Very Long Titles
Truncate at a word boundary, not mid-word. The implementation examples above show the pattern: truncate to maxLength, find the last hyphen in the truncated string, and cut there. This avoids slugs ending in ...best-practi.
Collision Handling
When two different resources produce the same slug, you need a deterministic resolution strategy. The standard approach is sequential suffixing:
<?php
declare(strict_types=1);

function uniqueSlug(string $baseSlug, callable $exists): string
{
    if (!$exists($baseSlug)) {
        return $baseSlug;
    }
    $counter = 2;
    do {
        $candidate = $baseSlug . '-' . $counter;
        $counter++;
    } while ($exists($candidate));
    return $candidate;
}

// Usage:
$slug = uniqueSlug(
    slugify($title),
    static fn (string $s): bool => $articleRepository->existsBySlug($s),
);
// "my-post", "my-post-2", "my-post-3", etc.
Avoid appending a random hash as the default collision resolution. Hashes produce unstable, unguessable URLs and defeat the readability purpose of having a slug at all. Sequential suffixes (-2, -3) are predictable and human-friendly.
UUID-Based vs Human-Readable Slugs
Some applications avoid the collision and stability problems entirely by using UUID or ULID-based URLs:
/posts/01JPXK3G8EQ4FVZMCQ7N1BWSRH ← ULID-based
/posts/a1b2c3d4 ← short hash
/posts/my-post-title ← human-readable slug
The trade-off is explicit:
| Property | UUID/ULID | Human-readable slug |
|---|---|---|
| Uniqueness | Guaranteed by construction | Requires collision handling |
| Stability | Permanent, never changes | At risk when title is edited |
| Readability | None | High |
| SEO value | Minimal - no keywords | Moderate - keywords in URL |
| Guessability | Zero | Moderate |
| Implementation | Simple | Requires transliteration + deduplication |
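A middle-ground approach that works well for large content platforms: generate a human-readable slug, append 8 characters of the record's ULID as a suffix, and never change it regardless of title edits. Take the characters from the random tail of the ULID rather than from its start: the first 10 characters encode the creation timestamp, so records created within about the same second share an 8-character prefix.
/posts/my-post-title-7n1bwsrh
The slug is readable, effectively collision-free, and stable because it is tied to the record ID rather than the title.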
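A sketch of the suffixing step, assuming the record ID is available as a ULID or UUID string. The helper name slug_with_id_suffix is hypothetical. Note that it takes the suffix from the end of the ID, because the leading characters of a ULID encode the creation time and are shared by records created close together.

```python
import re


def slug_with_id_suffix(slug: str, record_id: str, suffix_length: int = 8) -> str:
    """Append the last characters of the record ID to a human-readable slug."""
    # Keep only lowercase letters and digits so UUID hyphens do not leak in.
    compact = re.sub(r"[^a-z0-9]", "", record_id.lower())
    if len(compact) < suffix_length:
        raise ValueError("record ID is too short to provide a suffix")
    return f"{slug}-{compact[-suffix_length:]}"


print(slug_with_id_suffix("my-post-title", "01JPXK3G8EQ4FVZMCQ7N1BWSRH"))
# my-post-title-7n1bwsrh
```

Store the result once when the record is created and never regenerate it, so later title edits do not change the URL.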
The Checklist
A good slug is lowercase, hyphen-separated, ASCII-only, bounded in length, and stable over time. The implementation details that matter most in practice are: use ICU transliteration (not naive iconv) for non-Latin scripts, truncate at word boundaries not character boundaries, handle empty-after-strip gracefully, and treat a published slug as immutable. Use 301 redirects when you must change a slug, maintain a blocklist of reserved paths, and if stability is more important than readability, embed a short ID suffix so the slug can survive title edits.