URL Slugs: Rules, SEO Impact, and Transliteration

22 March, 2026 Web

A URL slug is the human-readable segment at the end of a URL path that identifies a specific resource. In https://example.com/blog/url-slug-best-practices, the slug is url-slug-best-practices. It is short, descriptive, and composed entirely of lowercase ASCII letters and digits separated by hyphens. Getting slugs right matters for readability, SEO, and long-term URL stability. Getting them wrong (silently dropping non-ASCII characters, using underscores, or generating unstable identifiers) costs real traffic and creates maintenance debt. You can test the rules below against any title with the slug generator.

The Rules

A well-formed slug satisfies these invariants:

Lowercase only. RFC 3986 treats the path component of a URL as case-sensitive, so /Blog/Post and /blog/post are different URLs. Some servers (IIS, for example, or any server backed by a case-insensitive filesystem) nevertheless serve the same resource for both, which gives one page two representations and creates duplicate content issues. Normalising to lowercase eliminates the problem entirely.
Hyphens as word separators, not underscores. Google treats a hyphen as a word separator, so url-slug is read as two words: "url" and "slug". An underscore is treated as a word joiner, so url_slug is read as the single token url_slug. Google's URL structure guidelines recommend hyphens for this reason, and John Mueller has repeated the advice. Use hyphens.
Only [a-z0-9-] characters. Strip everything else: punctuation, special characters, emoji. Most characters outside this set must be percent-encoded in a URL path (which harms readability), and the few that need no encoding, such as . and _, behave inconsistently across systems.
No leading or trailing hyphens. A slug like -my-post- is a legal URL path segment, but it looks broken. Always trim hyphens from both ends after processing.
No consecutive hyphens. my--post is an artifact of the slugification process (typically from stripping punctuation that was surrounded by spaces). Collapse runs of hyphens to a single one.

The canonical regex that validates a correctly formed slug:

^[a-z0-9]+(?:-[a-z0-9]+)*$
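In Python, the same pattern can be checked with re.fullmatch, which anchors both ends itself. That is safer than writing ^...$ by hand: in Python's re module (and in PHP's preg functions unless the D modifier is set), a bare $ also matches just before a trailing newline, so "my-post\n" would slip through. A minimal sketch; the names SLUG_RE and is_valid_slug are illustrative.

```python
import re

# The article's slug pattern: runs of [a-z0-9] joined by single hyphens.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    # fullmatch must consume the whole string, so a trailing "\n" is rejected.
    return SLUG_RE.fullmatch(slug) is not None
```

Running every generated slug through this check before saving it catches bugs in the slugifier itself, not just bad user input.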

SEO Specifics

Hyphen vs Underscore - the Google Preference

Google's documentation on URL structure explicitly recommends hyphens over underscores. The practical consequence: a post titled "Node.js Best Practices" slugged as nodejs_best_practices gives Google the single token nodejs_best_practices, which does not match the separate words of a query like "nodejs best practices". The hyphenated version nodejs-best-practices is decomposed into individual words that match user queries.

Slug Stability and PageRank

A URL is an identity. When you change a slug - even to fix a typo - you create a new URL. The original URL has accumulated PageRank, inbound links, and cached positions in search indices. Without a 301 permanent redirect from the old slug to the new one, that equity is discarded. The practical rule: treat slugs as permanent the moment a page is indexed. Add the redirect if you must change a slug, but prefer not to change it at all.
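A rename therefore has two parts: record a 301 from the old slug, and repoint any older redirects at the newest slug so a visitor never follows a chain of 301s. The sketch below models that bookkeeping in memory; the SlugRegistry class and the /blog/ prefix are hypothetical, and a real site would keep the same mapping in its database or server configuration.

```python
from __future__ import annotations


class SlugRegistry:
    """Hypothetical in-memory store: live slugs plus 301 redirects for retired ones."""

    def __init__(self) -> None:
        self.live: set[str] = set()
        self.redirects: dict[str, str] = {}  # retired slug -> current slug

    def publish(self, slug: str) -> None:
        self.live.add(slug)

    def rename(self, old: str, new: str) -> None:
        self.live.discard(old)
        self.live.add(new)
        # Repoint earlier redirects at the newest slug so every old URL needs one hop.
        for retired, target in list(self.redirects.items()):
            if target == old:
                self.redirects[retired] = new
        self.redirects[old] = new
        # Renaming back to a previously retired slug makes it live again.
        self.redirects.pop(new, None)

    def resolve(self, slug: str) -> tuple[int, str | None]:
        """Return (HTTP status, Location header or None) for a requested slug."""
        if slug in self.live:
            return 200, None
        if slug in self.redirects:
            return 301, "/blog/" + self.redirects[slug]
        return 404, None
```

After two renames, the very first slug still resolves with a single 301 to the current one.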

URL Length

Google does not publish a character limit for URLs, but shorter URLs are easier to read and share, and they display better in search results. A common rule of thumb is to keep the path under roughly 75 characters. This means slug generation must truncate long titles at a word boundary, not in the middle of a word.

Canonical URLs

When the same content is accessible under multiple URLs (with and without trailing slash, HTTP vs HTTPS, www vs non-www), use a canonical link element to tell search engines which URL is authoritative. Slug generation is upstream of this concern, but consistent slug rules prevent accidental duplicate URLs at the slug level.
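As a sketch of how a site might compute the href for that element, here is a Python helper under an assumed policy: HTTPS, no www prefix, no trailing slash, and query string and fragment dropped. The function names and the policy itself are illustrative; choose whichever variants your site actually serves and apply them consistently.

```python
import html
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    parts = urlsplit(url)
    # .hostname is already lowercased and excludes any port or credentials.
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    # Paths are case-sensitive, so only the trailing slash is normalised here.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, "", ""))


def canonical_link(url: str) -> str:
    # Escape the URL before placing it in an HTML attribute.
    return f'<link rel="canonical" href="{html.escape(canonical_url(url))}">'
```

All four scheme and host variants of a post collapse to the same href, so search engines see one authoritative URL.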

Unicode Transliteration

A large share of web content is not ASCII. Blog posts, product names, and user-generated content arrive in Russian, Chinese, Arabic, German, and hundreds of other languages. Naive stripping turns "Héllo Wörld" into h-llo-w-rld and an all-Cyrillic title into an empty string, while percent-encoding produces an unreadable string of escaped bytes. The correct approach is transliteration: converting non-ASCII characters to their nearest ASCII equivalent before applying slug rules.

The Algorithm

NFD normalisation. Unicode Normalisation Form D (Canonical Decomposition) decomposes precomposed characters into their base character plus combining mark(s). é (U+00E9, LATIN SMALL LETTER E WITH ACUTE) becomes e (U+0065) + ◌́ (U+0301, COMBINING ACUTE ACCENT). This separates the "letter" from the "decoration".
Strip combining characters. Characters in the Unicode category Mn (Mark, Nonspacing) are the combining marks. Removing them converts é → e, ü → u, ñ → n, ç → c.
Map remaining non-ASCII. NFD + strip handles Latin-script languages with diacritics. For non-Latin scripts (Cyrillic, Greek, Chinese, Arabic, Hebrew, Japanese), a transliteration table is needed. The ICU (International Components for Unicode) library provides Any-Latin transliteration that covers most scripts. A word like Привет (Russian for "Hello") becomes Privet; 北京 becomes běi jīng (pinyin with tone marks), which after NFD stripping becomes bei jing.
Apply slug rules. Lowercase, replace non-[a-z0-9] with hyphens, collapse multiple hyphens, trim.

The result: "Héllo Wörld" → NFD → "He\u0301llo Wo\u0308rld" → strip marks → "Hello World" → lowercase + replace spaces → "hello-world".
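Steps 1, 2 and 4 map directly onto Python's standard library. This sketch (the name ascii_slug is illustrative) uses only unicodedata and re; it has no transliteration table, so it skips step 3 and simply drops non-Latin characters.

```python
import re
import unicodedata


def ascii_slug(text: str) -> str:
    # Step 1: NFD splits "é" into "e" + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2: drop nonspacing marks (Unicode category Mn).
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Step 4: lowercase, turn each run outside [a-z0-9] into one hyphen, trim.
    slug = re.sub(r"[^a-z0-9]+", "-", stripped.lower())
    return slug.strip("-")
```

Characters such as ß, ø and æ have no canonical decomposition, so NFD leaves them intact and step 4 turns them into hyphens (Straße becomes stra-e). Filling that gap is what a transliteration table, such as ICU's Latin-ASCII transform, is for.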

Implementation

PHP

PHP's intl extension exposes ICU transliteration directly. It is part of the standard PHP distribution but is often packaged separately and not always enabled, so check that it is loaded before relying on it.

<?php

declare(strict_types=1);

function slugify(string $text, int $maxLength = 75): string
{
    // Transliterate any script to lowercase ASCII with an ICU transform chain:
    // Any-Latin (script to Latin), Latin-ASCII (remove diacritics), Lower()
    $text = transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $text);
    if ($text === false) {
        return '';
    }

    // Replace each run of characters outside [a-z0-9] with a single hyphen
    // (the + quantifier also collapses consecutive hyphens)
    $text = preg_replace('/[^a-z0-9]+/', '-', $text);

    // Trim leading and trailing hyphens
    $text = trim((string) $text, '-');

    if ($text === '') {
        return '';
    }

    // Truncate at word boundary if too long
    if (strlen($text) > $maxLength) {
        $text = substr($text, 0, $maxLength);
        $lastHyphen = strrpos($text, '-');
        if ($lastHyphen !== false && $lastHyphen > $maxLength / 2) {
            $text = substr($text, 0, $lastHyphen);
        }
        $text = trim($text, '-');
    }

    return $text;
}

// Examples:
slugify('Héllo Wörld');           // "hello-world"
slugify('Привет мир');            // "privet-mir"
slugify('北京欢迎你');              // "bei-jing-huan-ying-ni"
slugify('PHP: The Right Way!');   // "php-the-right-way"

If the intl extension is unavailable, a fallback using iconv handles Latin-script diacritics:

<?php

declare(strict_types=1);

function slugifyFallback(string $text): string
{
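    // glibc's //TRANSLIT depends on LC_CTYPE: under the default "C" locale it
    // emits "?" for accented letters, so switch to a UTF-8 locale first
    // (assumes en_US.UTF-8 is installed; setlocale() is process-wide)
    setlocale(LC_CTYPE, 'en_US.UTF-8');

    // Convert to ASCII using iconv transliteration (Latin scripts only)
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text);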
    $ascii = strtolower((string) $ascii);
    $ascii = preg_replace('/[^a-z0-9]+/', '-', $ascii);

    return trim((string) $ascii, '-');
}

The iconv approach does not handle Cyrillic, Chinese, or Arabic. Prefer transliterator_transliterate for any site with non-Latin content.

JavaScript

JavaScript does not ship built-in transliteration for non-Latin scripts. NFD normalization and diacritic stripping are native; for full transliteration use the transliteration or slug npm packages.

// Native: handles Latin-script diacritics via NFD normalization
function slugify(text, maxLength = 75) {
    let slug = text
        .normalize('NFD')                     // decompose diacritics
        .replace(/[\u0300-\u036f]/g, '')      // strip combining diacritical marks (U+0300-U+036F block)
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, '-')          // non-alphanumeric to hyphen
        .replace(/^-+|-+$/g, '');             // trim leading/trailing hyphens

    if (slug.length > maxLength) {
        slug = slug.slice(0, maxLength);
        const lastHyphen = slug.lastIndexOf('-');
        if (lastHyphen > maxLength / 2) {
            slug = slug.slice(0, lastHyphen);
        }
        slug = slug.replace(/^-+|-+$/g, '');
    }

    return slug;
}

// For non-Latin scripts, add the transliteration library:
// import { slugify } from 'transliteration';
// slugify('Привет мир') => 'privet-mir'

slugify('Héllo Wörld');          // "hello-world"
slugify('PHP: The Right Way!');  // "php-the-right-way"

Python

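The third-party unidecode package, a port of Perl's Text::Unidecode, transliterates most scripts to ASCII in a single call. It maps each character independently using lookup tables, so its output is rougher than ICU's for some languages, but it is pure Python and needs no system libraries.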

from unidecode import unidecode  # pip install unidecode
import re


def slugify(text: str, max_length: int = 75) -> str:
    # Transliterate any script to ASCII
    text = unidecode(text)
    text = text.lower()
    # Replace non-alphanumeric characters with hyphens
    text = re.sub(r'[^a-z0-9]+', '-', text)
    text = text.strip('-')

    if len(text) > max_length:
        text = text[:max_length]
        last_hyphen = text.rfind('-')
        if last_hyphen > max_length // 2:
            text = text[:last_hyphen]
        text = text.strip('-')

    return text


# Examples:
slugify('Héllo Wörld')         # 'hello-world'
slugify('Привет мир')          # 'privet-mir'
slugify('北京欢迎你')            # 'bei-jing-huan-ying-ni'
slugify('PHP: The Right Way!') # 'php-the-right-way'

Edge Cases

All-Numeric Slugs

A slug like 12345 is valid by the character rules but ambiguous. Visitors and systems often cannot tell whether it is an ID or a meaningful path segment. Some frameworks route numeric segments to ID-based controllers rather than slug-based ones. If your titles can produce all-numeric slugs (e.g., a post titled "2026"), consider prepending a category prefix: post-2026 or year-2026.
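A sketch of that prefixing rule in Python. The function name and the default prefix "post" are assumptions; pass a category such as "year" where one exists.

```python
import re


def prefix_numeric(slug: str, prefix: str = "post") -> str:
    # Only slugs made entirely of ASCII digits are ambiguous with numeric IDs.
    if re.fullmatch(r"[0-9]+", slug):
        return f"{prefix}-{slug}"
    return slug
```

Slugs that merely contain digits, such as top-10, pass through unchanged.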

Empty Result After Stripping

A title composed entirely of emoji, special characters, or a script your transliterator does not handle can produce an empty string after slugification. Never silently use an empty slug. Fallback strategies in order of preference:

Use a hash of the original title (first 8 hex characters of SHA-1 is enough for this purpose).
Use a UUID v4.
Raise a validation error and require a manually entered slug.
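A sketch of the first and last fallbacks in Python (the function name is illustrative): hash the title when there is something to hash, otherwise refuse and require a manual slug.

```python
import hashlib


def slug_or_fallback(slug: str, title: str) -> str:
    if slug:
        return slug
    if not title.strip():
        # Nothing meaningful to hash: make the caller supply a slug by hand.
        raise ValueError("title is empty; a slug must be entered manually")
    # Deterministic: the same title always yields the same 8 hex characters.
    return hashlib.sha1(title.encode("utf-8")).hexdigest()[:8]
```

Because the hash is derived from the title rather than generated randomly, regenerating the slug for an unchanged title gives the same URL. About 2% of 8-character hex strings are all digits, so pass the result through the all-numeric rule above as well.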

Reserved Words and System Paths

Do not allow slugs that conflict with system paths. Common conflicts: admin, api, static, assets, login, logout, register, feed, sitemap, robots. Maintain a blocklist and append a suffix when a generated slug matches a reserved word.
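A minimal Python version of the blocklist check. The list mirrors the examples above; the "page" suffix is an arbitrary choice for illustration, and the result should still go through collision handling.

```python
RESERVED_SLUGS = frozenset({
    "admin", "api", "static", "assets", "login", "logout",
    "register", "feed", "sitemap", "robots",
})


def avoid_reserved(slug: str, suffix: str = "page") -> str:
    # Only exact matches conflict with a route; "admin-tips" is fine as-is.
    return f"{slug}-{suffix}" if slug in RESERVED_SLUGS else slug
```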

Very Long Titles

Truncate at a word boundary, not mid-word. The implementation examples above show the pattern: truncate to maxLength, find the last hyphen in the truncated string, and cut there. This avoids slugs ending in ...best-practi.

Collision Handling

When two different resources produce the same slug, you need a deterministic resolution strategy. The standard approach is sequential suffixing:

<?php

declare(strict_types=1);

function uniqueSlug(string $baseSlug, callable $exists): string
{
    if (!$exists($baseSlug)) {
        return $baseSlug;
    }

    $counter = 2;
    do {
        $candidate = $baseSlug . '-' . $counter;
        $counter++;
    } while ($exists($candidate));

    return $candidate;
}

// Usage:
$slug = uniqueSlug(
    slugify($title),
    static fn (string $s): bool => $articleRepository->existsBySlug($s),
);
// "my-post", "my-post-2", "my-post-3", etc.

Avoid appending a random hash as the default collision resolution. Hashes produce unstable, unguessable URLs and defeat the readability purpose of having a slug at all. Sequential suffixes (-2, -3) are predictable and human-friendly.

UUID-Based vs Human-Readable Slugs

Some applications avoid the collision and stability problems entirely by using UUID or ULID-based URLs:

/posts/01JPXK3G8EQ4FVZMCQ7N1BWSRH   ← ULID-based
/posts/a1b2c3d4                       ← short hash
/posts/my-post-title                  ← human-readable slug

The trade-off is explicit:

Property	UUID/ULID	Human-readable slug
Uniqueness	Guaranteed by construction	Requires collision handling
Stability	Permanent, never changes	At risk when title is edited
Readability	None	High
SEO value	Minimal - no keywords	Moderate - keywords in URL
Guessability	Zero	Moderate
Implementation	Simple	Requires transliteration + deduplication

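A middle-ground approach that works well for large content platforms: generate a human-readable slug, append the last 8 characters of the record's ULID as a suffix, and never change it regardless of title edits:

/posts/my-post-title-7n1bwsrh

Take the suffix from the end of the ULID, not the start. The first 10 characters encode the creation timestamp, so records created within about a second of each other typically share their first 8 characters, while the last 16 characters are random. The slug is readable, collision-resistant (8 Crockford base32 characters carry 40 random bits), and stable because it is tied to the record rather than the title.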
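A sketch of building and parsing such a slug in Python. The function names are illustrative, and record_id is assumed to be a 26-character ULID string; the suffix is taken from the end of the ID, where a ULID keeps its random component.

```python
def stable_slug(title_slug: str, record_id: str, length: int = 8) -> str:
    # The last 16 characters of a ULID are random; the first 10 are a timestamp.
    return f"{title_slug}-{record_id[-length:].lower()}"


def id_suffix(slug: str) -> str:
    # The ID fragment is everything after the final hyphen.
    return slug.rsplit("-", 1)[-1]
```

For the ULID 01JPXK3G8EQ4FVZMCQ7N1BWSRH and the title slug my-post-title, this produces my-post-title-7n1bwsrh, and id_suffix recovers 7n1bwsrh as the lookup key.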

The Checklist

A good slug is lowercase, hyphen-separated, ASCII-only, bounded in length, and stable over time. The implementation details that matter most in practice are: use ICU transliteration (not naive iconv) for non-Latin scripts, truncate at word boundaries not character boundaries, handle empty-after-strip gracefully, and treat a published slug as immutable. Use 301 redirects when you must change a slug, maintain a blocklist of reserved paths, and if stability is more important than readability, embed a short ID suffix so the slug can survive title edits.
