Regular Expressions: Practical Guide with 20 Ready-to-Use Patterns

22 February, 2026 Web

Regular expressions are one of those tools that every developer uses but few truly master. A well-crafted regex can replace 50 lines of string manipulation code. A poorly crafted one can bring your server to its knees. This guide covers the fundamentals and gives you 20 production-ready patterns you can use immediately.

What Are Regular Expressions?

A regular expression (regex) is a sequence of characters that defines a search pattern. At its core, a regex engine scans a string and reports whether the pattern matches, and optionally where and how many times.

Two major flavors exist in practice:

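  • PCRE (Perl Compatible Regular Expressions) - the library behind PHP's preg_* functions. Python, Ruby, JavaScript, and most modern languages ship their own engines with closely related Perl-style syntax. Supports lookaheads, lookbehinds, named groups, and backreferences.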
  • POSIX - older standard used in Unix tools like grep, sed, and awk. Less capable, no lookaheads, but widely available.

When to use regex:

  • Validating input format (email, phone, postal code)
  • Extracting structured data from unstructured text
  • Search-and-replace with patterns
  • Parsing log files and configuration formats

When NOT to use regex:

  • Parsing HTML or XML (use a proper DOM parser)
  • Parsing JSON (use a JSON library)
  • Any recursive or deeply nested structure
  • When a simple string contains or split will do the job

Basic Syntax

A regex pattern is a sequence of literals and metacharacters. Literals match themselves - the pattern cat matches the string "cat" exactly.

Metacharacters have special meaning: . * + ? ^ $ { } [ ] | ( ) \

To match a metacharacter literally, escape it with a backslash: \. matches a literal dot, \( matches a literal parenthesis.

Pattern: hello\.world
Matches: "hello.world"
No match: "hello_world"
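When the literal text comes from a variable rather than being typed into the pattern, escaping by hand is easy to get wrong. A minimal Python sketch using the standard library's re.escape(); the variable names and sample strings are illustrative only:

```python
import re

literal = "hello.world"

# Unescaped, the dot is a metacharacter and matches any character.
unescaped = re.compile(literal)

# re.escape() backslash-escapes metacharacters so the text only matches itself.
escaped = re.compile(re.escape(literal))

print(bool(unescaped.fullmatch("hello_world")))  # True
print(bool(escaped.fullmatch("hello_world")))    # False
print(bool(escaped.fullmatch("hello.world")))    # True
```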

Character Classes

A character class matches one character from a defined set.

Syntax Description Example
[abc] Any of a, b, or c [aeiou] matches any vowel
[a-z] Range: any lowercase letter [a-zA-Z] matches any letter
[^abc] Negated: any character NOT in the set [^0-9] matches any non-digit
. Any character except newline a.c matches "abc", "a1c"

Shorthand classes (available in PCRE):

Class Equivalent Description
\d [0-9] Any digit
\D [^0-9] Any non-digit
\w [a-zA-Z0-9_] Any word character
\W [^a-zA-Z0-9_] Any non-word character
\s [ \t\n\r\f\v] Any whitespace
\S [^ \t\n\r\f\v] Any non-whitespace

Anchors

Anchors do not match characters - they match positions in the string.

Anchor Position
^ Start of string (or start of line in multiline mode)
$ End of string (or end of line in multiline mode)
\b Word boundary (between \w and \W, or between \w and the start or end of the string)
\B Non-word boundary

Pattern: ^\d{3}$
Matches: "123" (exactly 3 digits, nothing else)
No match: "1234", "abc123"

Pattern: \bcat\b
Matches "cat" in "the cat sat" but not in "concatenate"
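The two anchor patterns above can be checked in a few lines of Python. This is a small sketch with made-up variable names; it also shows one Python-specific detail worth knowing: $ matches just before a trailing newline, whereas fullmatch() requires the whole string to match.

```python
import re

three_digits = re.compile(r"^\d{3}$")
whole_word_cat = re.compile(r"\bcat\b")

for sample in ["123", "1234", "abc123"]:
    print(sample, bool(three_digits.search(sample)))

print(bool(whole_word_cat.search("the cat sat")))  # True
print(bool(whole_word_cat.search("concatenate")))  # False

# In Python, $ also matches before a final newline; fullmatch() does not allow it.
print(bool(three_digits.search("123\n")))     # True
print(bool(three_digits.fullmatch("123\n")))  # False
```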

Quantifiers

Quantifiers specify how many times a preceding element must match.

Quantifier Meaning
* 0 or more
+ 1 or more
? 0 or 1 (optional)
{n} Exactly n times
{n,} n or more times
{n,m} Between n and m times (inclusive)

Greedy vs. Lazy:

By default, quantifiers are greedy - they match as much as possible. Add ? to make them lazy - they match as little as possible.

Input: "<b>bold</b> and <i>italic</i>"

Greedy:  <.+>   matches "<b>bold</b> and <i>italic</i>" (entire string)
Lazy:    <.+?>  matches "<b>", then "</b>", then "<i>", then "</i>"
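The same comparison as a runnable Python sketch. The third pattern, a negated character class, is an addition for illustration: it gives the same result as the lazy version on this input, and it cannot run past a closing bracket in the first place.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", html)
lazy = re.findall(r"<.+?>", html)
negated = re.findall(r"<[^>]+>", html)

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(negated)  # ['<b>', '</b>', '<i>', '</i>']
```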

Groups and Capturing

Parentheses group patterns and capture matched text for later use.

Syntax Type Description
(abc) Capturing group Matches and captures "abc"
(?:abc) Non-capturing group Matches but does not capture
(?P<name>abc) Named group (Python, PCRE) Captures into a named reference
(?<name>abc) Named group (JavaScript, PCRE, .NET) Same, alternative syntax

Backreferences let you reference a previously captured group within the same pattern:

Pattern: (\w+)\s+\1
Matches: "hello hello" (the same word repeated)
No match: "hello world"

Non-capturing groups (?:...) are preferred when you need grouping for quantifiers or alternation but do not need to reference the captured value - they are slightly faster and keep group numbering clean.
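A short Python sketch of backreferences in use. It adds \b on both sides of the repeated-word pattern (an assumption beyond the pattern shown above) so that the match cannot start or end in the middle of a word. The sample sentences are illustrative.

```python
import re

# Numbered backreference: \1 must repeat whatever group 1 captured.
doubled = re.compile(r"\b(\w+)\s+\1\b")
print(bool(doubled.search("hello hello")))  # True
print(bool(doubled.search("hello world")))  # False

# Named group plus named backreference (Python spelling: (?P=name)).
named = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")
print(named.search("this is is a test").group("word"))  # is

# Collapse doubled words by replacing the whole match with group 1.
print(doubled.sub(r"\1", "this is is a test"))  # this is a test
```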


Alternation

The pipe | acts as an OR operator between alternatives.

Pattern: cat|dog|bird
Matches: "cat", "dog", "bird"

Pattern: gr(a|e)y
Matches: "gray" and "grey"

Order matters. The engine tries alternatives left-to-right and stops at the first match. Put more specific alternatives before more general ones.

Pattern: colou?r|colour
The second alternative "colour" can never match because "colou?r" already covers it.
Better: colour|color  or simply  colou?r
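The effect of ordering is easier to see with alternatives where one is a prefix of the other. A minimal Python sketch; the words cat and catalog are chosen purely for illustration:

```python
import re

subject = "catalog"

# The first alternative that succeeds wins, even if a longer one would also match.
short_first = re.match(r"cat|catalog", subject).group()
long_first = re.match(r"catalog|cat", subject).group()

# Anchoring both ends forces the engine to backtrack into the second alternative.
anchored = re.match(r"^(?:cat|catalog)$", subject).group()

print(short_first)  # cat
print(long_first)   # catalog
print(anchored)     # catalog
```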

Lookahead and Lookbehind

Lookarounds are zero-width assertions - they check for a pattern without consuming characters.

Syntax Type Description
(?=...) Positive lookahead Matches if followed by ...
(?!...) Negative lookahead Matches if NOT followed by ...
(?<=...) Positive lookbehind Matches if preceded by ...
(?<!...) Negative lookbehind Matches if NOT preceded by ...

Practical examples:

\d+(?= dollars)
Matches the number in "100 dollars" but not in "100 euros"

(?<=\$)\d+
Matches digits preceded by a dollar sign: "500" in "$500"

\b\w+\b(?!\s+is)
Matches a word NOT followed by " is"

(?<!\d)\d{4}(?!\d)
Matches exactly 4-digit numbers not adjacent to other digits
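The four lookaround examples above, run against sample strings in Python. The inputs are invented for illustration; note that none of the results include the surrounding text that the assertions checked.

```python
import re

# Positive lookahead: digits only when " dollars" follows.
before_dollars = re.findall(r"\d+(?= dollars)", "100 dollars, 200 euros")

# Positive lookbehind: digits only when "$" precedes.
after_dollar_sign = re.findall(r"(?<=\$)\d+", "$500 and 300")

# Negative lookahead: words not followed by whitespace and "is".
not_before_is = re.findall(r"\b\w+\b(?!\s+is)", "this is fine")

# Negative lookbehind + lookahead: runs of exactly four digits.
four_digits = re.findall(r"(?<!\d)\d{4}(?!\d)", "2026 12345 0042")

print(before_dollars)     # ['100']
print(after_dollar_sign)  # ['500']
print(not_before_is)      # ['is', 'fine']
print(four_digits)        # ['2026', '0042']
```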

Flags / Modifiers

Flags change how the entire pattern is interpreted.

Flag Name Effect
i Case insensitive [a-z] also matches [A-Z]
g Global Find all matches, not just the first (JavaScript; Python uses re.findall() or re.finditer(), PHP uses preg_match_all())
m Multiline ^ and $ match start/end of each line
s Dotall . matches newlines too
x Extended/Verbose Allows whitespace and comments in pattern

In PHP, flags follow the closing delimiter: /pattern/im. In Python, they are passed as constants: re.IGNORECASE | re.MULTILINE. In JavaScript, they follow the closing slash: /pattern/gim.
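As a rough illustration of how flags change matching in Python, the sketch below searches a made-up log string with and without re.IGNORECASE and re.MULTILINE. It also shows the inline form (?im), which sets the same flags from inside the pattern.

```python
import re

log = "INFO start\nerror: disk full\nError: retry\n"
pattern = r"^error:.*$"

# Without flags, ^ only matches at the very start of the string.
no_flags = re.findall(pattern, log)

# With both flags, ^ and $ work per line and case is ignored.
with_flags = re.findall(pattern, log, re.IGNORECASE | re.MULTILINE)

# Inline flags at the start of the pattern are equivalent.
inline = re.findall(r"(?im)^error:.*$", log)

print(no_flags)    # []
print(with_flags)  # ['error: disk full', 'Error: retry']
print(inline)      # ['error: disk full', 'Error: retry']
```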

The x flag is especially useful for complex patterns:

$pattern = '/
    ^               # start of string
    (\d{4})         # year
    -               # separator
    (\d{2})         # month
    -               # separator
    (\d{2})         # day
    $               # end of string
/x';
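For comparison, a sketch of the same commented pattern in Python, where the x flag is spelled re.VERBOSE. The variable name is illustrative. One caveat that applies in verbose mode: a literal space in the pattern has to be escaped or written as [ ], because unescaped whitespace is ignored.

```python
import re

date_pattern = re.compile(r"""
    ^               # start of string
    (\d{4})         # year
    -               # separator
    (\d{2})         # month
    -               # separator
    (\d{2})         # day
    $               # end of string
""", re.VERBOSE)

m = date_pattern.match("2026-02-22")
print(m.groups())  # ('2026', '02', '22')
```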

20 Practical Patterns

Test these patterns in your browser as you read.

# Name Pattern Matches
1 Email (simple) ^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$ user@example.com
2 Email (RFC-ish) ^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\.[a-zA-Z]{2,}$ RFC 5321 subset
3 URL (http/https) ^https?://[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?$ https://example.com/path?q=1
4 IPv4 address ^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$ 192.168.0.1
5 IPv6 (simplified) ^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$ 2001:0db8:85a3:0000:0000:8a2e:0370:7334
6 Date YYYY-MM-DD ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$ 2026-02-22
7 Date DD/MM/YYYY ^(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/\d{4}$ 22/02/2026
8 Time HH:MM:SS ^([01]\d|2[0-3]):[0-5]\d:[0-5]\d$ 14:30:00
9 ISO 8601 datetime ^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$ 2026-02-22T10:00:00Z
10 Phone E.164 ^\+[1-9]\d{6,14}$ +14155552671
11 Credit card (basic) ^\d{13,19}$ 4111111111111111 (Luhn not checked)
12 US ZIP code ^\d{5}(-\d{4})?$ 90210, 90210-1234
13 Hex colour ^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$ #fff, #1a2b3c
14 Password strength ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$ Min 8 chars, upper, lower, digit, special
15 URL slug ^[a-z0-9]+(?:-[a-z0-9]+)*$ my-article-title
16 Semantic version ^\d+\.\d+\.\d+(-[a-zA-Z0-9.]+)?(\+[a-zA-Z0-9.]+)?$ 1.2.3, 2.0.0-beta.1
17 UUID v4 ^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$ f47ac10b-58cc-4372-a567-0e02b2c3d479
18 HTML tag (simple) <([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>(.*?)</\1> <div class="x">text</div>
19 CIDR notation ^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)/([12]?\d|3[0-2])$ 192.168.0.0/24
20 Markdown link \[([^\]]+)\]\((https?://[^\)]+)\) [text](https://example.com)

Pattern 4 (IPv4) relies on | for alternation; the surrounding parentheses limit each set of alternatives to a single octet. Write the pipe unescaped in your regex engine, since \| would match a literal pipe character.
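One possible way to wire a few of these patterns into code is a small lookup of compiled expressions. The sketch below is an assumption about usage, not part of the table: PATTERNS and is_valid are hypothetical names, and the four patterns are copied from rows 4, 6, 13 and 15. The last call shows a limit of format checks: the date pattern accepts 31 February because it validates shape, not the calendar.

```python
import re

# Patterns copied from rows 4, 6, 13 and 15 of the table above.
PATTERNS = {
    "ipv4": re.compile(r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$"),
    "date": re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$"),
    "hex_colour": re.compile(r"^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$"),
    "slug": re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$"),
}


def is_valid(kind: str, value: str) -> bool:
    """Return True if value matches the named pattern (hypothetical helper)."""
    # fullmatch() also rejects a trailing newline, which $ alone would allow.
    return PATTERNS[kind].fullmatch(value) is not None


print(is_valid("ipv4", "192.168.0.1"))       # True
print(is_valid("ipv4", "256.1.1.1"))         # False
print(is_valid("hex_colour", "#ffff"))       # False
print(is_valid("slug", "my-article-title"))  # True
print(is_valid("date", "2026-02-31"))        # True: shape is valid, the date is not
```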


Performance Tips

Catastrophic Backtracking

The most dangerous regex mistake is a pattern that causes exponential backtracking. The classic example:

Pattern: (a+)+$
Input:   "aaaaaaaaaaaaaaaaaab"

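The engine tries every possible way to partition the a characters among the nested groups before concluding there is no match. The number of attempts roughly doubles with each additional a, so an input of around 20 characters can produce a noticeable delay, and one of 30 or more can take seconds to minutes, depending on the engine.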

Rules to avoid it:

  • Never nest quantifiers over the same character class: (a+)+, (\w+\s*)+
  • Use atomic groups (?>...) or possessive quantifiers ++, *+ if your engine supports them (PCRE does)
  • Prefer character classes over . when you know what characters to expect
  • Anchor patterns whenever possible with ^ and $
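The growth described above can be observed with a small timing sketch in Python. The input sizes are deliberately kept small so the script finishes quickly, and the actual timings depend on the machine and Python version, so none are shown. The names risky, safe and time_search are hypothetical; a+$ is one possible rewrite that accepts the same strings without the nested quantifier.

```python
import re
import time

risky = re.compile(r"(a+)+$")  # nested quantifier over the same character
safe = re.compile(r"a+$")      # same set of matches, no nesting


def time_search(pattern, text):
    """Return (matched, seconds) for a single search call."""
    start = time.perf_counter()
    matched = pattern.search(text) is not None
    return matched, time.perf_counter() - start


for n in (8, 12, 16):
    text = "a" * n + "b"  # the trailing "b" forces the match to fail
    _, risky_seconds = time_search(risky, text)
    _, safe_seconds = time_search(safe, text)
    print(f"n={n}: risky {risky_seconds:.6f}s, safe {safe_seconds:.6f}s")
```

Increasing n by a few characters at a time shows the risky pattern's time climbing much faster than the safe one's.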

Possessive Quantifiers and Atomic Groups (PCRE)

// Greedy (can backtrack):
/\w+:/

// Possessive (no backtracking - match and keep):
/\w++:/

// Atomic group (equivalent):
/(?>\w+):/
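Python's re module supports atomic groups and possessive quantifiers natively only from version 3.11. On older versions, a common workaround is a lookahead with a capture followed by a backreference: the lookahead is not re-entered once it has succeeded, so the captured text cannot be shortened by backtracking. The sketch below uses that workaround so it runs on any Python 3; it is an emulation under that assumption, not the engine's own atomic group.

```python
import re

# Plain greedy: \w+ first takes "abc1", then gives back "1" so \d can match.
greedy = re.search(r"\w+\d", "abc1")

# Emulated atomic group: (?=(\w+))\1 grabs "abc1" and cannot give anything back,
# so the trailing \d has nothing left to match.
atomic_like = re.search(r"(?=(\w+))\1\d", "abc1")

# Emulated form of the \w++: example: the match succeeds with no backtracking needed.
key = re.search(r"(?=(\w+))\1:", "key:value")

print(greedy.group())  # abc1
print(atomic_like)     # None
print(key.group())     # key:
```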

General Guidelines

  • Compile regex once and reuse it: in Python, use re.compile(); in PHP, keep the pattern in a constant or static variable, since the preg_* functions cache compiled patterns internally
  • Prefer \d over [0-9] for readability, but know they differ in Unicode mode
  • Use non-capturing groups (?:...) when you do not need the captured value
  • Test edge cases: empty string, very long input, input that almost matches
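The \d versus [0-9] difference is easy to check in Python, where str patterns are Unicode-aware by default. A minimal sketch using Arabic-Indic digits as the sample input; the variable names are illustrative:

```python
import re

arabic_indic = "\u0661\u0662\u0663"  # Arabic-Indic digits one, two, three

unicode_digits = re.fullmatch(r"\d+", arabic_indic)          # \d covers Unicode decimal digits
ascii_class = re.fullmatch(r"[0-9]+", arabic_indic)          # explicit ASCII range only
ascii_flag = re.fullmatch(r"\d+", arabic_indic, re.ASCII)    # re.ASCII restricts \d to 0-9

print(bool(unicode_digits))  # True
print(bool(ascii_class))     # False
print(bool(ascii_flag))      # False
```

If the value is later handed to code that expects ASCII digits only, [0-9] or re.ASCII is the safer choice.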

Code Examples

PHP

<?php
declare(strict_types=1);

// Email validation
$email = 'user@example.com';
if (preg_match('/^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$/', $email)) {
    echo 'Valid email';
}

// Extract all URLs from text
$text = 'Visit https://example.com and https://richdevtools.com for tools.';
preg_match_all('/https?:\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/', $text, $matches);
print_r($matches[0]);

// Named groups for date parsing
$date = '2026-02-22';
if (preg_match('/^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$/', $date, $m)) {
    echo "Year: {$m['year']}, Month: {$m['month']}, Day: {$m['day']}";
}

// Replace with callback
$result = preg_replace_callback('/\b(\w)(\w*)\b/', function (array $m): string {
    return strtoupper($m[1]) . $m[2];
}, 'hello world');
// Result: "Hello World"

Python

import re

# Email validation
email = 'user@example.com'
pattern = re.compile(r'^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$')
if pattern.match(email):
    print('Valid email')

# Extract named groups
date = '2026-02-22'
m = re.match(r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$', date)
if m:
    print(f"Year: {m.group('year')}, Month: {m.group('month')}")

# Find all matches with findall
text = 'IP addresses: 192.168.0.1 and 10.0.0.255'
ips = re.findall(r'\b(?:\d{1,3}\.){3}\d{1,3}\b', text)
print(ips)  # ['192.168.0.1', '10.0.0.255']

# Substitution
result = re.sub(r'\bfoo\b', 'bar', 'foo foobar foo', flags=re.IGNORECASE)
print(result)  # 'bar foobar bar'

JavaScript

// Email validation
const email = 'user@example.com';
const emailRegex = /^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$/;
console.log(emailRegex.test(email)); // true

// Extract all matches (global flag)
const text = 'Prices: $100 and $250 and $1999';
const prices = text.match(/\$\d+/g);
console.log(prices); // ['$100', '$250', '$1999']

// Named groups (ES2018+)
const date = '2026-02-22';
const { groups } = date.match(/^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/) ?? {};
console.log(groups); // { year: '2026', month: '02', day: '22' }

// Replace with function
const result = 'hello world'.replace(/\b(\w)/g, c => c.toUpperCase());
console.log(result); // 'Hello World'

Conclusion

Regular expressions reward the time you invest in understanding them. The fundamentals - character classes, quantifiers, groups, and anchors - cover 90% of everyday use cases. Lookaheads and lookbehinds handle the remaining complex scenarios without consuming characters.

The 20 patterns above are starting points. Real-world input is messier than any example - always test with edge cases: leading and trailing whitespace, Unicode characters, very short or very long strings, and inputs designed to exploit greedy backtracking.

Use the Regex Tester to experiment with patterns interactively as you build and debug your expressions.
