Regular Expressions: Practical Guide with 20 Ready-to-Use Patterns

22 February, 2026 Web

Regular expressions are one of those tools that every developer uses but few truly master. A well-crafted regex can replace 50 lines of string manipulation code. A poorly crafted one can bring your server to its knees. This guide covers the fundamentals and gives you 20 production-ready patterns you can use immediately.

What Are Regular Expressions?

A regular expression (regex) is a sequence of characters that defines a search pattern. At its core, a regex engine scans a string and reports whether the pattern matches, and optionally where and how many times.

Two major flavors exist in practice:

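PCRE (Perl Compatible Regular Expressions) - the library behind PHP's preg_* functions. Python, Ruby, JavaScript, and most modern languages ship their own engines with largely Perl-compatible syntax. Supports lookaheads, lookbehinds, named groups, and backreferences.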
POSIX - older standard used in Unix tools like grep, sed, and awk. Less capable, no lookaheads, but widely available.

When to use regex:

Validating input format (email, phone, postal code)
Extracting structured data from unstructured text
Search-and-replace with patterns
Parsing log files and configuration formats

When NOT to use regex:

Parsing HTML or XML (use a proper DOM parser)
Parsing JSON (use a JSON library)
Any recursive or deeply nested structure
When a simple string contains or split will do the job

Basic Syntax

A regex pattern is a sequence of literals and metacharacters. Literals match themselves - the pattern cat matches the string "cat" exactly.

Metacharacters have special meaning: . * + ? ^ $ { } [ ] | ( ) \

To match a metacharacter literally, escape it with a backslash: \. matches a literal dot, \( matches a literal parenthesis.

Pattern: hello\.world
Matches: "hello.world"
No match: "hello_world"
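
When the literal text comes from a variable rather than being typed by hand, escaping can be automated. A minimal Python sketch, assuming the standard re module; the variable names are illustrative only:

```python
import re

literal = "hello.world"
escaped = re.escape(literal)   # backslash-escapes the dot
pattern = re.compile(escaped)

print(escaped)                                  # hello\.world
print(bool(pattern.fullmatch("hello.world")))   # True
print(bool(pattern.fullmatch("hello_world")))   # False

# Unescaped, the dot is a wildcard, so the underscore slips through.
print(bool(re.fullmatch("hello.world", "hello_world")))  # True
```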

Character Classes

A character class matches one character from a defined set.

Syntax	Description	Example
`[abc]`	Any of a, b, or c	`[aeiou]` matches any vowel
`[a-z]`	Range: any lowercase letter	`[a-zA-Z]` matches any letter
`[^abc]`	Negated: any character NOT in the set	`[^0-9]` matches any non-digit
`.`	Any character except newline	`a.c` matches "abc", "a1c"

Shorthand classes (available in PCRE):

Class	Equivalent	Description
`\d`	`[0-9]`	Any digit
`\D`	`[^0-9]`	Any non-digit
`\w`	`[a-zA-Z0-9_]`	Any word character
`\W`	`[^a-zA-Z0-9_]`	Any non-word character
`\s`	`[ \t\n\r\f\v]`	Any whitespace
`\S`	`[^ \t\n\r\f\v]`	Any non-whitespace
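
The classes above can be tried out quickly in Python. This is only a sketch with a made-up sample string; the last two lines show one way the shorthand \d can differ from [0-9] when the engine works on Unicode text, which Python 3 does by default for str patterns:

```python
import re

text = "Order #42 shipped to Zone-7 at 09:15"

vowels = re.findall(r"[aeiou]", "regex")
numbers = re.findall(r"\d+", text)
capitalised = re.findall(r"[A-Z]\w*", text)
punctuation = re.findall(r"[^\w\s]", text)

print(vowels)        # ['e', 'e']
print(numbers)       # ['42', '7', '09', '15']
print(capitalised)   # ['Order', 'Zone']
print(punctuation)   # ['#', '-', ':']

# U+0663 is ARABIC-INDIC DIGIT THREE: matched by \d, but not with re.ASCII.
print(bool(re.fullmatch(r"\d", "\u0663")))            # True
print(bool(re.fullmatch(r"\d", "\u0663", re.ASCII)))  # False
```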

Anchors

Anchors do not match characters - they match positions in the string.

Anchor	Position
`^`	Start of string (or start of line in multiline mode)
`$`	End of string (or end of line in multiline mode)
`\b`	Word boundary (between `\w` and `\W`)
`\B`	Non-word boundary

Pattern: ^\d{3}$
Matches: "123" (exactly 3 digits, nothing else)
No match: "1234", "abc123"

Pattern: \bcat\b
Matches "cat" in "the cat sat" but not in "concatenate"
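
The two anchor examples can be checked in Python as follows. A small sketch, assuming the standard re module; the multi-line sample string is invented to show how re.MULTILINE changes what ^ and $ mean:

```python
import re

three_digits = re.compile(r"^\d{3}$")
print(bool(three_digits.search("123")))      # True
print(bool(three_digits.search("1234")))     # False
print(bool(three_digits.search("abc123")))   # False

whole_word = re.compile(r"\bcat\b")
print(bool(whole_word.search("the cat sat")))   # True
print(bool(whole_word.search("concatenate")))   # False

# Without MULTILINE, ^ only matches at the very start of the string.
lines = "10\n200\n3000"
single = re.findall(r"^\d{3}$", lines)
multi = re.findall(r"^\d{3}$", lines, re.MULTILINE)
print(single)   # []
print(multi)    # ['200']
```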

Quantifiers

Quantifiers specify how many times a preceding element must match.

Quantifier	Meaning
`*`	0 or more
`+`	1 or more
`?`	0 or 1 (optional)
`{n}`	Exactly n times
`{n,}`	n or more times
`{n,m}`	Between n and m times (inclusive)

Greedy vs. Lazy:

By default, quantifiers are greedy - they match as much as possible. Add ? after a quantifier (*?, +?, ??, {n,m}?) to make it lazy - it then matches as little as possible.

Input: "<b>bold</b> and <i>italic</i>"

Greedy:  <.+>   matches "<b>bold</b> and <i>italic</i>" (entire string)
Lazy:    <.+?>  matches "<b>", then "</b>", then "<i>", then "</i>"
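
The same comparison, run in Python. This sketch also adds a third variant that is not in the example above: a negated character class, which often gives the same result as the lazy form while leaving the engine less to backtrack over.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", html)
lazy = re.findall(r"<.+?>", html)
negated = re.findall(r"<[^>]+>", html)   # "anything except >" instead of lazy dot

print(greedy)    # ['<b>bold</b> and <i>italic</i>']
print(lazy)      # ['<b>', '</b>', '<i>', '</i>']
print(negated)   # ['<b>', '</b>', '<i>', '</i>']
```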

Groups and Capturing

Parentheses group patterns and capture matched text for later use.

Syntax	Type	Description
`(abc)`	Capturing group	Matches and captures "abc"
`(?:abc)`	Non-capturing group	Matches but does not capture
`(?P<name>abc)`	Named group (Python, PCRE)	Captures into a named reference
`(?<name>abc)`	Named group (JavaScript, .NET, PCRE)	Same, angle-bracket syntax

Backreferences let you reference a previously captured group within the same pattern:

Pattern: (\w+)\s+\1
Matches: "hello hello" (the same word repeated)
No match: "hello world"

Non-capturing groups (?:...) are preferred when you need grouping for quantifiers or alternation but do not need to reference the captured value - they are slightly faster and keep group numbering clean.
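
A short Python sketch of both ideas, assuming the standard re module. The backreference pattern is the one above with \b added on both sides so that it only fires on whole words; the px/em strings are invented sample data.

```python
import re

# Backreference: \1 must repeat exactly what group 1 captured.
doubled = re.compile(r"\b(\w+)\s+\1\b")
hit = doubled.search("say hello hello again")
print(hit.group(1))                    # hello
print(doubled.search("hello world"))   # None

# Capturing groups change what findall returns; non-capturing groups do not.
with_capture = re.findall(r"(\d+)(px|em)", "12px 3em")
without_capture = re.findall(r"(\d+)(?:px|em)", "12px 3em")
print(with_capture)      # [('12', 'px'), ('3', 'em')]
print(without_capture)   # ['12', '3']
```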

Alternation

The pipe | acts as an OR operator between alternatives.

Pattern: cat|dog|bird
Matches: "cat", "dog", "bird"

Pattern: gr(a|e)y
Matches: "gray" and "grey"

Order matters. The engine tries alternatives left-to-right and stops at the first match. Put more specific alternatives before more general ones.

Pattern: colou?r|colour
The second alternative "colour" can never match because "colou?r" already covers it.
Better: colour|color  or simply  colou?r
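
The effect of ordering is easiest to see when one alternative is a prefix of another. A minimal Python sketch with an invented month example (not one of the patterns above):

```python
import re

short_first = re.search(r"Jan|January", "January 2026").group()
long_first = re.search(r"January|Jan", "January 2026").group()
print(short_first)   # Jan      (first alternative wins, the rest is left unmatched)
print(long_first)    # January

# For single-character alternatives, a character class says the same thing.
print(re.findall(r"gr(?:a|e)y", "gray or grey"))   # ['gray', 'grey']
print(re.findall(r"gr[ae]y", "gray or grey"))      # ['gray', 'grey']
```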

Lookahead and Lookbehind

Lookarounds are zero-width assertions - they check for a pattern without consuming characters.

Syntax	Type	Description
`(?=...)`	Positive lookahead	Matches if followed by ...
`(?!...)`	Negative lookahead	Matches if NOT followed by ...
`(?<=...)`	Positive lookbehind	Matches if preceded by ...
`(?<!...)`	Negative lookbehind	Matches if NOT preceded by ...

Practical examples:

\d+(?= dollars)
Matches the number in "100 dollars" but not in "100 euros"

(?<=\$)\d+
Matches digits preceded by a dollar sign: "500" in "$500"

\b\w+\b(?!\s+is)
Matches a word NOT followed by " is"

(?<!\d)\d{4}(?!\d)
Matches exactly 4-digit numbers not adjacent to other digits
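
The four lookaround examples can be exercised in Python like this. The input strings are made up for illustration; note that none of the returned matches include the text inside the assertion.

```python
import re

dollars = re.findall(r"\d+(?= dollars)", "100 dollars, 200 euros")
after_sign = re.findall(r"(?<=\$)\d+", "cost: $500, qty 3")
not_before_is = re.findall(r"\b\w+\b(?!\s+is)", "this is fine")
four_digits = re.findall(r"(?<!\d)\d{4}(?!\d)", "2026, 12345, 987, 0042")

print(dollars)         # ['100']
print(after_sign)      # ['500']
print(not_before_is)   # ['is', 'fine']
print(four_digits)     # ['2026', '0042']
```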

Flags / Modifiers

Flags change how the entire pattern is interpreted.

Flag	Name	Effect
`i`	Case insensitive	`[a-z]` also matches uppercase letters
`g`	Global	Find all matches, not just the first (JavaScript; Python uses `re.findall` / `re.finditer`, PHP uses `preg_match_all`)
`m`	Multiline	`^` and `$` match start/end of each line
`s`	Dotall	`.` matches newlines too
`x`	Extended/Verbose	Allows whitespace and comments in pattern

In PHP, flags follow the closing delimiter: /pattern/im. In Python, they are passed as constants: re.IGNORECASE | re.MULTILINE. In JavaScript, they follow the closing slash: /pattern/gim.

The x flag is especially useful for complex patterns:

$pattern = '/
    ^               # start of string
    (\d{4})         # year
    -               # separator
    (\d{2})         # month
    -               # separator
    (\d{2})         # day
    $               # end of string
/x';
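
For comparison, here is roughly the same verbose pattern in Python, where the x flag is spelled re.VERBOSE. This is a sketch, not a translation of any particular codebase; the named groups and the log-style sample text are additions for illustration.

```python
import re

DATE = re.compile(
    r"""
    ^                   # start of string
    (?P<year>\d{4})     # year
    -                   # separator
    (?P<month>\d{2})    # month
    -                   # separator
    (?P<day>\d{2})      # day
    $                   # end of string
    """,
    re.VERBOSE,
)

parts = DATE.match("2026-02-22").groupdict()
print(parts)   # {'year': '2026', 'month': '02', 'day': '22'}

# Flags are combined with the | operator.
log = "Error: disk\nok\nERROR: net"
errors = re.findall(r"^error.*$", log, re.IGNORECASE | re.MULTILINE)
print(errors)  # ['Error: disk', 'ERROR: net']
```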

20 Practical Patterns

Test these patterns in your browser as you read.

#	Name	Pattern	Matches
1	Email (simple)	`^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$`	`user@example.com`
2	Email (RFC-ish)	^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\.[a-zA-Z]{2,}$	RFC 5321 subset
3	URL (http/https)	`^https?://[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?$`	`https://example.com/path?q=1`
4	IPv4 address	`^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$`	`192.168.0.1`
5	IPv6 (simplified)	`^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$`	`2001:0db8:85a3:0000:0000:8a2e:0370:7334`
6	Date YYYY-MM-DD	`^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$`	`2026-02-22`
7	Date DD/MM/YYYY	`^(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/\d{4}$`	`22/02/2026`
8	Time HH:MM:SS	`^([01]\d|2[0-3]):[0-5]\d:[0-5]\d$`	`14:30:00`
9	ISO 8601 datetime	`^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$`	`2026-02-22T10:00:00Z`
10	Phone E.164	`^\+[1-9]\d{6,14}$`	`+14155552671`
11	Credit card (basic)	`^\d{13,19}$`	`4111111111111111` (Luhn not checked)
12	US ZIP code	`^\d{5}(-\d{4})?$`	`90210`, `90210-1234`
13	Hex colour	`^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$`	`#fff`, `#1a2b3c`
14	Password strength	`^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%?&])[A-Za-z\d@$!%?&]{8,}$`	Min 8 chars, upper, lower, digit, special
15	URL slug	`^[a-z0-9]+(?:-[a-z0-9]+)*$`	`my-article-title`
16	Semantic version	`^\d+\.\d+\.\d+(-[a-zA-Z0-9.]+)?(\+[a-zA-Z0-9.]+)?$`	`1.2.3`, `2.0.0-beta.1`
17	UUID v4	`^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$`	`f47ac10b-58cc-4372-a567-0e02b2c3d479`
18	HTML tag (simple)	`<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>(.*?)</\1>`	`<div class="x">text</div>`
19	CIDR notation	`^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)/([12]?\d|3[0-2])$`	`192.168.0.0/24`
20	Markdown link	`\[([^\]]+)\]\((https?://[^)]+)\)`	`[text](https://example.com)`

Patterns 4, 6-9, 13, and 19 use | as alternation - the surrounding parentheses ensure correct grouping. Write the pipe unescaped in your regex engine: an escaped \| matches a literal pipe character instead.
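
One way to put a handful of these patterns to work is a small lookup table of compiled expressions. The sketch below uses patterns 4, 6, 13, and 15; the names PATTERNS, COMPILED, and is_valid are hypothetical helpers, not part of any library.

```python
import re

PATTERNS = {
    "ipv4": r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$",
    "date": r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$",
    "hex_colour": r"^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$",
    "slug": r"^[a-z0-9]+(?:-[a-z0-9]+)*$",
}
COMPILED = {name: re.compile(source) for name, source in PATTERNS.items()}


def is_valid(kind: str, value: str) -> bool:
    """Return True if value matches the named pattern."""
    return COMPILED[kind].match(value) is not None


print(is_valid("ipv4", "192.168.0.1"))       # True
print(is_valid("ipv4", "256.1.1.1"))         # False
print(is_valid("date", "2026-13-01"))        # False
print(is_valid("date", "2026-02-30"))        # True: format only, not calendar-aware
print(is_valid("hex_colour", "#ffff"))       # False
print(is_valid("slug", "my-article-title"))  # True
```

The 2026-02-30 case is a reminder that these patterns check shape, not meaning; a date library is still needed to reject impossible dates.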

Performance Tips

Catastrophic Backtracking

The most dangerous regex mistake is a pattern that causes exponential backtracking. The classic example:

Pattern: (a+)+$
Input:   "aaaaaaaaaaaaaaaaaab"

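The engine tries every possible way to partition the a characters among the nested groups before concluding there is no match. The number of attempts roughly doubles with each additional a, so the delay grows from imperceptible to seconds within a few extra characters; depending on the engine, a 30-character input can take minutes.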

Rules to avoid it:

Never nest quantifiers over the same character class: (a+)+, (\w+\s*)+
Use atomic groups (?>...) or possessive quantifiers ++, *+ if your engine supports them (PCRE does)
Prefer character classes over . when you know what characters to expect
Anchor patterns whenever possible with ^ and $
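
The growth is easy to observe with a rough timing loop. This is a sketch only: absolute numbers depend on the machine and the Python version, and the input sizes are kept small on purpose so the script finishes quickly. The helper time_search is a hypothetical name.

```python
import re
import time


def time_search(pattern: str, text: str):
    """Return (matched, elapsed_seconds) for a single re.search call."""
    start = time.perf_counter()
    matched = re.search(pattern, text) is not None
    return matched, time.perf_counter() - start


for n in (12, 15, 18):
    text = "a" * n + "b"
    _, nested = time_search(r"(a+)+$", text)   # nested quantifiers
    _, flat = time_search(r"a+$", text)        # same language, no nesting
    print(f"n={n}: nested {nested:.4f}s, flat {flat:.6f}s")
```

Both patterns accept and reject exactly the same strings; only the nested version pays the exponential cost on a near miss.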

Possessive Quantifiers and Atomic Groups (PCRE)

// Greedy (can backtrack):
/\w+:/

// Possessive (no backtracking - match and keep):
/\w++:/

// Atomic group (equivalent):
/(?>\w+):/
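
Python's re module only gained atomic groups and possessive quantifiers in version 3.11. On older versions, a commonly used workaround is to capture inside a lookahead and then consume the capture with a backreference, since the engine does not backtrack into a completed lookahead. The sketch below shows that emulation, not the PCRE syntax itself:

```python
import re

# Greedy: \w+ first takes "abc", then gives back "c" so the final \w can match.
greedy = re.search(r"\w+\w", "abc")
print(greedy.group())   # abc

# Emulated atomic group: the lookahead captures the full run of word
# characters and \1 consumes exactly that run, with nothing given back.
atomic = re.search(r"(?=(\w+))\1\w", "abc")
print(atomic)           # None

# When the pattern can succeed, both forms find the same text.
print(re.search(r"\w+:", "key:value").group())           # key:
print(re.search(r"(?=(\w+))\1:", "key:value").group())   # key:
```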

General Guidelines

Compile regex once and reuse (in PHP, store in a static variable or a service; in Python, use re.compile())
Prefer \d over [0-9] for readability, but know they differ in Unicode mode
Use non-capturing groups (?:...) when you do not need the captured value
Test edge cases: empty string, very long input, input that almost matches
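
A minimal Python sketch of the first and last guideline together: the pattern is compiled once at module level, and a table of edge cases is checked against it. ZIP_RE and is_zip are hypothetical names; the pattern is the US ZIP code pattern from the table above without its anchors, because fullmatch already requires the whole string to match.

```python
import re

ZIP_RE = re.compile(r"\d{5}(-\d{4})?")   # compiled once, reused on every call


def is_zip(value: str) -> bool:
    """Return True only if the entire string is a ZIP or ZIP+4 code."""
    return ZIP_RE.fullmatch(value) is not None


edge_cases = {
    "90210": True,
    "90210-1234": True,
    "": False,             # empty string
    "9021": False,         # almost matches
    "90210-": False,       # almost matches
    " 90210": False,       # leading whitespace
    "90210\n": False,      # trailing newline
    "9" * 10_000: False,   # very long input
}
for value, expected in edge_cases.items():
    assert is_zip(value) is expected, repr(value[:20])

# In Python, $ also matches just before a trailing newline, so the
# anchored form accepts input that fullmatch rejects.
print(bool(re.match(r"^\d{5}$", "90210\n")))   # True
print(is_zip("90210\n"))                       # False
```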

Code Examples

PHP

<?php
declare(strict_types=1);

// Email validation
$email = 'user@example.com';
if (preg_match('/^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$/', $email)) {
    echo 'Valid email';
}

// Extract all URLs from text
$text = 'Visit https://example.com and https://richdevtools.com for tools.';
preg_match_all('/https?:\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/', $text, $matches);
print_r($matches[0]);

// Named groups for date parsing
$date = '2026-02-22';
if (preg_match('/^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$/', $date, $m)) {
    echo "Year: {$m['year']}, Month: {$m['month']}, Day: {$m['day']}";
}

// Replace with callback
$result = preg_replace_callback('/\b(\w)(\w*)\b/', function (array $m): string {
    return strtoupper($m[1]) . $m[2];
}, 'hello world');
// Result: "Hello World"

Python

import re

# Email validation
email = 'user@example.com'
pattern = re.compile(r'^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$')
if pattern.match(email):
    print('Valid email')

# Extract named groups
date = '2026-02-22'
m = re.match(r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$', date)
if m:
    print(f"Year: {m.group('year')}, Month: {m.group('month')}")

# Find all matches with findall
text = 'IP addresses: 192.168.0.1 and 10.0.0.255'
ips = re.findall(r'\b(?:\d{1,3}\.){3}\d{1,3}\b', text)
print(ips)  # ['192.168.0.1', '10.0.0.255']

# Substitution
result = re.sub(r'\bfoo\b', 'bar', 'Foo foobar foo', flags=re.IGNORECASE)
print(result)  # 'bar foobar bar'

JavaScript

// Email validation
const email = 'user@example.com';
const emailRegex = /^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$/;
console.log(emailRegex.test(email)); // true

// Extract all matches (global flag)
const text = 'Prices: $100 and $250 and $1999';
const prices = text.match(/\$\d+/g);
console.log(prices); // ['$100', '$250', '$1999']

// Named groups (ES2018+)
const date = '2026-02-22';
const { groups } = date.match(/^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/) ?? {};
console.log(groups); // { year: '2026', month: '02', day: '22' }

// Replace with function
const result = 'hello world'.replace(/\b(\w)/g, c => c.toUpperCase());
console.log(result); // 'Hello World'

Where to Go from Here

Regular expressions reward the time you invest in understanding them. The fundamentals - character classes, quantifiers, groups, and anchors - cover 90% of everyday use cases. Lookaheads and lookbehinds handle the remaining complex scenarios without consuming characters.

The 20 patterns above are starting points. Real-world input is messier than any example - always test with edge cases: leading and trailing whitespace, Unicode characters, very short or very long strings, and inputs designed to exploit greedy backtracking.

Use the Regex Tester to experiment with patterns interactively as you build and debug your expressions.