Regular Expressions: Practical Guide with 20 Ready-to-Use Patterns
22 February, 2026 Web
Regular expressions are one of those tools that every developer uses but few truly master. A well-crafted regex can replace 50 lines of string manipulation code. A poorly crafted one can bring your server to its knees. This guide covers the fundamentals and gives you 20 production-ready patterns you can use immediately.
What Are Regular Expressions?
A regular expression (regex) is a sequence of characters that defines a search pattern. At its core, a regex engine scans a string and reports whether the pattern matches, and optionally where and how many times.
Two major flavors exist in practice:
- PCRE (Perl Compatible Regular Expressions) - used directly by PHP; Python, Ruby, JavaScript, and most modern languages ship their own engines with largely compatible Perl-style syntax. Supports lookaheads, lookbehinds, named groups, and backreferences.
- POSIX - older standard used in Unix tools like grep, sed, and awk. Less capable, no lookaheads, but widely available.
When to use regex:
- Validating input format (email, phone, postal code)
- Extracting structured data from unstructured text
- Search-and-replace with patterns
- Parsing log files and configuration formats
When NOT to use regex:
- Parsing HTML or XML (use a proper DOM parser)
- Parsing JSON (use a JSON library)
- Any recursive or deeply nested structure
- When a simple string contains or split will do the job
Basic Syntax
A regex pattern is a sequence of literals and metacharacters. Literals match themselves - the pattern cat matches the string "cat" exactly.
Metacharacters have special meaning: . * + ? ^ $ { } [ ] | ( ) \
To match a metacharacter literally, escape it with a backslash: \. matches a literal dot, \( matches a literal parenthesis.
Pattern: hello\.world
Matches: "hello.world"
No match: "hello_world"
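A minimal sketch of the escaping rule in Python, assuming the standard re module (the variable names are illustrative). It contrasts an unescaped dot with an escaped one, and shows re.escape() as one way to build a literal pattern from arbitrary text instead of escaping by hand.

```python
import re

# Unescaped: the dot is a metacharacter and matches any character.
loose = re.compile(r"hello.world")
# Escaped: the dot must be a literal dot.
strict = re.compile(r"hello\.world")

print(bool(loose.fullmatch("hello_world")))   # True
print(bool(strict.fullmatch("hello_world")))  # False
print(bool(strict.fullmatch("hello.world")))  # True

# re.escape() escapes every metacharacter in a piece of literal text.
literal = "price (USD)?"
escaped = re.compile(re.escape(literal))
sentence = "What is the price (USD)? Ask here."
print(bool(escaped.search(sentence)))  # True
```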
Character Classes
A character class matches one character from a defined set.
| Syntax | Description | Example |
|---|---|---|
| [abc] | Any of a, b, or c | [aeiou] matches any vowel |
| [a-z] | Range: any lowercase letter | [a-zA-Z] matches any letter |
| [^abc] | Negated: any character NOT in the set | [^0-9] matches any non-digit |
| . | Any character except newline | a.c matches "abc", "a1c" |
Shorthand classes (available in PCRE):
| Class | Equivalent | Description |
|---|---|---|
| \d | [0-9] | Any digit |
| \D | [^0-9] | Any non-digit |
| \w | [a-zA-Z0-9_] | Any word character |
| \W | [^a-zA-Z0-9_] | Any non-word character |
| \s | [ \t\n\r\f\v] | Any whitespace |
| \S | [^ \t\n\r\f\v] | Any non-whitespace |
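As a quick illustration, the classes above can be tried with Python's re.findall. This is a sketch only; the sample sentence and variable names are made up for the example.

```python
import re

text = "Order #A17 shipped on 2026-02-22 to Zoe."

vowels = re.findall(r"[aeiou]", "regex guide")
numbers = re.findall(r"\d+", text)
capitalised = re.findall(r"[A-Z]\w*", text)
punctuation = re.findall(r"[^\w\s]", text)

print(vowels)       # ['e', 'e', 'u', 'i', 'e']
print(numbers)      # ['17', '2026', '02', '22']
print(capitalised)  # ['Order', 'A17', 'Zoe']
print(punctuation)  # ['#', '-', '-', '.']
```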
Anchors
Anchors do not match characters - they match positions in the string.
| Anchor | Position |
|---|---|
| ^ | Start of string (or start of line in multiline mode) |
| $ | End of string (or end of line in multiline mode) |
| \b | Word boundary (between \w and \W) |
| \B | Non-word boundary |
Pattern: ^\d{3}$
Matches: "123" (exactly 3 digits, nothing else)
No match: "1234", "abc123"
Pattern: \bcat\b
Matches "cat" in "the cat sat" but not in "concatenate"
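The anchor examples can be checked with a short Python sketch (standard re module; names are illustrative). It also shows two details worth knowing: in Python, $ matches just before a trailing newline as well, and multiline mode changes what ^ and $ mean.

```python
import re

three_digits = re.compile(r"^\d{3}$")
whole_word = re.compile(r"\bcat\b")

print(bool(three_digits.search("123")))        # True
print(bool(three_digits.search("1234")))       # False
print(bool(three_digits.search("abc123")))     # False
print(bool(whole_word.search("the cat sat")))  # True
print(bool(whole_word.search("concatenate")))  # False

# Python detail: $ also matches just before a trailing newline,
# so fullmatch() is the stricter check when validating input.
print(bool(three_digits.search("123\n")))     # True
print(bool(re.fullmatch(r"\d{3}", "123\n")))  # False

# In multiline mode, ^ and $ also match at line breaks.
lines = "10\n200\n3000"
single = re.findall(r"^\d{3}$", lines)
multi = re.findall(r"^\d{3}$", lines, flags=re.MULTILINE)
print(single)  # []
print(multi)   # ['200']
```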
Quantifiers
Quantifiers specify how many times a preceding element must match.
| Quantifier | Meaning |
|---|---|
| * | 0 or more |
| + | 1 or more |
| ? | 0 or 1 (optional) |
| {n} | Exactly n times |
| {n,} | n or more times |
| {n,m} | Between n and m times (inclusive) |
Greedy vs. Lazy:
By default, quantifiers are greedy - they match as much as possible. Add ? to make them lazy - they match as little as possible.
Input: "<b>bold</b> and <i>italic</i>"
Greedy: <.+> matches "<b>bold</b> and <i>italic</i>" (entire string)
Lazy: <.+?> matches "<b>", then "</b>", then "<i>", then "</i>"
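The same comparison as a Python sketch. The third pattern, a negated character class, is an addition not shown above: it is a common alternative to the lazy quantifier that gives the same result here without relying on backtracking.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", html)
lazy = re.findall(r"<.+?>", html)
negated = re.findall(r"<[^>]+>", html)

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(negated)  # ['<b>', '</b>', '<i>', '</i>']
```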
Groups and Capturing
Parentheses group patterns and capture matched text for later use.
| Syntax | Type | Description |
|---|---|---|
| (abc) | Capturing group | Matches and captures "abc" |
| (?:abc) | Non-capturing group | Matches but does not capture |
| (?P<name>abc) | Named group (PCRE) | Captures into a named reference |
| (?<name>abc) | Named group (ECMA) | Same, JavaScript syntax |
Backreferences let you reference a previously captured group within the same pattern:
Pattern: (\w+)\s+\1
Matches: "hello hello" (the same word repeated)
No match: "hello world"
Non-capturing groups (?:...) are preferred when you need grouping for quantifiers or alternation but do not need to reference the captured value - they are slightly faster and keep group numbering clean.
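A short Python sketch of capturing, named, and non-capturing groups (sample strings are illustrative). The findall comparison at the end shows one practical effect of a capturing group: with a single group in the pattern, findall returns the group's text rather than the whole match.

```python
import re

# Numbered backreference: the same word twice.
dup = re.compile(r"(\w+)\s+\1")
print(bool(dup.fullmatch("hello hello")))  # True
print(bool(dup.fullmatch("hello world")))  # False

# Named group with a named backreference, Python syntax (?P=name).
named = re.compile(r"(?P<word>\w+)\s+(?P=word)")
m = named.search("say hello hello again")
print(m.group("word"))  # hello
print(m.group(0))       # hello hello

# Capturing vs non-capturing groups under findall.
capturing = re.findall(r"(ab)+", "ababab ab")
non_capturing = re.findall(r"(?:ab)+", "ababab ab")
print(capturing)      # ['ab', 'ab']
print(non_capturing)  # ['ababab', 'ab']
```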
Alternation
The pipe | acts as an OR operator between alternatives.
Pattern: cat|dog|bird
Matches: "cat", "dog", "bird"
Pattern: gr(a|e)y
Matches: "gray" and "grey"
Order matters. The engine tries alternatives left-to-right and stops at the first match. Put more specific alternatives before more general ones.
Pattern: colou?r|colour
The second alternative "colour" can never match because "colou?r" already covers it.
Better: colour|color or simply colou?r
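The ordering rule is easier to see when one alternative is a prefix of another. The sketch below uses a hypothetical cat/catalog pair (not from the examples above) alongside the colour and grey patterns.

```python
import re

# The first alternative already matches "colour", so the second is never needed.
first = re.match(r"colou?r|colour", "colour").group()
print(first)  # colour

# When an earlier alternative is a prefix of a later one, order changes the result.
short_first = re.search(r"cat|catalog", "catalog").group()
long_first = re.search(r"catalog|cat", "catalog").group()
print(short_first)  # cat
print(long_first)   # catalog

# Grouped alternation, non-capturing so findall returns whole matches.
greys = re.findall(r"gr(?:a|e)y", "gray or grey")
print(greys)  # ['gray', 'grey']
```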
Lookahead and Lookbehind
Lookarounds are zero-width assertions - they check for a pattern without consuming characters.
| Syntax | Type | Description |
|---|---|---|
| (?=...) | Positive lookahead | Matches if followed by ... |
| (?!...) | Negative lookahead | Matches if NOT followed by ... |
| (?<=...) | Positive lookbehind | Matches if preceded by ... |
| (?<!...) | Negative lookbehind | Matches if NOT preceded by ... |
Practical examples:
\d+(?= dollars)
Matches the number in "100 dollars" but not in "100 euros"
(?<=\$)\d+
Matches digits preceded by a dollar sign: "500" in "$500"
\b\w+\b(?!\s+is)
Matches a word NOT followed by " is"
(?<!\d)\d{4}(?!\d)
Matches exactly 4-digit numbers not adjacent to other digits
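The four lookaround examples, run through Python's re.findall as a sketch (the input strings are invented for the demonstration). Note that the matched text never includes the asserted part.

```python
import re

amounts = re.findall(r"\d+(?= dollars)", "100 dollars, 200 euros")
after_dollar = re.findall(r"(?<=\$)\d+", "cost: $500, qty 3")
not_before_is = re.findall(r"\b\w+\b(?!\s+is)", "this is fine")
four_digits = re.findall(r"(?<!\d)\d{4}(?!\d)", "id 12345, year 2026, pin 0420")

print(amounts)        # ['100']
print(after_dollar)   # ['500']
print(not_before_is)  # ['is', 'fine']
print(four_digits)    # ['2026', '0420']
```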
Flags / Modifiers
Flags change how the entire pattern is interpreted.
| Flag | Name | Effect |
|---|---|---|
| i | Case insensitive | [a-z] also matches [A-Z] |
| g | Global | Find all matches, not just the first (JavaScript; Python has no g flag and uses re.findall or re.finditer instead) |
| m | Multiline | ^ and $ match start/end of each line |
| s | Dotall | . matches newlines too |
| x | Extended/Verbose | Allows whitespace and comments in pattern |
In PHP, flags go after the closing delimiter: /pattern/im. In Python, they are passed as constants: re.IGNORECASE | re.MULTILINE. In JavaScript, they follow the closing slash: /pattern/gim.
The x flag is especially useful for complex patterns:
$pattern = '/
    ^          # start of string
    (\d{4})    # year
    -          # separator
    (\d{2})    # month
    -          # separator
    (\d{2})    # day
    $          # end of string
/x';
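For comparison, a Python sketch of the same verbose pattern using re.VERBOSE, plus an example of combining flags with the | operator. The named groups and the sample log text are additions for illustration.

```python
import re

DATE = re.compile(
    r"""
    ^                  # start of string
    (?P<year>\d{4})    # year
    -                  # separator
    (?P<month>\d{2})   # month
    -                  # separator
    (?P<day>\d{2})     # day
    $                  # end of string
    """,
    re.VERBOSE,
)

m = DATE.match("2026-02-22")
print(m.groupdict())  # {'year': '2026', 'month': '02', 'day': '22'}

# Flags are combined with the bitwise OR operator.
log = "Error: disk full\nwarning: retry\nerror: timeout"
hits = re.findall(r"^error", log, flags=re.IGNORECASE | re.MULTILINE)
print(hits)  # ['Error', 'error']
```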
20 Practical Patterns
Test these patterns in your browser as you read.
| # | Name | Pattern | Matches |
|---|---|---|---|
| 1 | Email (simple) | ^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$ | user@example.com |
| 2 | Email (RFC-ish) | ^[a-zA-Z0-9.!#$%&'*+/=?^_\`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\.[a-zA-Z]{2,}$ | RFC 5321 subset |
| 3 | URL (http/https) | ^https?://[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?$ | https://example.com/path?q=1 |
| 4 | IPv4 address | ^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$ | 192.168.0.1 |
| 5 | IPv6 (simplified) | ^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$ | 2001:0db8:85a3:0000:0000:8a2e:0370:7334 |
| 6 | Date YYYY-MM-DD | ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$ | 2026-02-22 |
| 7 | Date DD/MM/YYYY | ^(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/\d{4}$ | 22/02/2026 |
| 8 | Time HH:MM:SS | ^([01]\d|2[0-3]):[0-5]\d:[0-5]\d$ | 14:30:00 |
| 9 | ISO 8601 datetime | ^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$ | 2026-02-22T10:00:00Z |
| 10 | Phone E.164 | ^\+[1-9]\d{6,14}$ | +14155552671 |
| 11 | Credit card (basic) | ^\d{13,19}$ | 4111111111111111 (Luhn not checked) |
| 12 | US ZIP code | ^\d{5}(-\d{4})?$ | 90210, 90210-1234 |
| 13 | Hex colour | ^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$ | #fff, #1a2b3c |
| 14 | Password strength | ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$ | Min 8 chars, upper, lower, digit, special |
| 15 | URL slug | ^[a-z0-9]+(?:-[a-z0-9]+)*$ | my-article-title |
| 16 | Semantic version | ^\d+\.\d+\.\d+(-[a-zA-Z0-9.]+)?(\+[a-zA-Z0-9.]+)?$ | 1.2.3, 2.0.0-beta.1 |
| 17 | UUID v4 | ^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$ | f47ac10b-58cc-4372-a567-0e02b2c3d479 |
| 18 | HTML tag (simple) | <([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>(.*?)</\1> | <div class="x">text</div> |
| 19 | CIDR notation | ^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)/([12]?\d|3[0-2])$ | 192.168.0.0/24 |
| 20 | Markdown link | \[([^\]]+)\]\((https?://[^\)]+)\) | [text](https://example.com) |
Pattern 4 (IPv4) uses | as alternation - when testing, the surrounding parentheses ensure correct grouping. Use the pipe unescaped in your regex engine.
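A sketch of how two of these patterns might be wrapped in Python. The helper names is_ipv4 and is_real_date are hypothetical. The second helper illustrates a limit of format-only validation: pattern 6 accepts 2026-02-31 because it checks shape, not the calendar, so a second check with datetime.date is layered on top.

```python
import re
from datetime import date

# Patterns 4 and 6 from the table, compiled once at module level.
IPV4 = re.compile(
    r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$"
)
ISO_DATE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")


def is_ipv4(value: str) -> bool:
    """Format check only: four dot-separated octets in the range 0-255."""
    return IPV4.match(value) is not None


def is_real_date(value: str) -> bool:
    """Regex checks the shape; date.fromisoformat checks the calendar."""
    if ISO_DATE.match(value) is None:
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


print(is_ipv4("192.168.0.1"))                # True
print(is_ipv4("256.1.1.1"))                  # False
print(bool(ISO_DATE.match("2026-02-31")))    # True (shape is fine)
print(is_real_date("2026-02-31"))            # False (no such day)
print(is_real_date("2026-02-22"))            # True
```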
Performance Tips
Catastrophic Backtracking
The most dangerous regex mistake is a pattern that causes exponential backtracking. The classic example:
Pattern: (a+)+$
Input: "aaaaaaaaaaaaaaaaaab"
The engine tries every possible way to partition the a characters among the nested groups before concluding there is no match. The number of attempts roughly doubles with each additional a, so an input of a few dozen characters can keep a backtracking engine busy for seconds or minutes.
Rules to avoid it:
- Never nest quantifiers over the same character class: (a+)+, (\w+\s*)+
- Use atomic groups (?>...) or possessive quantifiers ++, *+ if your engine supports them (PCRE does)
- Prefer character classes over . when you know what characters to expect
- Anchor patterns whenever possible with ^ and $
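A rough way to observe the effect in Python, sketched with deliberately small inputs so it finishes quickly. The helper name seconds_to_fail is hypothetical, and actual timings depend on the machine and Python version, so no figures are shown. The flat pattern a+$ accepts the same strings as (a+)+$ without the nested quantifier.

```python
import re
import time

risky = re.compile(r"(a+)+$")  # nested quantifier over the same character
safe = re.compile(r"a+$")      # matches the same strings, no nesting


def seconds_to_fail(pattern, n):
    """Time a failing search on n 'a' characters followed by one 'b'."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = pattern.search(text)
    return result, time.perf_counter() - start


# Keep n small: each extra character roughly doubles the work for `risky`.
for n in (12, 15, 18):
    risky_result, slow = seconds_to_fail(risky, n)
    safe_result, fast = seconds_to_fail(safe, n)
    print(f"n={n}: nested {slow:.4f}s, flat {fast:.6f}s")
```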
Possessive Quantifiers and Atomic Groups (PCRE)
// Greedy (can backtrack):
/\w+:/
// Possessive (no backtracking - match and keep):
/\w++:/
// Atomic group (equivalent):
/(?>\w+):/
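Python's re module supports possessive quantifiers and atomic groups natively only from version 3.11. On older versions, a lookahead followed by a backreference is a commonly used emulation, because the engine does not backtrack into a completed lookahead. The sketch below uses \w+\d rather than \w+: to make the behavioural difference visible; treat it as an illustration, not a drop-in replacement.

```python
import re
import sys

# Greedy: \w+ first grabs "abc1", then gives back "1" so \d can match.
greedy = re.compile(r"\w+\d")
# Atomic-like: the lookahead captures \w+ once and \1 consumes it,
# so nothing is given back and \d has nothing left to match.
atomic_like = re.compile(r"(?=(\w+))\1\d")

print(bool(greedy.search("abc1")))       # True
print(bool(atomic_like.search("abc1")))  # False

# When no backtracking is needed, both styles agree.
key = re.search(r"(?=(\w+))\1:", "key: value").group()
print(key)  # key:

# Native syntax, available from Python 3.11 onwards.
if sys.version_info >= (3, 11):
    print(bool(re.search(r"\w++\d", "abc1")))     # False
    print(bool(re.search(r"(?>\w+)\d", "abc1")))  # False
```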
General Guidelines
- Compile regex once and reuse (in PHP, store in a static variable or a service; in Python, use re.compile())
- Prefer \d over [0-9] for readability, but know they differ in Unicode mode
- Use non-capturing groups (?:...) when you do not need the captured value
- Test edge cases: empty string, very long input, input that almost matches
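Two of these guidelines sketched in Python. The first part shows how \d and [0-9] differ for str patterns, which are Unicode-aware by default. The second compiles the slug pattern (number 15 in the table) once and runs it against the kinds of edge cases listed above. The is_slug helper is a hypothetical name.

```python
import re

# The digits 1, 2, 3 in Arabic-Indic script (U+0661 to U+0663).
arabic_indic = "\u0661\u0662\u0663"

print(bool(re.fullmatch(r"\d+", arabic_indic)))            # True
print(bool(re.fullmatch(r"[0-9]+", arabic_indic)))         # False
print(bool(re.fullmatch(r"\d+", arabic_indic, re.ASCII)))  # False

# Compiled once at module level, reused on every call.
SLUG = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def is_slug(value: str) -> bool:
    return SLUG.match(value) is not None


# Edge cases: empty, valid, almost valid, very long and almost valid.
print(is_slug(""))                  # False
print(is_slug("my-article-title"))  # True
print(is_slug("my--title"))         # False
print(is_slug("a" * 10000 + "!"))   # False
```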
Code Examples
PHP
<?php
declare(strict_types=1);
// Email validation
$email = 'user@example.com';
if (preg_match('/^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$/', $email)) {
    echo 'Valid email';
}
// Extract all URLs from text
$text = 'Visit https://example.com and https://richdevtools.com for tools.';
preg_match_all('/https?:\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/', $text, $matches);
print_r($matches[0]);
// Named groups for date parsing
$date = '2026-02-22';
if (preg_match('/^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$/', $date, $m)) {
    echo "Year: {$m['year']}, Month: {$m['month']}, Day: {$m['day']}";
}
// Replace with callback
$result = preg_replace_callback('/\b(\w)(\w*)\b/', function (array $m): string {
    return strtoupper($m[1]) . $m[2];
}, 'hello world');
// Result: "Hello World"
Python
import re
# Email validation
email = 'user@example.com'
pattern = re.compile(r'^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$')
if pattern.match(email):
    print('Valid email')
# Extract named groups
date = '2026-02-22'
m = re.match(r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$', date)
if m:
    print(f"Year: {m.group('year')}, Month: {m.group('month')}")
# Find all matches with findall
text = 'IP addresses: 192.168.0.1 and 10.0.0.255'
ips = re.findall(r'\b(?:\d{1,3}\.){3}\d{1,3}\b', text)
print(ips) # ['192.168.0.1', '10.0.0.255']
# Substitution
result = re.sub(r'\bfoo\b', 'bar', 'foo foobar foo', flags=re.IGNORECASE)
print(result) # 'bar foobar bar'
JavaScript
// Email validation
const email = 'user@example.com';
const emailRegex = /^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$/;
console.log(emailRegex.test(email)); // true
// Extract all matches (global flag)
const text = 'Prices: $100 and $250 and $1999';
const prices = text.match(/\$\d+/g);
console.log(prices); // ['$100', '$250', '$1999']
// Named groups (ES2018+)
const date = '2026-02-22';
const { groups } = date.match(/^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/) ?? {};
console.log(groups); // { year: '2026', month: '02', day: '22' }
// Replace with function
const result = 'hello world'.replace(/\b(\w)/g, c => c.toUpperCase());
console.log(result); // 'Hello World'
Conclusion
Regular expressions reward the time you invest in understanding them. The fundamentals - character classes, quantifiers, groups, and anchors - cover most everyday use cases. Lookaheads and lookbehinds handle the more complex scenarios, checking context without consuming characters.
The 20 patterns above are starting points. Real-world input is messier than any example - always test with edge cases: leading and trailing whitespace, Unicode characters, very short or very long strings, and inputs designed to exploit greedy backtracking.
Use the Regex Tester to experiment with patterns interactively as you build and debug your expressions.