How to Use Regular Expressions (Regex): Complete Developer Guide

Q: What is the difference between * + and ? in regex?

The asterisk (*) means 0 or more repetitions of the preceding element: 'ab*c' matches 'ac', 'abc', 'abbc'. The plus sign (+) means 1 or more repetitions: 'ab+c' matches 'abc', 'abbc' but NOT 'ac'. The question mark (?) means 0 or 1 repetition (optional): 'colou?r' matches 'color' and 'colour'.

Q: How do I validate an email with a regular expression?

A practical pattern is: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$. This validates the basic structure: local part with allowed characters, at sign, domain, and TLD of at least 2 letters. However, the full email specification (RFC 5322) is extremely complex. For production, complement regex validation with server-side verification.

Q: What do \d \w and \s mean in regex?

They are predefined character classes (shorthand). \d equals [0-9] (any digit). \w equals [a-zA-Z0-9_] (any alphanumeric character plus underscore). \s matches any whitespace (space, tab, newline). Their uppercase versions (\D, \W, \S) are the negation: any character that is NOT a digit, word character, or whitespace, respectively.

Q: What is a lookahead and what is it for?

A lookahead is an assertion that checks whether a pattern exists after the current position without consuming characters. Positive lookahead (?=pattern) verifies the pattern DOES exist. Negative (?!pattern) verifies it does NOT exist. They are useful for complex validations like passwords: ^(?=.*[A-Z])(?=.*\d).{8,}$ verifies at least one uppercase and one digit regardless of position.

Q: What is catastrophic backtracking and how do I avoid it?

Catastrophic backtracking occurs when an ambiguous pattern causes the regex engine to try an exponential number of combinations. Example: (a+)+$ with the input 'aaaaaaaaaaab'. The engine tries every way to split the 'a's between the two quantifiers before failing. To avoid it: be specific in your patterns, avoid nested quantifiers like (a+)+, and use possessive quantifiers (a++) or atomic groups when available.

Q: Does regex work the same in all programming languages?

Not exactly. While the basic syntax is similar, each language has its own regex 'flavor'. Python's built-in re module requires fixed-width lookbehind, whereas JavaScript (since ES2018) and .NET accept variable-length lookbehind. Python also has the more advanced third-party regex module. In Java, backslashes must be doubled inside string literals (\\d instead of \d). PCRE (PHP) and Perl support recursion and conditionals. The most common differences are in lookbehind support, Unicode handling, and advanced features like recursion.

Published on March 16, 2026 · 12 min read

Learn regular expressions from scratch. Basic syntax, character classes, quantifiers, groups, lookahead, lookbehind and common patterns for email, phone, URL and IP. With practical examples.

What are regular expressions and what are they for

Regular expressions (regex or regexp) are search patterns that allow you to find, validate, and manipulate text with extreme precision. They are a fundamental tool in programming, system administration, and data processing.

A regular expression is essentially a sequence of characters that defines a search pattern. With a single line of regex, you can accomplish what would otherwise require dozens of lines of conditional code.

Common uses for regular expressions:

Data validation: Verifying that an email, phone number, URL, or postal code has the correct format
Search and replace: Finding patterns in long texts and replacing them (as in text editors or IDEs)
Data extraction: Pulling specific information from unstructured text (web scraping, logs)
Log parsing: Analyzing server and application log files
Linting and formatting: Verifying that code follows certain conventions
Routing in web frameworks: Defining URL patterns in Express, Django, Rails, etc.

Regular expressions are available in virtually every programming language: JavaScript, Python, Java, C#, PHP, Ruby, Go, Rust, and many more. They are also used in command-line tools like grep, sed, and awk.

If you want to practice while reading this guide, open our regex testing tool in another tab.

Basic syntax: literal characters and metacharacters

Regex syntax is divided into two types of characters: literals (which are matched as-is) and metacharacters (which have special meaning).

Literal characters:

Letters, numbers, and most symbols are matched literally. The pattern cat finds the word "cat" in the text.

The fundamental metacharacters:

Metacharacter	Meaning	Example	Matches
`.`	Any character (except newline)	`c.t`	"cat", "cot", "c3t"
`^`	Start of line/string	`^Hello`	"Hello world" (only at start)
`$`	End of line/string	`world$`	"Hello world" (only at end)
`*`	0 or more repetitions	`ab*c`	"ac", "abc", "abbc", "abbbc"
`+`	1 or more repetitions	`ab+c`	"abc", "abbc" (not "ac")
`?`	0 or 1 repetition (optional)	`colou?r`	"color", "colour"
`|`	Alternative (OR)	`cat|dog`	"cat" or "dog"
`\`	Escape (literal next character)	`\.`	A literal period
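
To try the metacharacters from the table, here is a minimal Python sketch using the standard re module. The helper name `full_matches` and the sample strings are illustrative assumptions, not part of any library.

```python
import re

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

print(full_matches(r"ab*c", ["ac", "abc", "abbc"]))              # ['ac', 'abc', 'abbc']
print(full_matches(r"ab+c", ["ac", "abc", "abbc"]))              # ['abc', 'abbc']
print(full_matches(r"colou?r", ["color", "colour", "colouur"]))  # ['color', 'colour']
print(full_matches(r"cat|dog", ["cat", "dog", "cow"]))           # ['cat', 'dog']
print(full_matches(r"c.t", ["cat", "c3t", "ct"]))                # ['cat', 'c3t']
```

re.fullmatch is used here so that each sample must match in its entirety, which keeps the comparison between `*`, `+`, and `?` easy to read.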

Escaping metacharacters:

If you need to search for a metacharacter as a literal character, you must escape it with \. For example:

\. searches for a literal period (not "any character")
\* searches for a literal asterisk
\? searches for a literal question mark
\( and \) search for literal parentheses
\\ searches for a literal backslash

Practical example: To search for the string "price: $9.99" you need: price: \$9\.99
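
As a rough illustration of the escaping rule, the Python sketch below compares the hand-escaped pattern with one built by re.escape, and shows what happens when the escapes are left out. The sample sentence is an assumption made up for the demo.

```python
import re

text = "Sale price: $9.99 (was $19.99)"

# Hand-escaped pattern: \$ and \. are treated as literal characters.
manual = re.compile(r"price: \$9\.99")

# re.escape() produces an equivalent pattern from a plain string.
escaped = re.compile(re.escape("price: $9.99"))

# Unescaped, "$" is an end-of-string anchor, so nothing can follow it.
unescaped = re.compile(r"price: $9.99")

print(manual.search(text).group())   # price: $9.99
print(escaped.search(text).group())  # price: $9.99
print(unescaped.search(text))        # None
```

re.escape is handy when the literal text comes from user input or configuration rather than from a pattern you wrote yourself.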

Character classes and predefined classes

Character classes allow you to define a set of characters that are valid at a specific position in the pattern.

Custom classes with brackets [ ]:

Pattern	Meaning	Match example
`[abc]`	Any of: a, b, or c	"a", "b", "c"
`[a-z]`	Any lowercase letter	"a", "m", "z"
`[A-Z]`	Any uppercase letter	"A", "M", "Z"
`[0-9]`	Any digit	"0", "5", "9"
`[a-zA-Z]`	Any letter	"a", "Z", "m"
`[a-zA-Z0-9]`	Any alphanumeric	"a", "3", "Z"
`[^abc]`	Any EXCEPT a, b, c	"d", "1", "Z"
`[^0-9]`	Any that is NOT a digit	"a", "!", " "

Predefined classes (shorthand):

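Predefined classes are shortcuts for common combinations. The equivalents below are the ASCII definitions; Unicode-aware engines may match additional characters (for example, \d also matches non-ASCII digits by default in Python 3 and .NET):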

Shorthand	Equivalent	Meaning
`\d`	`[0-9]`	Any digit
`\D`	`[^0-9]`	Any non-digit
`\w`	`[a-zA-Z0-9_]`	Any word character (alphanumeric + underscore)
`\W`	`[^a-zA-Z0-9_]`	Any non-word character
`\s`	`[\t\n\r\f\v ]`	Any whitespace
`\S`	`[^\t\n\r\f\v ]`	Any non-whitespace
`\b`	(no direct equivalent)	Word boundary

Word boundary (\b):

\b is especially useful for matching whole words. \bcat\b finds "cat" but NOT "caterpillar" or "scat". It is a position anchor that does not consume characters.

Practical example: To validate that a string contains only letters, numbers, and hyphens: ^[a-zA-Z0-9-]+$
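
The following Python sketch exercises the word boundary and the letters-digits-hyphens pattern, and also shows one flavor-specific caveat about \d. The name `SLUG_RE` and the sample strings are assumptions chosen for illustration.

```python
import re

# \b matches only at word edges, so "scat" and "caterpillar" are skipped.
whole_words = re.findall(r"\bcat\b", "cat scat caterpillar cat.")
print(whole_words)  # ['cat', 'cat']

# Letters, digits, and hyphens only.
SLUG_RE = re.compile(r"^[a-zA-Z0-9-]+$")
print(bool(SLUG_RE.match("my-post-42")))  # True
print(bool(SLUG_RE.match("my post")))     # False

# Flavor caveat: in Python 3, \d on str patterns also matches non-ASCII
# decimal digits unless re.ASCII is set. "\u0663" is ARABIC-INDIC DIGIT THREE.
print(bool(re.fullmatch(r"\d", "\u0663")))            # True
print(bool(re.fullmatch(r"\d", "\u0663", re.ASCII)))  # False
```

If you need strict ASCII behavior in Python, either write [0-9] explicitly or pass re.ASCII.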

Quantifiers and repetition modifiers

Quantifiers specify how many times the preceding element must appear.

Basic quantifiers:

Quantifier	Meaning	Example	Matches
`*`	0 or more times	`\d*`	"", "5", "123", "99999"
`+`	1 or more times	`\d+`	"5", "123", "99999" (not "")
`?`	0 or 1 time	`-?\d+`	"42", "-42"
`{n}`	Exactly n times	`\d{4}`	"2026", "1234" (exactly 4 digits)
`{n,}`	n or more times	`\d{2,}`	"12", "123", "1234" (2+ digits)
`{n,m}`	Between n and m times	`\d{2,4}`	"12", "123", "1234" (2 to 4 digits)

Greedy vs lazy quantifiers:

By default, quantifiers are greedy: they try to match as much text as possible. By adding ? after the quantifier, they become lazy: they match as little as possible.

Example of the difference:

Text: <b>Hello</b> and <b>World</b>

<b>.*</b> (greedy): matches <b>Hello</b> and <b>World</b> (everything, as a single match)
<b>.*?</b> (lazy): matches <b>Hello</b> and <b>World</b> (separately, as two matches)

The difference is crucial when working with HTML, XML, or any text with repeated delimiters. Lazy mode is almost always what you want when searching for pairs of tags or delimiters.
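
To make the greedy versus lazy contrast concrete, here is a small Python sketch. The sample text with `<b>` tags is an assumption picked for the demo; any pair of repeated delimiters behaves the same way.

```python
import re

text = "<b>Hello</b> and <b>World</b>"

# Greedy: .* runs to the LAST closing tag, producing one long match.
greedy = re.findall(r"<b>.*</b>", text)
print(greedy)  # ['<b>Hello</b> and <b>World</b>']

# Lazy: .*? stops at the FIRST closing tag, producing one match per pair.
lazy = re.findall(r"<b>.*?</b>", text)
print(lazy)    # ['<b>Hello</b>', '<b>World</b>']
```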

Practical example: Validating a US ZIP code with optional dash and 4-digit extension: ^\d{5}(-\d{4})?$

This matches "12345" and "12345-6789" but not "1234" or "123456".
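
A quick way to check those four cases is a sketch like the one below. The function name `is_zip` is a hypothetical helper, not a library call.

```python
import re

ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")

def is_zip(value: str) -> bool:
    """True for 5-digit ZIP codes with an optional -NNNN extension."""
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    return ZIP_RE.fullmatch(value) is not None

for candidate in ["12345", "12345-6789", "1234", "123456"]:
    print(candidate, is_zip(candidate))
```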

Capture groups and references

Groups allow you to group parts of a pattern, capture matching text for later use, and apply quantifiers to complete subexpressions.

Types of groups:

Syntax	Type	Description
`(pattern)`	Capture group	Groups and captures the matched text
`(?:pattern)`	Non-capture group	Groups but does NOT capture (more efficient)
`(?<name>pattern)`	Named group	Captures with an identifiable name
`\1, \2`	Backreference	Reference to text captured by group 1, 2, etc.

Basic capture group:

To extract the year, month, and day from a date in YYYY-MM-DD format:

(\d{4})-(\d{2})-(\d{2})

Group 1: Year (e.g., "2026")
Group 2: Month (e.g., "03")
Group 3: Day (e.g., "16")

In JavaScript: "2026-03-16".match(/(\d{4})-(\d{2})-(\d{2})/) returns an array where [1] is "2026", [2] is "03", and [3] is "16".

Named groups:

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})

In JavaScript: match.groups.year, match.groups.month, match.groups.day
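
For comparison, here is a rough Python equivalent. Note one syntax difference: Python's re module writes named groups as (?P<name>...) rather than (?<name>...). The sample sentence is an assumption made up for the demo.

```python
import re

DATE_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = DATE_RE.search("Released on 2026-03-16.")

# Groups can be read by number or by name.
print(m.group(1))        # 2026
print(m.group("month"))  # 03
print(m.groupdict())     # {'year': '2026', 'month': '03', 'day': '16'}
```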

Backreferences:

Allow you to refer to the exact text captured by a previous group:

(\w+)\s+\1 finds duplicate words ("the the", "is is")
(['"])(.*?)\1 finds text between quotes, ensuring opening and closing quotes match
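
Both backreference patterns can be tried with a short Python sketch such as this one. The names `DUP_RE` and `QUOTED_RE` and the sample strings are illustrative assumptions.

```python
import re

# \1 must repeat exactly what group 1 captured.
DUP_RE = re.compile(r"(\w+)\s+\1")
print(DUP_RE.findall("the the cat is is here"))  # ['the', 'is']

# Group 1 remembers which quote opened the string; \1 requires the same one.
QUOTED_RE = re.compile(r"""(['"])(.*?)\1""")
pairs = QUOTED_RE.findall("""say "it's fine" or 'ok'""")
print(pairs)  # [('"', "it's fine"), ("'", 'ok')]
```

Because the duplicate-word pattern has no word boundaries, it can also match inside words (for example the "is is" in "this is"); wrapping it as \b(\w+)\s+\1\b is one way to tighten it.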

Alternation within groups:

(https?|ftp):// matches "http://", "https://", or "ftp://"

Try these patterns in real-time with our regex tool.

Lookahead and lookbehind: position assertions

Lookahead and lookbehind are assertions that check whether a pattern exists before or after the current position, without consuming characters. They are extremely powerful for complex validations.

The 4 types of assertions:

Syntax	Name	Meaning
`(?=pattern)`	Positive lookahead	What follows MUST match the pattern
`(?!pattern)`	Negative lookahead	What follows must NOT match the pattern
`(?<=pattern)`	Positive lookbehind	What precedes MUST match the pattern
`(?<!pattern)`	Negative lookbehind	What precedes must NOT match the pattern

Example 1: Validate a strong password

A password requiring at least one uppercase, one lowercase, one digit, and 8+ characters:

^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$

(?=.*[A-Z]): Positive lookahead - must contain at least one uppercase letter
(?=.*[a-z]): Must contain at least one lowercase letter
(?=.*\d): Must contain at least one digit
.{8,}: Must be 8 or more characters
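
As a sketch of how the stacked lookaheads behave, the Python snippet below runs the pattern against a few made-up passwords. The helper name `is_strong` and the sample values are assumptions for illustration only.

```python
import re

STRONG_RE = re.compile(r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$")

def is_strong(password: str) -> bool:
    """True when every lookahead and the length check succeed."""
    return STRONG_RE.fullmatch(password) is not None

print(is_strong("Passw0rdX"))     # True
print(is_strong("password1"))     # False (no uppercase letter)
print(is_strong("NoDigitsHere"))  # False (no digit)
print(is_strong("Sh0rt"))         # False (fewer than 8 characters)
```

Each lookahead starts from the same position (the beginning of the string), which is why the order of the required characters does not matter.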

Example 2: Find prices without the currency symbol

(?<=\$)\d+\.\d{2}

In the text "The price is $29.99 and shipping is $5.00", it matches "29.99" and "5.00" without including the "$".
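
This can be checked with a few lines of Python. The lookbehind here is fixed-width (a single character), so it also works in Python's built-in re module.

```python
import re

text = "The price is $29.99 and shipping is $5.00"

# (?<=\$) requires a "$" just before the match but leaves it out of the result.
prices = re.findall(r"(?<=\$)\d+\.\d{2}", text)
print(prices)  # ['29.99', '5.00']
```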

Example 3: Find words NOT followed by a certain pattern

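\b\w+\b(?!\s*:)

Finds whole words NOT followed by a colon. The word boundaries matter: without them, \w+ can backtrack and match part of a word (for example "ke" in "key:"). Useful for distinguishing between keys and values in "key: value" text.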
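
The sketch below shows why word boundaries matter for this kind of negative lookahead: it compares a variant wrapped in \b with the bare \w+ form. The sample text is an assumption made up for the demo.

```python
import re

text = "name: Alice, role : admin"

# With \b on both sides, only whole words are considered.
strict = re.findall(r"\b\w+\b(?!\s*:)", text)
print(strict)  # ['Alice', 'admin']

# Without \b, the engine backtracks one letter to satisfy the lookahead,
# so fragments of the keys leak into the result.
loose = re.findall(r"\w+(?!\s*:)", text)
print(loose)   # ['nam', 'Alice', 'rol', 'admin']
```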

Example 4: Numbers NOT preceded by a minus sign

(?<!-)\b\d+\b

Finds positive numbers while ignoring negative ones. In "5 -3 8 -12", it finds "5" and "8" but not "3" or "12".
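
A minimal Python check of this negative lookbehind, using the same sample text:

```python
import re

# (?<!-) rejects any position that has a minus sign immediately before it.
unsigned = re.findall(r"(?<!-)\b\d+\b", "5 -3 8 -12")
print(unsigned)  # ['5', '8']
```

The \b after the lookbehind keeps the engine from starting a match in the middle of "12", where the preceding character is "1" rather than "-".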

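Compatibility note: Lookbehind is not supported in all regex flavors, and those that support it differ in what they allow. JavaScript has supported it since ES2018. .NET (C#) accepts variable-length lookbehind, Java generally requires the lookbehind to have a bounded length, and Python's built-in re module requires fixed-width lookbehind patterns.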

Common patterns: email, phone, URL and IP

Here are tested regular expressions for the most common validation patterns. Remember that no regex is 100% perfect for complex formats like email; for production, complement with server-side validation.

1. Email (practical validation):

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Local part: letters, numbers, periods, hyphens, underscores, %, +
@: mandatory separator
Domain: letters, numbers, periods, hyphens
TLD: at least 2 letters
Valid: user@example.com, first.last@company.co.uk
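
One way to wrap this pattern in Python is sketched below. The helper name `looks_like_email` is hypothetical, and the check is only structural: it says nothing about whether the mailbox exists.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(value: str) -> bool:
    """Structural check only; it does not prove the address is deliverable."""
    return EMAIL_RE.fullmatch(value) is not None

print(looks_like_email("user@example.com"))          # True
print(looks_like_email("first.last@company.co.uk"))  # True
print(looks_like_email("no-at-sign.com"))            # False
print(looks_like_email("user@localhost"))            # False (no dot and TLD)
```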

2. International phone (E.164 format):

^\+?[1-9]\d{1,14}$

Optional + at the start
First digit: 1-9 (cannot start with 0)
Up to 15 digits total
Valid: +12025551234, 447911123456
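
A short Python sketch of the same check follows. The helper name `is_e164` is an assumption, and re.ASCII is added so that \d is limited to 0-9 in Python.

```python
import re

E164_RE = re.compile(r"^\+?[1-9]\d{1,14}$", re.ASCII)

def is_e164(value: str) -> bool:
    """True for 2 to 15 digits, optional leading '+', no leading zero."""
    return E164_RE.fullmatch(value) is not None

print(is_e164("+12025551234"))       # True
print(is_e164("447911123456"))       # True
print(is_e164("+0123456789"))        # False (starts with 0)
print(is_e164("+1234567890123456"))  # False (16 digits)
```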

3. URL (HTTP/HTTPS):

^https?:\/\/[\w.-]+(?:\.[a-zA-Z]{2,})(?:\/[\w.~:/?#\[\]@!$&'()*+,;=-]*)?$

Protocol: http:// or https://
Domain: alphanumeric with periods and hyphens
TLD: at least 2 letters
Path: any valid URL character (optional)

4. IPv4 address:

^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$

4 octets separated by periods
Each octet: 0-255
Valid: 192.168.1.1, 10.0.0.1, 255.255.255.0
Rejects: 256.1.1.1, 192.168.1.999
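
The accepted and rejected examples can be verified with a sketch like this one. The helper name `is_ipv4` is hypothetical, and re.ASCII is an added assumption to keep \d limited to 0-9 in Python.

```python
import re

IPV4_RE = re.compile(
    r"^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$",
    re.ASCII,
)

def is_ipv4(value: str) -> bool:
    """True when the value is four dot-separated octets in the range 0-255."""
    return IPV4_RE.fullmatch(value) is not None

for candidate in ["192.168.1.1", "10.0.0.1", "255.255.255.0",
                  "256.1.1.1", "192.168.1.999"]:
    print(candidate, is_ipv4(candidate))
```

Note that this pattern accepts octets with leading zeros (such as "01"), which some parsers reject.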

5. ISO date format (YYYY-MM-DD):

^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$

Year: 4 digits
Month: 01-12
Day: 01-31
Does not validate impossible days (like 02-30); for that you need additional logic
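
One possible form of that additional logic, sketched in Python: the regex checks the shape, then datetime.date checks that the day really exists. The helper name `is_real_date` is an assumption, and the groups are made capturing here (unlike the pattern above) so the parts can be read back.

```python
import re
from datetime import date

ISO_DATE_RE = re.compile(
    r"^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$", re.ASCII
)

def is_real_date(value: str) -> bool:
    """Shape check with the regex, then a calendar check with datetime."""
    m = ISO_DATE_RE.fullmatch(value)
    if m is None:
        return False
    year, month, day = (int(part) for part in m.groups())
    try:
        date(year, month, day)  # raises ValueError for days like Feb 30
    except ValueError:
        return False
    return True

print(is_real_date("2026-03-16"))  # True
print(is_real_date("2026-02-30"))  # False (passes the regex, fails the calendar)
print(is_real_date("2026-13-01"))  # False (fails the regex)
```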

6. CSS color hex:

^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$

Formats: #RGB, #RRGGBB, #RRGGBBAA
Valid: #fff, #FF5733, #FF573380

Test and refine all these patterns in our regex testing tool. To validate JSON data containing these patterns, use our JSON validator.

Flags, performance and best practices

To master regex, you need to know about flags (modifiers) and follow best practices that prevent performance and maintainability issues.

Common flags:

Flag	Name	Effect
`g`	Global	Find ALL matches, not just the first
`i`	Case insensitive	Does not distinguish uppercase from lowercase
`m`	Multiline	^ and $ match start/end of each LINE, not just the string
`s`	Dotall	The dot (.) also matches newlines
`u`	Unicode	Full Unicode character support

In JavaScript: /pattern/flags - example: /hello world/gi

In Python: re.compile(r'pattern', re.IGNORECASE | re.MULTILINE)
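
The effect of these flags can be seen in a small Python sketch. Python has no g flag; re.findall and re.finditer return all matches instead. The sample text is an assumption made up for the demo.

```python
import re

text = "Hello world\nhello again"

# Without MULTILINE, ^ matches only at the very start of the string.
print(re.findall(r"^hello", text, re.IGNORECASE))                 # ['Hello']

# With MULTILINE, ^ also matches right after each newline.
print(re.findall(r"^hello", text, re.IGNORECASE | re.MULTILINE))  # ['Hello', 'hello']

# By default the dot stops at a newline; DOTALL lets it cross lines.
print(re.search(r"world.hello", text))                            # None
print(re.search(r"world.hello", text, re.DOTALL).group() == "world\nhello")  # True
```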

Performance best practices:

Avoid catastrophic backtracking: Patterns like (a+)+$ can cause the regex engine to try an exponential number of combinations. When an attacker can trigger this with crafted input, it becomes a ReDoS (Regular Expression Denial of Service) vulnerability. Remove the nested quantifier where possible, or use possessive quantifiers (++) or atomic groups when available
Be specific: [a-z]+ is better than .+ when you only expect letters. The more specific your pattern, the faster it runs
Use anchors: ^ and $ tell the engine where to start and stop, avoiding unnecessary searching
Prefer non-capturing groups: (?:...) is more efficient than (...) when you do not need to capture
Compile the regex if used repeatedly: In Python use re.compile(), in Java use Pattern.compile()
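
As a rough sketch of two of these practices in Python, the snippet below compiles its patterns once and replaces the nested quantifier with a flat equivalent. The names `NESTED_RE`, `FLAT_RE`, and `ends_with_a_run` are illustrative assumptions, and the inputs are kept short so the ambiguous pattern still finishes quickly.

```python
import re

# Compiled once at import time and reused on every call.
NESTED_RE = re.compile(r"(a+)+$")  # nested quantifiers: many ways to split the a's
FLAT_RE = re.compile(r"a+$")       # same matches, only one way to split them

def ends_with_a_run(text: str) -> bool:
    """True when the text ends in one or more 'a' characters."""
    return FLAT_RE.search(text) is not None

for sample in ["aaaa", "baaa", "aaaaaaaaaaab", "b"]:
    # Both patterns agree on the result; only the amount of work differs.
    assert (NESTED_RE.search(sample) is not None) == ends_with_a_run(sample)
    print(repr(sample), ends_with_a_run(sample))
```

With the nested form, each extra "a" in a failing input roughly doubles the work in a backtracking engine, so the flat form is the safer choice for untrusted input.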

Maintainability best practices:

Comment your regex: In Python you can use the re.VERBOSE flag to add readable comments and whitespace
Split complex patterns: Instead of one monolithic regex, break it into parts and combine them programmatically
Write tests: Always write test cases for both positive and negative matches
Do not use regex for everything: For parsing HTML or XML, use a dedicated parser. For JSON, use JSON.parse(). Regex is not suitable for languages with nesting
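
The first two maintainability tips can be combined, as in this Python sketch: the IPv4 pattern is built from a reusable `OCTET` fragment and documented with re.VERBOSE. The names `OCTET` and `IPV4_RE` are assumptions for illustration.

```python
import re

# One octet in the range 0-255, defined once and reused.
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)"

# In VERBOSE mode, whitespace and "#" comments inside the pattern are ignored.
IPV4_RE = re.compile(
    rf"""
    (?:{OCTET}\.){{3}}   # three octets, each followed by a dot
    {OCTET}              # final octet
    """,
    re.VERBOSE | re.ASCII,
)

print(bool(IPV4_RE.fullmatch("10.0.0.1")))   # True
print(bool(IPV4_RE.fullmatch("10.0.0")))     # False
print(bool(IPV4_RE.fullmatch("10.0.0.256"))) # False
```

The doubled braces in {{3}} are needed only because the pattern is an f-string; the compiled regex sees a plain {3}.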

Practice and refine your patterns with our regex testing tool, which shows matches in real-time and explains each part of the pattern.


Frequently asked questions

What is the difference between * + and ? in regex?