Regular expressions (regex) are powerful patterns used for matching and manipulating text. Whether you're validating email addresses, extracting data from logs, or searching through code, regex is an essential tool for developers. This beginner-friendly guide teaches you regex fundamentals with practical examples you can use immediately.
What are Regular Expressions?
Regular expressions (regex or regexp) are sequences of characters that define search patterns. They're used for:
- Pattern matching: Find specific text patterns in strings
- Validation: Check if input matches required format (emails, phone numbers)
- Search and replace: Find and modify text efficiently
- Data extraction: Pull specific information from text
- Text parsing: Break down complex strings into components
Basic Regex Syntax
Literal Characters
The simplest regex matches exact characters:
Pattern: cat
Matches: "cat", "category", "scatter"
Does not match: "Cat" (case-sensitive by default)
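A quick way to confirm this behavior is to run the pattern against the words listed above. The following is a minimal sketch using Python's standard re module; the variable names are illustrative only.

```python
import re

# The literal pattern "cat" matches wherever those three characters
# appear in order, and it is case-sensitive by default.
pattern = re.compile(r"cat")

words = ["cat", "category", "scatter", "Cat"]
results = {word: bool(pattern.search(word)) for word in words}
print(results)
```

Only "Cat" fails, because the uppercase C is a different character from the lowercase c in the pattern.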
Metacharacters
Special characters with special meanings:
. Any single character (except newline)
^ Start of string
$ End of string
* 0 or more repetitions
+ 1 or more repetitions
? 0 or 1 repetition (makes preceding optional)
| OR operator
[] Character class (any one character inside)
() Grouping
\ Escape character
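The metacharacters above are easier to remember once you see them at work. Here is a small sketch in Python (standard re module); the sample strings are made up for illustration.

```python
import re

# . matches any single character except a newline
dot_matches = re.findall(r"c.t", "cat cot c\nt cut")

# ^ and $ anchor the pattern to the start and end of the string
exact = bool(re.search(r"^cat$", "cat"))
partial = bool(re.search(r"^cat$", "category"))

# | offers alternatives, () groups them, ? makes the trailing "s" optional
animals = [w for w in ["cat", "dogs", "bird"] if re.search(r"^(cat|dog)s?$", w)]

# \ escapes a metacharacter so it is matched literally
literal_dot = re.findall(r"3\.14", "3.14 or 3x14")
any_char = re.findall(r"3.14", "3.14 or 3x14")

print(dot_matches, exact, partial, animals, literal_dot, any_char)
```

Note how the unescaped dot in the last line also accepts "3x14", which is usually not what you want.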
Character Classes
[abc] Matches a, b, or c
[a-z] Any lowercase letter
[A-Z] Any uppercase letter
[0-9] Any digit
[a-zA-Z] Any letter
[^abc] NOT a, b, or c (negation)
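As a short illustration of character classes, the sketch below uses Python's re module on an invented sample sentence.

```python
import re

text = "Order A7 shipped to bay c3, not B9."

# [A-Z][0-9]: one uppercase letter followed by one digit
upper_codes = re.findall(r"[A-Z][0-9]", text)

# [a-zA-Z][0-9]: a letter of either case followed by one digit
all_codes = re.findall(r"[a-zA-Z][0-9]", text)

# [^aeiou ]: any character that is NOT a lowercase vowel or a space
consonants = "".join(re.findall(r"[^aeiou ]", "regex rules"))

print(upper_codes, all_codes, consonants)
```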
Predefined Character Classes
\d Any digit [0-9]
\D Any non-digit
\w Any word character [a-zA-Z0-9_]
\W Any non-word character
\s Any whitespace (space, tab, newline)
\S Any non-whitespace
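The shorthand classes can be tried out on a log-style line. This is a sketch in Python with an invented sample line. One caveat: for Python 3 text strings, \d and \w also match non-ASCII digits and letters unless you pass the re.ASCII flag.

```python
import re

line = "user_42 logged in at 09:15\tfrom host-7"

# \d+ : runs of digits
digits = re.findall(r"\d+", line)

# \w+ : runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", line)

# \s+ : split on any run of whitespace (spaces and the tab)
fields = re.split(r"\s+", line)

print(digits)
print(words)
print(fields)
```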
Quantifiers
* 0 or more times
+ 1 or more times
? 0 or 1 time
{n} Exactly n times
{n,} n or more times
{n,m} Between n and m times
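To compare the quantifiers side by side, the sketch below applies each one to the letter "a" inside a small set of sample strings. It uses Python's re.fullmatch so the whole string has to match; the helper name matching is hypothetical.

```python
import re

samples = ["ct", "cat", "caat", "caaat", "caaaat"]

def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

star = matching(r"ca*t")            # zero or more "a"
plus = matching(r"ca+t")            # one or more "a"
optional = matching(r"ca?t")        # zero or one "a"
exactly_two = matching(r"ca{2}t")   # exactly two "a"
two_or_more = matching(r"ca{2,}t")  # two or more "a"
two_to_three = matching(r"ca{2,3}t")  # two or three "a"

print(star, plus, optional, exactly_two, two_or_more, two_to_three)
```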
Common Regex Patterns
Email Validation
Pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Breakdown:
^ Start of string
[a-zA-Z0-9._%+-]+ Username (letters, digits, and the characters . _ % + -)
@ Literal @ symbol
[a-zA-Z0-9.-]+ Domain name
\. Literal dot (escaped)
[a-zA-Z]{2,} TLD (2+ letters)
$ End of string
Matches: "user@example.com", "test.email@domain.co.uk"
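Here is one way the email pattern above could be wrapped in a reusable check. This is a sketch in Python; the function name is_valid_email is hypothetical, and the pattern only tests the general shape of an address, not whether the mailbox exists.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def is_valid_email(address):
    """Return True if the address has the shape the pattern describes."""
    # fullmatch also rejects a trailing newline, which $ alone would tolerate
    return EMAIL_RE.fullmatch(address) is not None

for candidate in ["user@example.com", "test.email@domain.co.uk",
                  "user@example", "@example.com"]:
    print(candidate, is_valid_email(candidate))
```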
Phone Numbers (US Format)
Pattern: ^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$
Matches:
- (123) 456-7890
- 123-456-7890
- 123.456.7890
- 1234567890
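The three capturing groups in the phone pattern make it easy to normalize any accepted format into a single one. The sketch below is in Python; normalize_phone and the chosen output format are assumptions for illustration.

```python
import re

PHONE_RE = re.compile(r"^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$")

def normalize_phone(raw):
    """Return the number as 123-456-7890, or None if it does not match."""
    m = PHONE_RE.match(raw)
    if m is None:
        return None
    area, prefix, line = m.groups()
    return f"{area}-{prefix}-{line}"

for raw in ["(123) 456-7890", "123.456.7890", "1234567890", "12-3456-7890"]:
    print(raw, "->", normalize_phone(raw))
```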
URLs
Pattern: ^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$
Matches:
- http://example.com
- https://www.example.com
- https://example.com/path?query=value
Dates (MM/DD/YYYY)
Pattern: ^(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/\d{4}$
Matches: "01/15/2024", "12/31/2023"
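Keep in mind that this pattern checks the format only: it accepts "02/31/2024" even though February never has 31 days. One possible approach, sketched here in Python, is to use the regex as a format filter and let the standard datetime module confirm the date is real. The function name is_real_date is hypothetical.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"^(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/\d{4}$")

def is_real_date(text):
    """Check the MM/DD/YYYY format first, then the calendar."""
    if not DATE_RE.match(text):
        return False
    try:
        datetime.strptime(text, "%m/%d/%Y")
    except ValueError:
        return False
    return True

print(bool(DATE_RE.match("02/31/2024")))  # the format alone passes
print(is_real_date("02/31/2024"))         # the calendar check fails
print(is_real_date("01/15/2024"))
```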
Password Strength
Pattern: ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
Requirements:
- At least 8 characters
- At least one lowercase letter
- At least one uppercase letter
- At least one digit
- At least one special character
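A single pattern can only say pass or fail. If you want to tell the user which requirement was missed, one option is to run a separate small check per rule. The sketch below shows both approaches in Python; password_problems and its messages are hypothetical. Note that the combined pattern also rejects any character outside its allowed set (for example #), which the per-rule checks here do not.

```python
import re

PASSWORD_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

# One small pattern per requirement, paired with a readable message
CHECKS = [
    (r".{8,}", "at least 8 characters"),
    (r"[a-z]", "a lowercase letter"),
    (r"[A-Z]", "an uppercase letter"),
    (r"\d", "a digit"),
    (r"[@$!%*?&]", "a special character"),
]

def password_problems(password):
    """Return the list of requirements the password does not meet."""
    return [msg for pattern, msg in CHECKS if not re.search(pattern, password)]

print(bool(PASSWORD_RE.match("Passw0rd!")))
print(password_problems("password"))
```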
Hexadecimal Color Codes
Pattern: ^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$
Matches: "#FF5733", "#F00", "FF5733"
IP Addresses (IPv4)
Pattern: ^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
Matches: "192.168.1.1", "10.0.0.1", "255.255.255.255"
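The alternation inside each octet is what keeps values above 255 out. The sketch below tests the pattern in Python and, for comparison, shows the standard library's ipaddress module doing the same job without a regex. The function names are hypothetical.

```python
import re
import ipaddress

OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4_RE = re.compile(r"^(" + OCTET + r"\.){3}" + OCTET + r"$")

def is_ipv4_regex(text):
    """Check the address with the regex from the article."""
    return IPV4_RE.match(text) is not None

def is_ipv4_stdlib(text):
    """Check the address with the standard library instead."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return True

for candidate in ["192.168.1.1", "255.255.255.255", "256.1.1.1", "1.2.3"]:
    print(candidate, is_ipv4_regex(candidate), is_ipv4_stdlib(candidate))
```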
Regex in Different Languages
JavaScript
// Test if pattern matches
const regex = /^[a-z]+$/;
console.log(regex.test("hello")); // true
console.log(regex.test("Hello")); // false
// Find matches
const text = "Email: user@example.com";
const emailRegex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/;
const match = text.match(emailRegex);
console.log(match[0]); // "user@example.com"
// Replace
const str = "Hello World";
const result = str.replace(/World/, "JavaScript");
console.log(result); // "Hello JavaScript"
// Flags
/pattern/g // Global (find all matches)
/pattern/i // Case-insensitive
/pattern/m // Multiline
Python
import re
# Test if pattern matches
pattern = r'^[a-z]+$'
print(re.match(pattern, "hello")) # Match object
print(re.match(pattern, "Hello")) # None
# Find all matches
text = "Emails: user@example.com, admin@test.com"
emails = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text)
print(emails) # ['user@example.com', 'admin@test.com']
# Replace
result = re.sub(r'World', 'Python', "Hello World")
print(result) # "Hello Python"
# Flags
re.IGNORECASE  # Case-insensitive
re.MULTILINE   # ^ and $ match line boundaries
re.DOTALL      # . matches newline
PHP
// Test if pattern matches
$pattern = '/^[a-z]+$/';
if (preg_match($pattern, "hello")) {
    echo "Match found!";
}
// Find matches
$text = "Email: user@example.com";
preg_match('/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/', $text, $matches);
echo $matches[0]; // "user@example.com"
// Replace
$result = preg_replace('/World/', 'PHP', "Hello World");
echo $result; // "Hello PHP"
Advanced Regex Concepts
Lookahead and Lookbehind
(?=...) Positive lookahead
(?!...) Negative lookahead
(?<=...) Positive lookbehind
(?<!...) Negative lookbehind
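Lookarounds check what comes before or after a position without including it in the match. The sketch below tries the lookahead and lookbehind forms in Python on invented sample strings. The \b in the negative cases is an addition that stops the engine from matching only part of a number.

```python
import re

sizes = "12 px, 5 em, 30 px"
prices = "USD 100, EUR 250, USD 75"

# Positive lookahead: numbers followed by " px"
px_values = re.findall(r"\d+(?= px)", sizes)

# Negative lookahead: whole numbers NOT followed by " px"
non_px_values = re.findall(r"\b\d+\b(?! px)", sizes)

# Positive lookbehind: numbers preceded by "USD "
usd_amounts = re.findall(r"(?<=USD )\d+", prices)

# Negative lookbehind: whole numbers NOT preceded by "USD "
other_amounts = re.findall(r"\b(?<!USD )\d+", prices)

print(px_values, non_px_values, usd_amounts, other_amounts)
```

Python requires lookbehind patterns to have a fixed width, which "USD " satisfies.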
Capturing Groups
// Extract parts of a date
const dateRegex = /(\d{2})\/(\d{2})\/(\d{4})/;
const match = "12/31/2023".match(dateRegex);
console.log(match[1]); // "12" (month)
console.log(match[2]); // "31" (day)
console.log(match[3]); // "2023" (year)
// Non-capturing group: (?:...)
const regex = /(?:Mr|Mrs|Ms)\. ([A-Z][a-z]+)/;
Backreferences
// Match repeated words
Pattern: \b(\w+)\s+\1\b
Matches: "the the", "hello hello"
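The repeated-word pattern is handy for cleaning up typos. The sketch below, in Python with an invented sentence, first lists the duplicated words and then collapses each pair into one by reusing the captured group in the replacement.

```python
import re

REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b")

sentence = "This is the the best best guide"

# group(1) holds the word that was repeated
duplicates = [m.group(1) for m in REPEATED_WORD.finditer(sentence)]

# \1 in the replacement keeps a single copy of the word
fixed = REPEATED_WORD.sub(r"\1", sentence)

print(duplicates)
print(fixed)
```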
// Match HTML tags
Pattern: <([a-z]+)>.*?<\/\1>
Matches: "<div>content</div>", "<p>text</p>"
Greedy vs. Lazy Matching
Greedy (default): Matches as much as possible
* + {n,}
Lazy: Matches as little as possible
*? +? {n,}?
Example:
Text: "<div>Hello</div><div>World</div>"
Greedy: <div>.*</div>
Matches: "<div>Hello</div><div>World</div>" (entire string)
Lazy: <div>.*?</div>
Matches: "<div>Hello</div>" (first tag only)
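The difference between greedy and lazy matching is easiest to see on markup with two adjacent tags. The sketch below is in Python; the sample string with two div elements is an assumption chosen for illustration.

```python
import re

html = "<div>Hello</div><div>World</div>"

# Greedy: .* runs to the LAST closing tag
greedy = re.search(r"<div>.*</div>", html).group(0)

# Lazy: .*? stops at the FIRST closing tag
lazy = re.search(r"<div>.*?</div>", html).group(0)

# Lazy matching also lets findall pick up each element separately
contents = re.findall(r"<div>(.*?)</div>", html)

print(greedy)
print(lazy)
print(contents)
```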
Practical Regex Examples
1. Extract All Links from HTML
const html = '<a href="https://example.com">Link</a>';
const regex = /href="([^"]+)"/g;
const links = [...html.matchAll(regex)].map(m => m[1]);
console.log(links); // ["https://example.com"]
2. Validate Credit Card Numbers
// Basic format check (not Luhn algorithm)
const cardRegex = /^\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}$/;
console.log(cardRegex.test("1234-5678-9012-3456")); // true
3. Parse CSV Lines
const csv = 'John,Doe,30,Engineer';
const values = csv.split(/,/);
console.log(values); // ["John", "Doe", "30", "Engineer"]
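Splitting on a comma works for simple lines like the one above, but it breaks as soon as a quoted field contains a comma. The sketch below shows the problem and one alternative in Python, using the standard csv module; the sample line is invented.

```python
import csv

line = 'John,"Doe, Jr.",30,Engineer'

# A plain split cuts the quoted field in two
naive = line.split(",")

# csv.reader understands quoting and keeps the field together
parsed = next(csv.reader([line]))

print(naive)
print(parsed)
```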
4. Remove Extra Whitespace
const text = "  Hello    World   !  ";
const cleaned = text.replace(/\s+/g, ' ').trim();
console.log(cleaned); // "Hello World !"
5. Extract Hashtags from Text
const tweet = "Learning #JavaScript and #Regex today!";
const hashtags = tweet.match(/#\w+/g);
console.log(hashtags); // ["#JavaScript", "#Regex"]
6. Validate Username
// 3-16 characters, alphanumeric and underscore
const usernameRegex = /^[a-zA-Z0-9_]{3,16}$/;
console.log(usernameRegex.test("user_123")); // true
console.log(usernameRegex.test("ab")); // false (too short)
Regex Best Practices
1. Keep It Simple
- Start with simple patterns and build complexity gradually
- Break complex patterns into smaller parts
- Use comments in verbose mode when available
- Consider readability over cleverness
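As an example of verbose mode, here is the US phone pattern from earlier written with Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments. This is a sketch; the comments reflect one reading of what each part is for.

```python
import re

PHONE_RE = re.compile(r"""
    ^\(?            # optional opening parenthesis
    (\d{3})         # area code
    \)?             # optional closing parenthesis
    [-.\s]?         # optional separator: dash, dot, or whitespace
    (\d{3})         # prefix
    [-.\s]?         # optional separator
    (\d{4})$        # line number
""", re.VERBOSE)

match = PHONE_RE.match("(123) 456-7890")
parts = match.groups() if match else None
print(parts)
```

The pattern behaves the same as the one-line version, but each piece can now be read and changed on its own.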
2. Test Thoroughly
- Test with valid and invalid inputs
- Check edge cases (empty strings, special characters)
- Use regex testing tools for development
- Validate against real-world data
3. Performance Considerations
- Avoid catastrophic backtracking (nested quantifiers)
- Use non-capturing groups (?:...) when you don't need the capture
- Be specific with character classes
- Anchor patterns when possible (^ and $)
4. Escape Special Characters
- Use backslash \ to escape metacharacters
- Characters to escape: . * + ? ^ $ { } [ ] ( ) | \
- Example: \. matches literal dot, not any character
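When the text to match comes from a user or a file, escaping by hand is error-prone. Python's re.escape does it for you. The sketch below uses an invented search string that contains several metacharacters.

```python
import re

user_input = "1+1=2 (really?)"
text = "Is 1+1=2 (really?) true"

# Used as-is, +, ( ) and ? are treated as regex syntax, so nothing is found
unescaped_hit = re.search(user_input, text)

# re.escape puts a backslash in front of every special character
escaped_hit = re.search(re.escape(user_input), text)

print(unescaped_hit)
print(escaped_hit.group(0))
```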
5. Use Raw Strings (Python)
# Without raw string (need double backslash)
pattern = "\\d+\\s+\\w+"
# With raw string (cleaner)
pattern = r"\d+\s+\w+"
Common Regex Mistakes
- Forgetting to escape special characters: Use \ before . * + ? etc.
- Greedy matching when lazy is needed: Use *? or +? for lazy matching
- Not anchoring patterns: Use ^ and $ to match entire string
- Catastrophic backtracking: Avoid patterns like (a+)+ or (a*)*
- Overcomplicating patterns: Sometimes string methods are simpler
- Not testing edge cases: Always test with various inputs
- Ignoring case sensitivity: Use case-insensitive flag when needed
- Forgetting global flag: Use /g in JavaScript to find all matches
Regex Tools and Resources
Online Testing Tools
- Regex101: Interactive regex tester with explanations
- RegExr: Learn, build, and test regex patterns
- RegexPal: Simple online regex tester
- Debuggex: Visual regex debugger
Learning Resources
- RegexOne: Interactive regex tutorial
- Regular-Expressions.info: Comprehensive regex documentation
- MDN Web Docs: JavaScript regex reference
- Python re module docs: Python regex documentation
Cheat Sheets
- Keep a regex cheat sheet handy for quick reference
- Bookmark common patterns for reuse
- Build your own pattern library
When NOT to Use Regex
- Parsing HTML/XML: Use proper parsers (BeautifulSoup, DOMParser)
- Complex validation: Sometimes dedicated libraries are better
- Simple string operations: indexOf(), includes() may be clearer
- Performance-critical code: String methods can be faster for simple tasks
- Nested structures: Regex can't parse nested brackets reliably
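For the nested-structures case, a short loop with a stack is often clearer than any pattern. The sketch below checks whether brackets are balanced in Python; is_balanced is a hypothetical helper, not part of any library.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())

def is_balanced(text):
    """Return True if every bracket is closed in the right order."""
    stack = []
    for ch in text:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recent unclosed opener
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack

print(is_balanced("f(a[1], {b: (c)})"))
print(is_balanced("(]"))
```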
Conclusion
Regular expressions are powerful tools for text processing and pattern matching. While they can seem intimidating at first, mastering the basics opens up efficient solutions for validation, searching, and data extraction. Start with simple patterns, practice regularly, and gradually build your regex skills.
Remember: regex is a tool, not a solution for everything. Use it when appropriate, but don't force it when simpler string methods would work better. With practice and the right resources, you'll become proficient at crafting regex patterns for your development needs.