Regex Patterns
A regular expression, or regex for short, is a sequence of characters that specifies a search pattern in text.
This page summarizes the supported regex constructs.
Characters
Pattern | Description | Example | Sample Match |
---|---|---|---|
. | Any character (by default, except line terminators) | file..5 | file_15 |
\d | A digit: [0-9] | file_\d\d | file_15 |
\D | Not a digit: [^0-9] | file_\D\D | file_ab |
\w | A word character: [a-zA-Z_0-9] | \w-\w\w\w | A-b_1 |
\W | Not a word character | a\W\W | a-+ |
\s | A whitespace character | a\sb | a b |
\S | A non-whitespace character | a\Sb | a-b |
\ | Escapes a special character | a\.b | a.b |
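The rows above can be checked with a short Java sketch. It assumes the standard `java.util.regex` package; the class name `CharacterPatterns` and the helper `fullMatch` are hypothetical names chosen for illustration. Note that each regex backslash is doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class CharacterPatterns {
    // Hypothetical helper: true when the WHOLE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // "." matches any single character, so "file..5" accepts "_1" in the middle.
        System.out.println(fullMatch("file..5", "file_15"));      // true
        // \d matches a digit, \D matches anything that is not a digit.
        System.out.println(fullMatch("file_\\d\\d", "file_15"));  // true
        System.out.println(fullMatch("file_\\D\\D", "file_15"));  // false
        // \w matches letters, digits, and the underscore.
        System.out.println(fullMatch("\\w-\\w\\w\\w", "A-b_1"));  // true
        // \s matches a whitespace character such as a space.
        System.out.println(fullMatch("a\\sb", "a b"));            // true
        // An escaped dot matches only a literal dot.
        System.out.println(fullMatch("a\\.b", "a.b"));            // true
        System.out.println(fullMatch("a\\.b", "axb"));            // false
    }
}
```

`Pattern.matches` requires the entire input to match, which is why the examples line up one-to-one with the "Sample Match" column.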
Character classes using []
Pattern | Description | Example | Sample Match |
---|---|---|---|
[abc] | a, b, or c | hell[AEIOU] | hellO |
[^abc] | Any character except a, b, or c (negation) | hell[^aeiou] | hell_ |
[x-y] | Any character in the range from x to y | [A-Z]+ | GREAT |
[^x-y] | Any character not in the range from x to y | [^A-Z]+ | great |
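As a rough illustration of the bracket constructs above, the following Java sketch tests each example against a matching and a non-matching input. The class name `CharacterClasses` and the helper `fullMatch` are hypothetical, not part of any library.

```java
import java.util.regex.Pattern;

class CharacterClasses {
    // Hypothetical helper: true when the WHOLE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // [AEIOU] accepts exactly one of the listed characters.
        System.out.println(fullMatch("hell[AEIOU]", "hellO"));   // true
        System.out.println(fullMatch("hell[AEIOU]", "hello"));   // false (lowercase o)
        // [^aeiou] accepts any single character that is NOT listed.
        System.out.println(fullMatch("hell[^aeiou]", "hell_"));  // true
        System.out.println(fullMatch("hell[^aeiou]", "hello"));  // false
        // [A-Z]+ accepts one or more uppercase ASCII letters.
        System.out.println(fullMatch("[A-Z]+", "GREAT"));        // true
        // [^A-Z]+ accepts one or more characters outside A-Z.
        System.out.println(fullMatch("[^A-Z]+", "great"));       // true
        System.out.println(fullMatch("[^A-Z]+", "Great"));       // false (contains G)
    }
}
```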
Quantifiers
Pattern | Description | Example | Sample Match |
---|---|---|---|
+ | One or more | \d+ | 12345 |
? | Zero or one | hello!? | hello |
* | Zero or more | AB* | ABBB |
{N} | Exactly N times | x-\d{2} | x-12 |
{N1,N2} | Between N1 and N2 times (inclusive) | x-\d{2,4} | x-123 |
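The quantifier bounds are easiest to see at their edges. The sketch below, which assumes `java.util.regex` and uses the hypothetical names `QuantifierPatterns` and `fullMatch`, checks each quantifier against inputs just inside and just outside its allowed repetition count.

```java
import java.util.regex.Pattern;

class QuantifierPatterns {
    // Hypothetical helper: true when the WHOLE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // "+" needs at least one repetition.
        System.out.println(fullMatch("\\d+", "12345"));          // true
        System.out.println(fullMatch("\\d+", ""));               // false
        // "?" allows zero or one repetition of the preceding "!".
        System.out.println(fullMatch("hello!?", "hello"));       // true
        System.out.println(fullMatch("hello!?", "hello!"));      // true
        System.out.println(fullMatch("hello!?", "hello!!"));     // false
        // "*" allows zero or more repetitions of the preceding "B".
        System.out.println(fullMatch("AB*", "A"));               // true
        System.out.println(fullMatch("AB*", "ABBB"));            // true
        // {2} means exactly two digits.
        System.out.println(fullMatch("x-\\d{2}", "x-12"));       // true
        System.out.println(fullMatch("x-\\d{2}", "x-123"));      // false
        // {2,4} means two, three, or four digits.
        System.out.println(fullMatch("x-\\d{2,4}", "x-123"));    // true
        System.out.println(fullMatch("x-\\d{2,4}", "x-12345"));  // false
    }
}
```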
Logical operators and groups
Pattern | Description | Example | Sample Match |
---|---|---|---|
XY | X followed by Y | Hello | Hello |
X|Y | Either X or Y | this is (true|false) | this is true (captures true, see below) |
(X) | X, as a capturing group | I ate (\w+) at lunch | I ate carrots at lunch (captures carrots) |
(?:X) | X, as a non-capturing group | this is (?:true|false) | this is true (without capturing true) |
\1 | Contents of Group 1 | r(\w)g\1x | regex |
\2 | Contents of Group 2 | (\d\d)\+(\d\d)=\2\+\1 | 12+65=65+12 |
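To make the difference between capturing groups, non-capturing groups, and backreferences concrete, here is a minimal Java sketch. The class `GroupPatterns` and its helpers `firstGroup` and `groupCount` are hypothetical names; `Matcher.group` and `Matcher.groupCount` are standard `java.util.regex` methods. In the last example the literal plus signs are written as `\+`, because an unescaped `+` would act as a quantifier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupPatterns {
    // Hypothetical helper: text captured by group 1, or null if the input does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Hypothetical helper: number of capturing groups declared in the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // (X) captures whatever X matched.
        System.out.println(firstGroup("I ate (\\w+) at lunch", "I ate carrots at lunch")); // carrots
        System.out.println(firstGroup("this is (true|false)", "this is true"));            // true
        // (?:X) groups without capturing, so no group is recorded.
        System.out.println(groupCount("this is (true|false)"));    // 1
        System.out.println(groupCount("this is (?:true|false)"));  // 0
        // \1 must repeat exactly what group 1 captured ("e" in "regex").
        System.out.println(Pattern.matches("r(\\w)g\\1x", "regex"));  // true
        System.out.println(Pattern.matches("r(\\w)g\\1x", "regax"));  // false
        // \2 and \1 refer back to the second and first groups.
        System.out.println(Pattern.matches("(\\d\\d)\\+(\\d\\d)=\\2\\+\\1", "12+65=65+12")); // true
    }
}
```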
Boundary matchers
Pattern | Description | Example | Sample Match |
---|---|---|---|
^ | Start of line (inside brackets, as in [^abc], it means "not") | ^abc .* | abc (line start) |
$ | End of line | .*? the end$ | this is the end |
\A | Beginning of string | \Aabc[\d\D]* | abc (string start) |
\Z | End of the input, ignoring a final line terminator, if any | the end\Z | this is...\n...the end\n |
\G | The end of the previous match | | |
\b | Word boundary | Bob.*\bcat\b | Bob ate the cat |
\B | Not a word boundary | c.*\Bcat\B.* | copycats |
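Boundary matchers consume no characters, so counting matches is a convenient way to observe them. The following sketch is one way to do that, under the assumption that plain `java.util.regex` is used: there, `^` and `$` match at line boundaries only when `Pattern.MULTILINE` is set, and otherwise only at the start and end of the input. The class `BoundaryPatterns`, the helper `countMatches`, and the `\G` sample input are hypothetical and chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryPatterns {
    // Hypothetical helper: counts successive matches found by Matcher.find().
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "abc one\nabc two";
        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(countMatches("^abc", 0, text));                    // 1
        // With MULTILINE, ^ also matches after each line terminator.
        System.out.println(countMatches("^abc", Pattern.MULTILINE, text));    // 2
        // \A always means the start of the input, regardless of flags.
        System.out.println(countMatches("\\Aabc", Pattern.MULTILINE, text));  // 1
        // \Z matches at the end of the input, before a final line terminator.
        System.out.println(countMatches("the end\\Z", 0, "this is...\n...the end\n")); // 1
        // \b requires a word boundary on each side of "cat".
        System.out.println(countMatches("\\bcat\\b", 0, "Bob ate the cat"));  // 1
        System.out.println(countMatches("\\bcat\\b", 0, "copycats"));         // 0
        // \B requires that there is NO word boundary at that position.
        System.out.println(countMatches("\\Bcat\\B", 0, "copycats"));         // 1
        // \G anchors each match to the end of the previous one, so the
        // digits after "a" are never reached.
        System.out.println(countMatches("\\G\\d", 0, "123a45"));              // 3
    }
}
```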
Note
We follow Java conventions for regex patterns; see the Java documentation for more information.