www.cryer.co.uk
Brian Cryer's Web Resources

regular expression

A regular expression is a way of expressing a text pattern for the purpose of matching a string. Regular expressions are often used either to extract information from a string or to verify that a string is of the correct format. The term is often abbreviated to "regex".
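
As a minimal sketch of those two uses (extraction and verification), assuming Python's standard `re` module; the sample text, the variable names and the `looks_like_date` helper are hypothetical and chosen only for illustration:

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Order 1234 shipped on 2024-01-15"

# Extraction: find the first run of digits anywhere in the string.
found = re.search(r"[0-9]+", text)
order_number = found.group(0) if found else None


def looks_like_date(candidate: str) -> bool:
    """Verification: True only if the whole string has the shape NNNN-NN-NN."""
    return re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", candidate) is not None


if __name__ == "__main__":
    print(order_number)
    print(looks_like_date("2024-01-15"))
    print(looks_like_date("15 Jan 2024"))
```

Here `re.search` looks for the pattern anywhere in the string, which suits extraction, while `re.fullmatch` requires the entire string to fit the pattern, which suits format checking. Note that the check is on shape only: it does not confirm that the digits form a real calendar date.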

The following are examples of regular expressions:

Regex Meaning
. A dot matches any single character.
c.t Matches "cat", "cbt" ... "c1t" ... "c&t" etc. Unless the expression is anchored, the match can occur anywhere within a longer string, so other text may come before the "c" and after the "t".
[0-9] Matches any single character in the range 0 to 9.
[09] Matches the single character 0 or 9 (but not 1 to 8). This can be useful if you don't know the case of something, because [aA] will match either a or A.
[0-9a-zA-Z] Matches any single character in the range 0 to 9 or a to z or A to Z.
? Matches the preceding element zero or one times (i.e. the preceding element is optional).
* Matches the preceding element zero or more times.
a[0-9]*z Will match against "az", "a1z" ... "a999z" etc.
+ Matches the preceding element one or more times.
a[0-9]+z Will match against "a1z" ... "a999z" etc (but not "az").
{number} Matches the preceding element the specified number of times.
a[0-1]{2}z Will match against "a00z", "a10z", "a01z" and "a11z".
{min,max} Matches the preceding element at least "min" times and at most "max" times.
a[0-1]{1,2}z Will match against "a0z", "a1z", "a00z", "a10z", "a01z" and "a11z".
[^...] Inverts a match - matches any single character except those listed.
a[^0-1]z Will match against any three character string starting with "a" and ending with "z" whose middle character is not "0" or "1" (so not "a0z" or "a1z").
^ Matches at the start of the line.
$ Matches at the end of the line.
\ Escape character. Allows the following character to be treated as a literal rather than having a special meaning, thus \+ matches against + rather than + having its normal meaning.
\A Matches at the start of the string. When dealing with a single line expression \A is equivalent to ^.
\Z Matches at the end of the string. When dealing with single line expressions \Z is equivalent to $.
\xNN Will match against the single character with the hex code 'NN'. So \x09 will match against a tab character and \x20 will match against a space.
\t Match against the TAB character (same as \x09).
\n Match against a new-line (same as \x0a).
\r Match against a carriage return (same as \x0d).
\f Match against a form-feed (same as \x0c).
\a Match against a bell character (same as \x07).
\e Match against an escape character (same as \x1b).
\r\n Match against a carriage return line feed combination (in that order).
\w Match against any alphanumeric character (i.e. 0 to 9, a to z, A to Z and the underscore).
\W Match against any non-alphanumeric character.
\d Match against any numeric character.
\D Match against any non-numeric character.
\s Match against any whitespace character (same as [ \t\n\r\f]).
\S Match against any non-whitespace character.
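
A few of the entries above can be tried out directly. The following is a minimal sketch that assumes Python's standard `re` module (this glossary does not name a particular regex engine); the helper names `whole` and `anywhere` and all of the test strings are hypothetical. Engines differ in detail (Python's `re`, for instance, does not accept \e), so treat this as illustrative rather than definitive.

```python
import re


def whole(pattern: str, text: str) -> bool:
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


def anywhere(pattern: str, text: str) -> bool:
    """True if the pattern matches somewhere inside the text."""
    return re.search(pattern, text) is not None


# Hypothetical test strings, chosen only for illustration.
checks = {
    "dot, unanchored": anywhere(r"c.t", "concatenate"),
    "star allows zero digits": whole(r"a[0-9]*z", "az"),
    "plus needs one digit": not whole(r"a[0-9]+z", "az"),
    "exact count": whole(r"a[0-1]{2}z", "a10z"),
    "count range": whole(r"a[0-1]{1,2}z", "a1z")
    and not whole(r"a[0-1]{1,2}z", "a111z"),
    "negated class": whole(r"a[^0-1]z", "a5z")
    and not whole(r"a[^0-1]z", "a0z"),
    "escaped plus": whole(r"1\+1", "1+1") and not whole(r"1+1", "1+1"),
    "hex escape for tab": whole(r"\x09", "\t"),
}

lines = "one two\nthree four"
# With re.MULTILINE, ^ matches at the start of every line,
# while \A matches only at the start of the whole string.
line_starts = re.findall(r"^\w+", lines, re.MULTILINE)
string_start = re.findall(r"\A\w+", lines, re.MULTILINE)

# Shorthand classes: \d picks out runs of digits, \s splits on whitespace.
numbers = re.findall(r"\d+", "a1 b22 c333")
words = re.split(r"\s+", "a  b\tc\nd")

if __name__ == "__main__":
    for name, passed in checks.items():
        print(name, passed)
    print(line_starts, string_start, numbers, words)
```

The patterns are written as raw strings (the `r"..."` prefix) so that Python passes each backslash through to the regex engine unchanged.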
