regular expression
- regular expression
- A regular expression is a way of expressing a text pattern for the purpose of matching a string or part of a string. Regular expressions are often used either to extract information from a string or to verify that a string is in the correct format. The term is often abbreviated to simply "regex".
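
The two uses mentioned above, extracting information and verifying a format, can be sketched as follows. This is a minimal illustration that assumes Python's `re` module; the sample text, the patterns and the helper name `looks_like_date` are hypothetical and chosen only for the example.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Order 1234 shipped on 2024-03-15"

# Extraction: find the first run of digits anywhere in the string.
match = re.search(r"[0-9]+", text)
order_number = match.group(0) if match else None


def looks_like_date(value: str) -> bool:
    """Verification: True only if the whole string has the form NNNN-NN-NN."""
    return re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value) is not None


print(order_number)
print(looks_like_date("2024-03-15"))
print(looks_like_date("15/03/2024"))
```

Note that `re.search` looks for the pattern anywhere in the string, whereas `re.fullmatch` requires the entire string to match, which is usually what is wanted when checking a format.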
The following table summarises the main elements of regular expression syntax, together with examples of their use:
Regex            Meaning
.                A dot matches any single character except a carriage return or line feed (the exact set of excluded line-break characters varies between implementations).
c.t              Matches "cat", "cbt" ... "c1t" ... "c&t" etc. The match can occur anywhere in a string, so anything may come before the "c" and anything after the "t".
[0-9]            Matches a single character against anything inside the square brackets; in this case any single character in the range 0 to 9.
[09]             Matches a single character against anything inside the square brackets; in this case the single character 0 or 9 (but not 1 to 8). This can be useful if you don't know the case of something, because [aA] will match either a or A.
[0-9a-zA-Z]      Matches any single character in the range 0 to 9, a to z or A to Z.
?                Matches the preceding element zero times or one time. This is equivalent to {0,1}. It can be thought of as making the preceding element optional.
??               Like ?, matches the preceding element zero times or one time, but prefers to match it zero times rather than once where possible.
*                Matches the preceding element zero or more times.
*?               Matches the preceding element zero or more times but, unlike *, takes the shortest possible match. For example, "b[an]*a" applied to the word "banana" will match the entire word, but "b[an]*?a" will match just "ba".
a[0-9]*z         Matches "az", "a1z" ... "a999z" etc.
+                Matches the preceding element one or more times.
+?               Matches the preceding element one or more times but, unlike +, takes the shortest possible match.
a[0-9]+z         Matches "a1z" ... "a999z" etc. (but not "az").
{number}         Matches the preceding element the specified number of times.
a[0-1]{2}z       Matches "a00z", "a10z", "a01z" and "a11z".
{min,max}        Matches the preceding element a minimum of "min" times and at most "max" times.
a[0-1]{1,2}z     Matches "a0z", "a1z", "a00z", "a10z", "a01z" and "a11z".
[^...]           Inverts a match: matches any single character except those listed.
a[^0-1]z         Matches any three-character string starting with "a" and ending with "z", other than "a0z" or "a1z".
^                Matches at the start of the line.
$                Matches at the end of the line.
\                Escape character. Allows the following character to be treated as a literal rather than having a special meaning. Thus \+ matches a literal + rather than + having its normal meaning. This also means that to match a literal \ (backslash) you need to use \\ (to escape the backslash).
\A               Matches at the start of the string. When dealing with a single-line expression, \A is equivalent to ^. Note: \A is not supported by all implementations of regex.
\Z               Matches at the end of the string. When dealing with a single-line expression, \Z is equivalent to $. Note: \Z is not supported by all implementations of regex.
\xNN             Matches the single character with the hex code NN. So \x09 matches a tab character and \x20 matches a space. See www.cryer.co.uk/brian/misc/ascii_table.htm for a list of hex codes for ASCII characters.
\t               Matches the TAB character (same as \x09).
\n               Matches a new-line (same as \x0a).
\r               Matches a carriage return (same as \x0d).
\f               Matches a form-feed (same as \x0c).
\a               Matches a bell character (same as \x07).
\b               Matches at a word boundary. This is a position rather than a character: a place where a word character ([a-zA-Z0-9_]) is next to (i) the beginning of the string, (ii) the end of the string or (iii) any character that is not [a-zA-Z0-9_].
\e               Matches an escape character (same as \x1b).
\r\n             Matches a carriage return line feed combination (in that order).
\w               Matches any alphanumeric character (i.e. 0 to 9, a to z, A to Z and the underscore).
\W               Matches any non-alphanumeric character (anything not matched by \w).
\d               Matches any numeric character.
\D               Matches any non-numeric character.
\s               Matches any whitespace character (same as [ \t\n\r\f]).
\S               Matches any non-whitespace character.
one|two          Matches everything to the left of the bar ('|') or everything to the right, so in this case "one" or "two".
( … )            Brackets define a group or sub-expression. When a regular expression is used to extract the contents of a match, brackets mark the group or sub-expression whose matching value can be retrieved.
\b(one|two)\b    Matches the whole word "one" or the whole word "two"; specifically, it matches a word boundary, then "one" or "two", followed by another word boundary.
(?<!a)b          Negative look-behind: matches "b", but only if the previous character was not "a". So "(?<!a)b" will match against "rubble" but not against "table".
(?<=a)b          Positive look-behind: matches "b", but only if the previous character was "a". So "(?<=a)b" will match against "table" but not against "rubble".

For more information see:
- http://etext.lib.virginia.edu/helpsheets/regex.html - Steve Ramsay's Guide to Regular Expressions.
- www.regexlib.com - Regular expression "library". If you are looking for a regular expression that someone else has probably already put together, then this is a good place to look.
- www.regular-expressions.info/anchors.html - Article on the start and end of line anchors, and the effects of line breaks.
- www.regular-expressions.info/tutorialcnt.html - Regular expression tutorial.
- http://everything.explained.at/Regular_expression/ - Regular expression explained.
- http://msdn.microsoft.com/en-us/library/ms972966.aspx - Regular Expressions in ASP.NET, an article from Microsoft.
- http://regex.cryer.info - Online regular expression evaluation tool. Lets you try out a regular expression online.
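
A few of the table entries above (greedy versus shortest-match quantifiers, word boundaries and look-behind) can be tried out with a short sketch. It assumes Python's `re` module, which is only one of many regex implementations; other implementations may differ in the details, and the variable names and the sample sentence are hypothetical.

```python
import re

# Greedy versus shortest-match quantifiers applied to "banana".
greedy = re.search(r"b[an]*a", "banana").group(0)
lazy = re.search(r"b[an]*?a", "banana").group(0)

# Word boundaries: pick out "one" or "two" only when they are whole words.
# With a single group, findall returns the text captured by that group.
whole_words = re.findall(r"\b(one|two)\b", "one stone, two twos")

# Look-behind: "b" not preceded by "a", and "b" preceded by "a".
negative_in_rubble = re.search(r"(?<!a)b", "rubble") is not None
negative_in_table = re.search(r"(?<!a)b", "table") is not None
positive_in_table = re.search(r"(?<=a)b", "table") is not None
positive_in_rubble = re.search(r"(?<=a)b", "rubble") is not None

print(greedy, lazy)
print(whole_words)
print(negative_in_rubble, negative_in_table)
print(positive_in_table, positive_in_rubble)
```

The shortest-match form stops as soon as the rest of the pattern can succeed, which is why it returns less of "banana" than the greedy form does.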