CRE Examples

IP Address

This is the short example from the intro. Naming the subexpression D improves readability, because the repeated "one to three digits" piece is written only once.

CRE:
D     = digit^(1..3)
Start = D '.' D '.' D '.' D
Traditional:
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
Matches:
192.168.0.1
0.0.0.0
ip: 192.168.0.123 ...
Does not match:
0.0.0.a
192.168.012345.1
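
As a rough check of the examples above, the traditional pattern can be run through Python's re module. This is only a sketch: the helper name find_ip is hypothetical, and it searches anywhere in the text (unanchored), which is an assumption based on the "ip: 192.168.0.123 ..." example matching.

```python
import re

# The traditional regex from this section, left unanchored so it can
# find an address embedded in surrounding text.
IP_RE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

def find_ip(text):
    """Return the first IP-like substring in text, or None (hypothetical helper)."""
    m = IP_RE.search(text)
    return m.group(0) if m else None
```

Note that the pattern only checks the shape of the address, so a string such as 999.999.999.999 is also accepted; range checking would have to happen outside the regex.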

C-style syntax for a number

CRE:
Sign  = chars[&hyphen +]
Start = { Sign? }
        { either digit+ ('.' digit*)? or '.' digit+ }
        ( chars[e E] { Sign? } { digit+ } )?
Traditional:
([\-+]?)(\d+(?:\.\d*)?|\.\d+)(?:[eE]([\-+]?)(\d+))?
Matches: Does not match:

    

    

This pattern captures 4 groups. For example, on the input "1.05e+8", you get: ('', '1.05', '+', '8').
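
The group layout can be inspected with Python's re module. The sketch below is an illustration, not part of CRE: the name parse_number is hypothetical, and using fullmatch (so the whole input must be a number) is an assumption.

```python
import re

# Traditional form of the C-style number pattern from this section.
NUMBER_RE = re.compile(r"([\-+]?)(\d+(?:\.\d*)?|\.\d+)(?:[eE]([\-+]?)(\d+))?")

def parse_number(text):
    """Return the 4 captured groups, or None if text is not a number.

    Groups: (sign, mantissa, exponent sign, exponent digits).
    The exponent groups are None when there is no exponent part.
    """
    m = NUMBER_RE.fullmatch(text)
    return m.groups() if m else None
```

With this helper, parse_number("1.05e+8") gives ('', '1.05', '+', '8'), and an input without an exponent such as "-.5" gives ('-', '.5', None, None).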

Password

CRE:
%ASSERT( any* digit )               # must contain a digit
%ASSERT( any* chars[a-z] )          # must contain a lower case letter
%ASSERT( any* chars[A-Z] )          # must contain an upper case letter
%ASSERT( any* chars[@ &hash $ %] )  # must contain one of these symbols
any^(6..20)                         # 6 to 20 chars
Traditional:
(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[\@#\$\%]).{6,20}
Matches:
abCD90@
Does not match:
ab
CD
90
@

The use of assertions is questionable here, since this validation can be done with simple imperative code:

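import re

def is_valid_password(s):
    # re.search takes the pattern first, then the string to search
    if not re.search(r'\d', s):        # must contain a digit
        return False
    if not re.search(r'[a-z]', s):     # must contain a lower case letter
        return False
    if not re.search(r'[A-Z]', s):     # must contain an upper case letter
        return False
    if not re.search(r'[@#$%]', s):    # must contain one of these symbols
        return False
    return 6 <= len(s) <= 20           # 6 to 20 chars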
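
For comparison, the assertion-based pattern can also be exercised directly in Python. This is a sketch under assumptions: the name matches_password_pattern is hypothetical, and fullmatch is used so that the 6 to 20 length bound applies to the whole string rather than to a substring.

```python
import re

# Traditional form of the password pattern: four lookahead assertions,
# then the length constraint.
PASSWORD_RE = re.compile(r"(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20}")

def matches_password_pattern(s):
    """Return True if the whole string satisfies the pattern (hypothetical helper)."""
    # fullmatch anchors both ends, so over-long strings are rejected.
    return PASSWORD_RE.fullmatch(s) is not None
```

Each lookahead scans from the start of the string and consumes nothing, which is why the four requirements can be stated in any order before the final length check.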

Extracting the URL from an HTML Link

This example is also discussed in the intro. It extracts the URL from a string like <a href="http://example.com/foo">.

CRE:
flags(ignorecase)

_     = whitespace*    # optional whitespace
__    = whitespace+    # mandatory whitespace

# double quote, capture non-double quote chars with {}, double quote
DQ    = '"' { !chars["]* } '"'
SQ    = "'" { !chars[']* } "'"           # same for single quotes
NQ    = { !chars[ ' " whitespace > ]+ }  # no quotes, capture the whole thing

# The top level pattern -- now it should be self-explanatory.
Start = '<' _ 'a' __ 'href' _ '=' _ (either DQ or SQ or NQ) _ '>'
Traditional:
(?i)\<\s*a\s+href\s*\=\s*(?:\"([^"]*)\"|\'([^']*)\'|([^'"\s>]+))\s*\>
Matches:
<a href="http://example.com/">
< a href = ' http://example.com/ ' >
<a href=http://example.com/ >
Does not match:
<ahref="http://example.com/">
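
Because the URL can land in any of three capture groups (double-quoted, single-quoted, or unquoted), calling code has to pick whichever group took part in the match. A minimal sketch in Python, where extract_url is a hypothetical helper name:

```python
import re

# Traditional form of the href pattern from this section.
HREF_RE = re.compile(
    r"""(?i)\<\s*a\s+href\s*\=\s*(?:\"([^"]*)\"|\'([^']*)\'|([^'"\s>]+))\s*\>"""
)

def extract_url(tag):
    """Return the captured href value, or None if the tag does not match."""
    m = HREF_RE.search(tag)
    if m is None:
        return None
    # Only one alternative matches, so exactly one group is not None.
    # Compare against None rather than truthiness: href="" captures ''.
    return next(g for g in m.groups() if g is not None)
```

The quoted forms capture everything between the quotes, so the single-quoted example above yields the URL with its surrounding spaces intact; stripping them, if wanted, is left to the caller.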

Capturing a URL from free-form text

See this longer explanation.

More

If you have an interesting or important regex, try translating it to CRE syntax.

If it would help others understand CRE, mail it to TODO@ and I will put it up here.

Credit: Many of these regular expressions were adapted from this article.


Last modified: 2013-01-29 10:42:48 -0800