Annex Documentation

CRE is a new syntax for regular expressions. Annex is a Python library that implements this syntax.

Start with the Introduction to CREs and Annex if you'd like to learn more.

More documentation/resources:

Shorter pieces of documentation appear inline, below.

Quick Start

  1. Download the tarball, or install with pip or easy_install (both presumably fetch the package from PyPI).

  2. Test that it works.

$ python
...
>>> import annex
>>> r = annex.Regex('digit+')
>>> print r.match('weight: 123 lbs')
'123'
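
For comparison, the same search can be sketched with Python's standard re module. This is not Annex code: it assumes that the CRE digit+ corresponds to the classic pattern \d+ and that the match call above finds the digits anywhere in the string. It is written for Python 3.

```python
import re

# Assumption: the CRE `digit+` corresponds to the classic regex \d+.
pattern = re.compile(r"\d+")

# re.search scans the whole string rather than anchoring at the start.
found = pattern.search("weight: 123 lbs")
print(found.group(0))
```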

Get the Source Code

You can download a stable release from PyPI, or get the latest code from Google Code.

There's also a GitHub mirror.

Tips for using Annex

Unicode

Annex behaves just like Python's re module with regard to Unicode. The pattern and the string being matched can each be either bytes or unicode.

Unlike Python REs, CREs are always representable in pure ASCII, since you can use a code point like &201c to represent a unicode character.

Here's a CRE that matches smart quotes:

annex.Regex("  &201c {any+} &201d  ")

The Python alternative would be:

annex.Regex(ur"  '\u201c' {any+} '\u201d'  ")

The outer double quotes are for Python; the inner single quotes are CRE syntax. The backslash escapes are Python syntax.
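
For readers more familiar with classic regular expressions, here is a rough equivalent using only the standard re module in Python 3. It is a sketch under one assumption: that {any+} corresponds to a capturing group around .+, which is not confirmed here. As in the CRE version, the source text stays pure ASCII because the quote characters are written as escapes.

```python
import re

# Assumption: the CRE {any+} corresponds to a capturing group around .+
# The \u escapes keep this source ASCII-only; Python converts them to the
# actual quote characters before the re module sees the pattern.
smart_quoted = re.compile("\u201c(.+)\u201d")

text = "She said \u201chello\u201d and left."
m = smart_quoted.search(text)
print(m.group(1))
```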

Further Work

Here are some ideas for projects that continue this line of thinking.


Last modified: 2016-07-09 14:19:28 -0700