Regular Expression Basics

List of Keys

Anchors

at the beginning of the line ^

import re
p = re.compile('^T', re.I)
line = "The email address is this this the do you see"
result = p.findall(line)
print(result)
# ['T']

at the end of the line $

import re
p = re.compile('e$', re.I)
line = "The email address is this this the do you see"
result = p.findall(line)
print(result)
# ['e']

Character Classes

Printable Characters

any character except a newline .

import re
p = re.compile('^T.', re.I)
line = "The email address is this this the do you see"
result = p.findall(line)
print(result)
# ['Th']
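The "except a newline" part is easy to miss. A small sketch on a made-up two-line string shows the default behaviour and how the re.S (re.DOTALL) flag changes it:

```python
import re

text = "line one\nline two"

# By default, . matches any character except a newline,
# so the match stops at the end of the first line.
default = re.findall(r"^l.*", text)
print(default)  # ['line one']

# With re.S (re.DOTALL), . also matches the newline.
dotall = re.findall(r"^l.*", text, re.S)
print(dotall)   # ['line one\nline two']
```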

a single digit \d
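A minimal sketch of \d on an illustrative string (the order number is made up); on its own \d matches one digit, while \d+ collects runs of digits:

```python
import re

line = "Order 66 shipped on 2018-06-20"

# \d matches exactly one digit per match.
digits = re.findall(r"\d", line)
print(digits)   # ['6', '6', '2', '0', '1', '8', '0', '6', '2', '0']

# \d+ (one or more digits) groups consecutive digits together.
numbers = re.findall(r"\d+", line)
print(numbers)  # ['66', '2018', '06', '20']
```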

word character \w (alphanumeric characters and the underscore):

import re
line = "The email address is this this the do you see"
result = re.findall(r'\w', line)
print(result)
# ['T', 'h', 'e', 'e', 'm', 'a', 'i', 'l', 'a', 'd', 'd', 'r', 'e', 's', 's', 'i', 's', 't', 'h', 'i', 's', 't', 'h', 'i', 's', 't', 'h', 'e', 'd', 'o', 'y', 'o', 'u', 's', 'e', 'e']

whitespace \s

import re
line = "The email address is this this the do you see"
# [ei] matches e or i; inside [], | would be a literal character
result = re.findall(r'\sth[ei]', line)
print(result)
# [' thi', ' thi', ' the']

Non-printable Characters

tab \t
newline \n
carriage return \r
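As a sketch of these escapes in a pattern, assuming a small tab-separated record with Windows-style line endings (the data is made up), \r?\n splits the lines and \t splits the fields:

```python
import re

record = "name\tage\r\nAlice\t30\r\n"

# \r? makes the carriage return optional, so both \r\n and \n end a line.
lines = re.split(r"\r?\n", record.strip())

# Split every line on the tab character.
rows = [re.split(r"\t", row) for row in lines]
print(rows)  # [['name', 'age'], ['Alice', '30']]
```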

Capitalization (an uppercase letter negates the class)

non-digit \D
non-word \W
non-whitespace character \S
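A short sketch of the negated classes on a made-up phone-number string; each one matches exactly what its lowercase counterpart does not:

```python
import re

line = "Call 555-0100, ext. 7"

non_digits = re.findall(r"\D+", line)  # runs of anything but digits
non_word = re.findall(r"\W", line)     # single non-word characters
non_blank = re.findall(r"\S+", line)   # runs of non-whitespace

print(non_digits)  # ['Call ', '-', ', ext. ']
print(non_word)    # [' ', '-', ',', ' ', '.', ' ']
print(non_blank)   # ['Call', '555-0100,', 'ext.', '7']
```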

Quantifiers

0 or more times *

import re
p = re.compile(r'^T\w*', re.I)
line = "The email address is this this the do you see"
result = p.findall(line)
print(result)
# ['The']

1 or more times +
n times {n}
n1 to n2 times {n1,n2}

n or more times {n,}

import re
p = re.compile('(?:d){2,}', re.I)
line = "The email address is this this the do you see"
result = p.findall(line)
print(result)
# ['dd']
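The quantifiers +, {n} and {n1,n2} have no example of their own here. A small sketch on the same sample line:

```python
import re

line = "The email address is this this the do you see"

# + : one or more word characters, i.e. whole words
words = re.findall(r"\w+", line)
print(words)
# ['The', 'email', 'address', 'is', 'this', 'this', 'the', 'do', 'you', 'see']

# {n} : exactly two s characters in a row
exactly_two = re.findall(r"s{2}", line)
print(exactly_two)  # ['ss']

# {n1,n2} : one or two e characters; greedy, so "ee" is taken whole
one_to_two = re.findall(r"e{1,2}", line)
print(one_to_two)   # ['e', 'e', 'e', 'e', 'ee']
```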

Flags

Regular expressions support several flags that change how a pattern is matched. In Python's re module, they are passed as the flags argument of compile() (module-level functions such as findall() accept the same argument).

ignore case i: re.I (re.IGNORECASE) in Python

import re
p = re.compile('the',re.I)
line = "The email address is this this the do you see"
result = p.findall(line)
print(result)
# ['The', 'the']

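multiline m: re.M (re.MULTILINE) in Python; ^ and $ match at the start and end of every line, not only of the whole string
global g: Python has no such flag; use findall() or finditer() to get every match instead of only the first
The Python re module also provides some other flags.¹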
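A sketch of re.M and of the "global" behaviour in Python, using a made-up three-line text: search() stops at the first match, while findall() and finditer() return all of them:

```python
import re

text = "From: alice\nTo: bob\nFrom: carol"

# Without re.M, ^ only matches at the very start of the string.
without_m = re.findall(r"^From: (\w+)", text)
print(without_m)  # ['alice']

# With re.M, ^ matches at the start of every line.
with_m = re.findall(r"^From: (\w+)", text, re.M)
print(with_m)     # ['alice', 'carol']

# No g flag: search() returns only the first match ...
first = re.search(r"^From: (\w+)", text, re.M).group(1)
print(first)      # alice

# ... while finditer() yields a Match object for every one.
every = [m.group(1) for m in re.finditer(r"^From: (\w+)", text, re.M)]
print(every)      # ['alice', 'carol']
```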

Greedy Search

Don’t be greedy ?: without ?, quantifiers such as * and + are greedy and match the longest possible string; adding ? after them makes them match the shortest one

import re
p1 = re.compile('^T.*e', re.I)
p2 = re.compile('^T.*?e', re.I)
line = "The email address is this this the do you see"
result1 = p1.findall(line)
result2 = p2.findall(line)
print(result1)
# ['The email address is this this the do you see']
print(result2)
# ['The']

Grouping and Capturing

capturing (): the whole pattern must match, including the parts outside the parentheses, but findall() returns only the part inside ()

import re
p = re.compile('th(e|i)',re.I)
line = "The email address is this this the do you see"
result = p.findall(line)
print(result)
# ['e', 'i', 'i', 'e']

grouping (?:): the ?: disables capturing, so the parentheses only group and findall() returns the whole match

import re
p = re.compile('th(?:e|i)',re.I)
line = "The email address is this this the do you see"
result = p.findall(line)
print(result)
# ['The', 'thi', 'thi', 'the']

character set []: matches any one of the characters inside, e.g. [aeiou], [a-z]

import re
p = re.compile('th[ei]',re.I)
line = "The email address is this this the do you see"
result = p.findall(line)
print(result)
# ['The', 'thi', 'thi', 'the']

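group name T(?<groupname>he); Python writes it T(?P<groupname>he)
referencing the nth group \n, e.g. \1 for the first group
referencing a group by name \k<groupname>; Python writes it (?P=groupname)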
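A sketch of named groups and backreferences in Python syntax, on the same sample line; the group names rest and word are made up for illustration:

```python
import re

line = "The email address is this this the do you see"

# A named group is spelled (?P<name>...); retrieve it with .group("name").
rest = re.search(r"T(?P<rest>he)", line).group("rest")
print(rest)      # he

# \1 must repeat whatever group 1 matched: here, a doubled word.
# \b (word boundary) keeps the match aligned to whole words.
repeated = re.findall(r"\b(\w+) \1\b", line)
print(repeated)  # ['this']

# (?P=word) refers back to the named group "word".
named = re.search(r"\b(?P<word>\w+) (?P=word)\b", line).group(0)
print(named)     # this this
```

Raw strings matter here: in a plain string literal "\1" would become the control character chr(1) instead of a backreference.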

Special Characters

escape \: makes a special character such as . * + ? $ ^ ( ) [ ] or \ itself match literally, e.g. \. matches a period
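A sketch of escaping on a made-up price string. re.escape() is the standard-library helper that adds the backslashes for text that should match literally:

```python
import re

text = "Price: $5.00 (approx.)"

# Unescaped, $ is an anchor and . matches any character;
# escaped, they match the literal "$" and ".".
prices = re.findall(r"\$\d+\.\d+", text)
print(prices)  # ['$5.00']

# re.escape() escapes every special character in a literal string.
literal = re.escape("(approx.)")
found = re.findall(literal, text)
print(found)   # ['(approx.)']
```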

Boundaries

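boundaries of words \b: matches the position between a word character and a non-word character (what counts as a word character follows \w: Unicode by default in Python 3, or the current locale with re.LOCALE on bytes patterns)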
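A short sketch of \b on the sample line: without boundaries, "is" is also found inside each "this".

```python
import re

line = "The email address is this this the do you see"

plain = re.findall(r"is", line)
print(plain)    # ['is', 'is', 'is']

# \b on both sides: only the standalone word "is" matches.
bounded = re.findall(r"\bis\b", line)
print(bounded)  # ['is']
```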

Python

compile
search
findall
match
…
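A sketch contrasting the functions listed above on the sample line. The module-level forms such as re.search(pattern, string) behave the same as the methods on a compiled pattern:

```python
import re

line = "The email address is this this the do you see"

# compile(): build a reusable pattern object; flags go here.
p = re.compile(r"th\w*", re.I)
q = re.compile(r"is")

# match(): only tries at the very start of the string.
first_word = p.match(line).group()
print(first_word)          # The
no_match = q.match(line)
print(no_match)            # None

# search(): the first match anywhere in the string.
s = q.search(line)
print(s.group(), s.start())  # is 18

# findall(): every match, as a list of strings.
all_words = p.findall(line)
print(all_words)           # ['The', 'this', 'this', 'the']
```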

Useful expressions

^X-.*:: X- is at the beginning of the line, followed by 0 or more characters and :
^X-\S+:: X- is at the beginning of the line, followed by 1 or more non-blank characters and :
^X-\S+?:: ? means “don’t be greedy”, so \S+? stops at the first :
\S+@\S+: finds strings that look like email addresses
^Email (\S+@\S+): finds the pattern but returns only the part in (), which should be the email address
[^ ]: any character except a space; ^ as the first character inside [] means not
[a-zA-Z0-9]: any ASCII letter or digit
[^a-zA-Z0-9]: any character that is neither a letter nor a digit
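A sketch that runs several of these expressions on a made-up mail header (the header lines and addresses are illustrative only). re.M is added so that ^ matches at the start of each line:

```python
import re

header = (
    "X-DSPAM-Result: Innocent\n"
    "X-DSPAM-Confidence: 0.8475\n"
    "Email alice@example.com\n"
    "Reply to bob@example.org or carol@example.net"
)

# Header names starting with X-, up to the first colon.
x_headers = re.findall(r"^X-\S+?:", header, re.M)
print(x_headers)    # ['X-DSPAM-Result:', 'X-DSPAM-Confidence:']

# Anything shaped like an email address.
emails = re.findall(r"\S+@\S+", header)
print(emails)       # ['alice@example.com', 'bob@example.org', 'carol@example.net']

# Only the address after "Email ", thanks to the capturing group.
email_only = re.findall(r"^Email (\S+@\S+)", header, re.M)
print(email_only)   # ['alice@example.com']

# Drop every character that is not a letter or digit.
cleaned = re.sub(r"[^a-zA-Z0-9]", "", "X-DSPAM-Confidence")
print(cleaned)      # XDSPAMConfidence
```

\S+@\S+ is deliberately loose: trailing punctuation such as a comma right after an address would be included in the match.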

References

¹ Module Contents, Python 3 documentation of the re module

Planted: 2018-06-20 by L Ma;
