Python Compile Regex Pattern using re.compile()
Python’s re.compile() method compiles a regular expression pattern provided as a string into a regex pattern object (re.Pattern). We can then use this pattern object to search for a match inside different target strings using its methods, such as match() or search().
In simple terms, we can compile a regular expression into a regex object to look for occurrences of the same pattern inside various target strings without rewriting it.
How to use re.compile() method
Syntax of re.compile()
re.compile(pattern, flags=0)
- pattern : regex pattern in string format, which you are trying to match inside the target string.
- flags : The expression’s behavior can be modified by specifying regex flag values. This is an optional parameter.
There are many flag values we can use. For example, re.I performs case-insensitive matching. We can also combine multiple flags using bitwise OR (the | operator).
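As a small illustration of combining flags, the sketch below compiles one pattern with re.I and re.M joined by the | operator. The sample text and variable names are made up for this example.

```python
import re

# Hypothetical sample text, invented for this illustration
text = "Python is fun\npython is popular\nPYTHON is everywhere"

# re.I ignores case; re.M lets ^ match at the start of every line
pattern = re.compile(r"^python", flags=re.I | re.M)

matches = pattern.findall(text)
print(matches)  # ['Python', 'python', 'PYTHON']
```

Without either flag, the same pattern would find nothing in this text, because ^ would only match at the very start of the string and the first word begins with an uppercase P.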
Return value
The re.compile() method returns a pattern object ( i.e., re.Pattern ).
How to compile regex pattern
- Write regex pattern in string format
Write the regex pattern using a raw string. For example, a pattern to match any digit.
str_pattern = r'\d'
- Pass a pattern to the compile() method
pattern = re.compile(r'\d')
It compiles a regular expression pattern provided as a string into a regex pattern object.
- Use Pattern object to match a regex pattern
Use the Pattern object returned by the compile() method to match a regex pattern.
res = pattern.findall(target_string)
Example to compile a regular expression
Now, let’s see how to use the re.compile() with the help of a simple example.
What does this pattern mean?
- First of all, I used a raw string to specify the regular expression pattern.
- Next, \d is a special sequence and it will match any digit from 0 to 9 in a target string.
- Then the 3 inside curly braces, {3}, means the digit has to occur exactly three times in a row inside the target string.
In simple words, it means to match any three consecutive digits inside the target string such as 236 or 452, or 782.
import re

# Target String one
str1 = "Emma's luck numbers are 251 761 231 451"

# pattern to find three consecutive digits
string_pattern = r"\d{3}"

# compile string pattern to re.Pattern object
regex_pattern = re.compile(string_pattern)

# print the type of compiled pattern
print(type(regex_pattern))
# Output <class 're.Pattern'>

# find all the matches in string one
result = regex_pattern.findall(str1)
print(result)
# Output ['251', '761', '231', '451']

# Target String two
str2 = "Kelly's luck numbers are 111 212 415"

# find all the matches in second string by reusing the same pattern
result = regex_pattern.findall(str2)
print(result)
# Output ['111', '212', '415']
As you can see, we found four matches of “three consecutive” digits inside the first string.
- The re.compile() method converted the string pattern into a re.Pattern object that we can work with.
- Next, we called the findall() method of the re.Pattern object to obtain all the matches of any three consecutive digits inside the target string.
- Now, the same regex_pattern object can be used in the same way to search for three consecutive digits in other target strings as well.
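The compiled object offers the other matching methods too, not just findall(). The sketch below, which uses a made-up target string, calls search() and finditer() on a pattern object built from the three-digit pattern described above.

```python
import re

# The three-consecutive-digits pattern described above
regex_pattern = re.compile(r"\d{3}")

# Hypothetical target string for this sketch
target = "Jessa's lucky numbers are 367 and 905"

# search() returns the first match as a re.Match object, or None
first = regex_pattern.search(target)
if first:
    print(first.group(), first.span())  # 367 (26, 29)

# finditer() yields one re.Match object per match, including positions
positions = [(m.group(), m.start()) for m in regex_pattern.finditer(target)]
print(positions)  # [('367', 26), ('905', 34)]
```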
Why and when to use re.compile()
Performance improvement
Compiling regular expression objects is useful and efficient when the expression will be used several times in a single program.
Keep in mind that compile() is useful for creating a regular expression object once, up front. We can then use that object to look for occurrences of the same pattern inside various target strings without rewriting the pattern or having it looked up again on every call, which saves time when the pattern is used many times.
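One way to check this on your own machine is a rough timeit comparison. The sketch below assumes a made-up workload of 1,000 short lines; the function names are illustrative, and the timings depend on the machine and Python version, so none are quoted.

```python
import re
import timeit

# Hypothetical workload: the same pattern applied to many lines
lines = [f"order {n} shipped to zone {n % 7}" for n in range(1000)]

def with_module_function():
    # re.findall() has to look the pattern up in re's cache on each call
    return [re.findall(r"\d+", line) for line in lines]

compiled = re.compile(r"\d+")

def with_compiled_pattern():
    # The Pattern object is reused directly, with no lookup per call
    return [compiled.findall(line) for line in lines]

# Both approaches must return identical results
assert with_module_function() == with_compiled_pattern()

# Timings vary by machine and Python version
print("module-level:", timeit.timeit(with_module_function, number=100))
print("compiled:    ", timeit.timeit(with_compiled_pattern, number=100))
```

Any difference you see comes from the per-call overhead of the module-level function, not from the matching itself, so it matters mainly in tight loops.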
Readability
Another benefit is readability. Using re.compile(), you can separate the definition of the regex from its use.
pattern = re.compile("str_pattern")
result = pattern.match(string)
is equivalent to
result = re.match("str_pattern", string)
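To make the readability point concrete, here is a minimal sketch in which a pattern is defined once under a descriptive name and then used inside a function. DATE_PATTERN, extract_dates, and the sample log text are hypothetical names chosen for this example.

```python
import re

# Pattern defined once, with a descriptive name (illustrative only)
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def extract_dates(text):
    """Return every YYYY-MM-DD style date found in text."""
    return DATE_PATTERN.findall(text)

log = "Started 2023-01-15, finished 2023-02-01"
print(extract_dates(log))  # ['2023-01-15', '2023-02-01']

# The one-off module-level call gives the same result
print(re.findall(r"\d{4}-\d{2}-\d{2}", log))  # ['2023-01-15', '2023-02-01']
```

The name tells the reader what the regex is for, and the pattern text lives in exactly one place.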
You can skip the compile() method when you want to search for several different patterns inside a single target string. There is no need to compile each pattern beforehand, because the other regex methods compile the pattern automatically when they run.
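For instance, when each pattern is used only once against one string, the module-level functions are enough. The target string and patterns below are invented for this sketch.

```python
import re

# Hypothetical target string
target = "Emma scored 85 in maths and 92 in science"

# Two different one-off patterns, no explicit compile() needed
numbers = re.findall(r"\d+", target)
capitalized = re.findall(r"\b[A-Z]\w+", target)

print(numbers)      # ['85', '92']
print(capitalized)  # ['Emma']
```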
Is it worth using Python’s re.compile() ?
As you know, Python internally compiles and caches regexes whenever you use them anyway (including calls to search() or match()), so by using the compile() method you’re only changing when the regex gets compiled.
But compiling regex is useful for the following situations.
- It signals that the compiled regular expression will be used a lot and is meant to be kept around.
- By compiling once and re-using the same regex multiple times, we reduce the possibility of typos.
- When you are using lots of different regexes, you should keep your compiled expressions for those which are used multiple times, so they’re not flushed out of the regex cache when the cache is full.
Also, please check the official documentation, which says: “The compiled versions of the most recent patterns passed to re.compile() and the module-level matching functions are cached, so programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions.”
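The point about holding on to compiled expressions can be sketched with re.purge(), which clears the module's regex cache. A Pattern object that your program keeps a reference to does not depend on that cache. The variable name and sample string below are illustrative.

```python
import re

# Keep a reference to a pattern that is used throughout the program
word_pattern = re.compile(r"\w+")

# re.purge() clears the re module's internal regex cache
re.purge()

# The Pattern object we hold is unaffected by the cache being cleared
words = word_pattern.findall("still works fine")
print(words)  # ['still', 'works', 'fine']
```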
So, in conclusion, yes, you should use the compile() method when you’re going to perform a lot of matches using the same pattern, especially when you are searching for that pattern over and over again in multiple target strings.