
Compiling a regex inside a function that's called multiple times

If you compile a regex inside a function, and that function gets called multiple times, does Python recompile the regex each time, or does Python cache the compiled regex (assuming the regex doesn't change)?

For example:

import re

def contains_text_of_interest(line):
    r = re.compile(r"foo\dbar\d")
    return r.match(line)

def parse_file(fname):
    with open(fname) as f:
        for line in f:
            if contains_text_of_interest(line):
                pass  # Do something interesting

Actually, if you look at the code in the re module, re.compile uses the same cache as the module-level functions (re.match, re.search, and so on), so compiling the same regex over and over again is cheap: after the first call it amounts to a dictionary lookup. In other words, write the code to be the most understandable or maintainable or expressive, and don't worry about the overhead of compiling regexes.
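A quick way to see the cache at work, as a minimal sketch: this relies on CPython's current behaviour, which is an implementation detail and not something the language guarantees. The names PATTERN, first, second and third are only for illustration.

```python
import re

PATTERN = r"foo\dbar\d"

first = re.compile(PATTERN)
second = re.compile(PATTERN)

# In CPython the second call is answered from the re module's internal
# cache, so both names end up bound to the same compiled pattern object.
same_object = first is second
print("served from cache:", same_object)

# re.purge() clears that cache, so the next compile builds a new object.
re.purge()
third = re.compile(PATTERN)
recompiled = third is not first
print("recompiled after purge:", recompiled)
```

The cache has a bounded size, so a program that cycles through a very large number of distinct patterns may still see recompilation.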

If you want to avoid the overhead of calling re.compile() every time, you can do:

def contains_text_of_interest(line, r=re.compile(r"foo\dbar\d")):
    return r.match(line)
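This works because Python evaluates default argument values once, when the def statement runs, and stores them on the function object. A small sketch that inspects the stored default (stored_default is just an illustrative name):

```python
import re

def contains_text_of_interest(line, r=re.compile(r"foo\dbar\d")):
    return r.match(line)

# The compiled pattern was created when the def statement executed and is
# kept in the function's __defaults__ tuple; calls simply reuse it.
stored_default = contains_text_of_interest.__defaults__[0]
print(stored_default.pattern)

print(bool(contains_text_of_interest("foo1bar2 and more")))
print(bool(contains_text_of_interest("no match here")))
```

One trade-off: r becomes part of the function's signature, so a caller can pass a different pattern, whether on purpose or by accident.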

Why don't you just put the re.compile outside the function (at module or class level), give it an explicit name, and just use it? That kind of regex is effectively a constant, and you can treat it the same way.

MATCH_FOO_BAR = re.compile(r"foo\dbar\d")  

def contains_text_of_interest(line):
    return MATCH_FOO_BAR.match(line)
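If you'd rather measure than guess, here is a rough timing sketch with timeit. The helper names compile_each_call and use_module_constant, the sample line, and the call count are all made up for the comparison; the numbers depend on your machine and Python version, so none are quoted here.

```python
import re
import timeit

MATCH_FOO_BAR = re.compile(r"foo\dbar\d")

def compile_each_call(line):
    # Calls re.compile on every invocation (hits the re cache after the first).
    return re.compile(r"foo\dbar\d").match(line)

def use_module_constant(line):
    # Reuses the pattern compiled once at import time.
    return MATCH_FOO_BAR.match(line)

line = "foo1bar2 trailing text"
calls = 100_000

t_each = timeit.timeit(lambda: compile_each_call(line), number=calls)
t_const = timeit.timeit(lambda: use_module_constant(line), number=calls)

print(f"compile in function:   {t_each:.3f}s for {calls} calls")
print(f"module-level constant: {t_const:.3f}s for {calls} calls")
```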

Dingo's solution is a good one [edit: Ned Batchelder's explanation is even better], but here's another one which I think is neat: use closures! If that sounds like a "big word" to you, don't worry. The concept is simple:

def make_matching_function():
    matcher = re.compile(r"foo\dbar\d")
    def f(line):
        return matcher.match(line)
    return f
contains_text_of_interest = make_matching_function()

make_matching_function is called only once, and therefore the regex is compiled only once. The function f, which is assigned to contains_text_of_interest, knows about the compiled regex matcher because it's in the surrounding scope, and will always know about it, even if you use contains_text_of_interest somewhere else (that's closures: code that takes the surrounding scope with it).
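The same idea extends to a factory that takes the pattern as an argument, so one helper can build several matchers. This is a variation for illustration, not the code above: the pattern parameter and the names contains_foo_bar, starts_with_digits and captured are assumptions of the sketch.

```python
import re

def make_matching_function(pattern):
    matcher = re.compile(pattern)  # runs once per factory call

    def f(line):
        # matcher is looked up in the enclosing scope on each call
        return matcher.match(line)

    return f

contains_foo_bar = make_matching_function(r"foo\dbar\d")
starts_with_digits = make_matching_function(r"\d+")

# The compiled pattern lives in the closure cell attached to the function.
captured = contains_foo_bar.__closure__[0].cell_contents
print(captured.pattern)

print(bool(contains_foo_bar("foo1bar2")))
print(bool(starts_with_digits("foo1bar2")))
```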

Not the most Pythonic solution to this problem, surely. But it's a good idiom to have up your sleeve, for when the time is right :)

It does the "wrong" thing; here's a longer thread on the topic.

I'm using Python regexes in a criminally inefficient manner

