简体   繁体   中英

Minimum lenght of string that match a regex

I would like to know if a RegEX engine, before to try to match a regex, checks if the data has a minimum length that the regexp requires. For example the regex "a{1000}" in a data composed of 999 "a", fails. The result can be obtained avoiding to apply the regex, and only performing some checks to the length of the data (and the minimum of the regex). So, generically, a RegEX engine performs this kind of tests? In particular I'm interested to know if the re module of Python does this.

In particular I'm interested to know if the re module of Python does this.

A measurement suggests that it does.

import re
import timeit
def test(charsInString, charsInRegex):
    regex = re.compile('a{'+str(charsInRegex)+'}')
    string = 'a'*charsInString;
    for i in range(1, 200000):
        regex.match(string)
print(timeit.timeit("test(1, 1)", setup="from __main__ import test", number=1))
print(timeit.timeit("test(1, 2)", setup="from __main__ import test", number=1))
print(timeit.timeit("test(1, 5000)", setup="from __main__ import test", number=1))
print(timeit.timeit("test(4999, 5000)", setup="from __main__ import test", number=1))
print(timeit.timeit("test(5000, 5000)", setup="from __main__ import test", number=1))
print(timeit.timeit("test(50000, 5000)", setup="from __main__ import test", number=1))

Output:

0.9117504503834146
0.8135033788142646
0.819454105947109
0.8154557798237785
15.441637204298287
15.412751909222905

And a more complex one:

import re
import timeit
def test2(charsInString):
    regex = re.compile('((ab{3,5}c+){5000,6000}d)+e*f')
    string = 'abbbbcc'*charsInString;
    for i in range(1, 100000):
        regex.match(string)
print(timeit.timeit("test2(1)", setup="from __main__ import test2", number=1))
print(timeit.timeit("test2(3571)", setup="from __main__ import test2", number=1))
print(timeit.timeit("test2(3572)", setup="from __main__ import test2", number=1))

Output:

0.04918821760123643
0.04305112491748375
60.76094317352544

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM