Minimum length of a string that matches a regex

I would like to know whether a regex engine, before trying to match a regex, checks that the data has the minimum length the regex requires. For example, the regex "a{1000}" fails on data composed of 999 "a" characters. That result can be obtained without applying the regex at all, just by comparing the length of the data with the minimum length the regex can match. So, in general, do regex engines perform this kind of test? In particular, I'm interested to know whether Python's re module does this.

A measurement suggests that it does.

import re
import timeit

def test(charsInString, charsInRegex):
    # Match a{charsInRegex} repeatedly against a string of charsInString 'a' characters.
    regex = re.compile('a{' + str(charsInRegex) + '}')
    string = 'a' * charsInString
    for i in range(1, 200000):
        regex.match(string)

# Arguments are (string length, repetition count required by the regex).
print(timeit.timeit("test(1, 1)", setup="from __main__ import test", number=1))
print(timeit.timeit("test(1, 2)", setup="from __main__ import test", number=1))
print(timeit.timeit("test(1, 5000)", setup="from __main__ import test", number=1))
print(timeit.timeit("test(4999, 5000)", setup="from __main__ import test", number=1))
print(timeit.timeit("test(5000, 5000)", setup="from __main__ import test", number=1))
print(timeit.timeit("test(50000, 5000)", setup="from __main__ import test", number=1))

Output (seconds, in the order of the calls above):

0.9117504503834146
0.8135033788142646
0.819454105947109
0.8154557798237785
15.441637204298287
15.412751909222905
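
The jump between test(4999, 5000) and test(5000, 5000) is consistent with a minimum-length check: every string shorter than 5000 characters is rejected at roughly the same small cost. One way to see which minimum the pattern implies is to ask CPython's pattern parser, whose parsed pattern object has a getwidth() method returning the (minimum, maximum) match width. The sketch below assumes CPython, where the parser is the private module re._parser (Python 3.11+) or sre_parse (older versions). This is an internal, undocumented API that may change, and min_width is a hypothetical helper name used only for illustration.

```python
try:
    from re import _parser as sre_parser  # CPython 3.11+ (private module)
except ImportError:
    import sre_parse as sre_parser        # older CPython versions

def min_width(pattern):
    """Return the minimum match length the parser computes for pattern."""
    lo, hi = sre_parser.parse(pattern).getwidth()
    return lo

print(min_width('a{5000}'))  # 5000
print(min_width('ab?c'))     # 2 ('b' is optional)
```

How the matcher uses this value is an implementation detail. The timings above only suggest that input shorter than the minimum is rejected before any real matching work is done.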

And a more complex one:

import re
import timeit

def test2(charsInString):
    # Nested quantifiers; the test string repeats 'abbbbcc' charsInString times.
    regex = re.compile('((ab{3,5}c+){5000,6000}d)+e*f')
    string = 'abbbbcc' * charsInString
    for i in range(1, 100000):
        regex.match(string)

print(timeit.timeit("test2(1)", setup="from __main__ import test2", number=1))
print(timeit.timeit("test2(3571)", setup="from __main__ import test2", number=1))
print(timeit.timeit("test2(3572)", setup="from __main__ import test2", number=1))

Output (seconds, in the order of the calls above):

0.04918821760123643
0.04305112491748375
60.76094317352544
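
The threshold between 3571 and 3572 repetitions lines up with the shortest string this regex can match. A back-of-the-envelope check, with the minimum length worked out by hand from the pattern (the variable names are illustrative only):

```python
# Shortest match of ((ab{3,5}c+){5000,6000}d)+e*f, piece by piece.
inner = 1 + 3 + 1            # 'a', at least 3 'b', at least 1 'c'
group = inner * 5000 + 1     # at least 5000 repetitions, then 'd'
min_len = group * 1 + 0 + 1  # group at least once, e* may be empty, then 'f'

unit = len('abbbbcc')        # 7 characters per repetition of the test string
print(min_len)                              # 25002
print(unit * 3571, unit * 3571 >= min_len)  # 24997 False
print(unit * 3572, unit * 3572 >= min_len)  # 25004 True
```

So the string in test2(3571) is 5 characters too short and can be rejected on length alone, while the string in test2(3572) is long enough that the engine has to actually attempt the match (which then fails, since the string contains no 'd' or 'f').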
