[Python]: Python re.search speed optimization for long string lines

I would like to ask how to speed up re.search in Python.
I have a long string line, 176861 characters in length (alphanumeric characters with some symbols), and I tested re.search on this line using this function:

import re
import time

def getExecTime():
    # temp is the 176861-character line, defined elsewhere
    start_time = time.time()
    re.search(r'.*^string .*=.*', temp)
    stop_time = time.time() - start_time
    print("Execution time is : %s seconds" % stop_time)

The average result is ~414 seconds (around 6 to 7 minutes). Is there any way I can reduce this to, let's say, ~2 minutes or less? Based on other people's feedback here, splitting this long line into a list of strings will not have any significant impact on execution time. Any ideas are greatly appreciated. Thanks in advance!

re.search already goes character by character. Starting your pattern with .* just means that this part can match anywhere, so every character of the large string becomes a candidate starting point, and at each one .* first consumes the rest of the line and then backtracks. You need to improve your regular expression, or use re.match instead of re.search, since re.match only tries the pattern at the beginning of the string.
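To make the cost of the leading .* visible, here is a minimal timing sketch. The helper name time_call and the 3000-character sample are assumptions for illustration only (a much shorter stand-in for the real line), and the absolute timings will vary by machine.

```python
import re
import time

def time_call(func, pattern, text):
    """Run func(pattern, text) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(pattern, text)
    return result, time.perf_counter() - start

# Hypothetical stand-in for the long line: no newline and no "string " in it.
sample = "x" * 3000

# Original pattern: .* is retried (and backtracked) from every start position.
slow_result, slow_time = time_call(re.search, r'.*^string .*=.*', sample)
# Same pattern without the leading .*: each start position fails immediately.
fast_result, fast_time = time_call(re.search, r'^string .*=.*', sample)
# re.match only tries the pattern at position 0.
match_result, match_time = time_call(re.match, r'.*^string .*=.*', sample)

print("re.search, leading .* : %.6f s" % slow_time)
print("re.search, no .*      : %.6f s" % fast_time)
print("re.match,  leading .* : %.6f s" % match_time)
```

None of the three calls finds a match on this sample; the point is only how long each one takes to give up. Doubling the sample length should roughly quadruple the first timing if the quadratic behaviour described above holds.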

Also, I believe you are using ^ in the wrong place. Outside a character set it anchors the match to the start of the string, or to the start of each line if you pass the multiline flag re.MULTILINE to the compiler/regex. Inside a character set, as the first character, it means "not".
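The two meanings of ^ can be checked with a short sketch; the sample text below is made up for illustration.

```python
import re

# Hypothetical two-line sample; "string" starts the second line.
text = "first line\nstring a = 1"

# Without re.MULTILINE, ^ matches only at the very start of the string.
no_flag = re.search(r'^string', text)
print(no_flag)  # None

# With re.MULTILINE, ^ also matches immediately after each newline.
with_flag = re.search(r'^string', text, re.MULTILINE)
print(with_flag.span())  # (11, 17)

# As the first character inside [...], ^ negates the set: "anything except =".
parts = re.findall(r'[^=]+', "a=b=c")
print(parts)  # ['a', 'b', 'c']
```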

You should change your regex to something like this:

r'string [^=]*=.*'

This says: look for the word "string" followed by a space, then any number of characters that are not =, then =, then anything. Also, you might want to use + instead of * because * can also mean 0 matches, whereas + requires at least 1 character.
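As a quick check of the suggested pattern and of the * versus + difference, here is a small sketch. The input lines are invented, since the real data was not shown.

```python
import re

pattern_star = re.compile(r'string [^=]*=.*')  # zero or more non-'=' characters
pattern_plus = re.compile(r'string [^=]+=.*')  # at least one non-'=' character

# Hypothetical input line for illustration.
line = "prefix string name = value"
found = pattern_star.search(line)
print(found.group(0))  # string name = value

# With nothing between the space and '=', only the * version matches.
star_on_empty = pattern_star.search("string =x")
plus_on_empty = pattern_plus.search("string =x")
print(star_on_empty is not None)  # True
print(plus_on_empty is not None)  # False
```

Because [^=]* can never run past the first =, the engine has far less to backtrack over than with .*, which is where the speed-up is expected to come from.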

But without more information from your end, it will be hard to tell what exactly is needed.
