
[Python]: Python re.search speed optimization for long string lines

I would like to ask how to speed up re.search in Python.
I have a long string line, 176,861 characters in length (alphanumeric characters with some symbols), and I tested re.search against this line using the following function:

import re
import time

def getExecTime(temp):
    # Time a single re.search over the long input line.
    start_time = time.time()
    re.search(r'.*^string .*=.*', temp)
    stop_time = time.time() - start_time
    print("Execution time is : %s seconds" % stop_time)

The average result of this is ~414 seconds (around 6 to 7 minutes). Is there any way I can reduce this to, let's say, around 2 minutes or less? Based on other people's feedback here, splitting this long line into a list of strings does not produce any significant improvement in execution time. Any ideas are greatly appreciated. Thanks in advance!

re.search already scans the string character by character; starting your pattern with .* just means it will always consume the rest of the line, and every character of the large string becomes a candidate starting position. You need to improve your regular expression, or use re.match instead of re.search.
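For example, re.match is implicitly anchored at the start of the string, so the leading .* becomes unnecessary (a minimal sketch with a made-up sample string):

import re

s = "string a = b"

# re.match is anchored at position 0, so no leading .* or ^ is needed:
print(re.match(r'string .*=.*', s))   # matches
print(re.search(r'string .*=.*', s))  # same result here; on a miss it would try every start position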

Also, you are using ^ in the wrong place, I believe. It can either signify the start of a line (in which case you need to pass the re.MULTILINE flag to the compiler/regex), or it means "not" when used inside a character set.
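A quick sketch of that difference (the sample text here is made up):

import re

text = "foo\nstring a = b\nbar"

# Without re.MULTILINE, ^ only matches at the very start of the string:
print(re.search(r'^string .*=.*', text))                # None
# With re.MULTILINE, ^ also matches right after every newline:
print(re.search(r'^string .*=.*', text, re.MULTILINE))  # matches 'string a = b'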

You should change your regex to something like this:

r'string [^=]*=.*'

This says: look for the word "string" followed by a space, then any number of characters that are not =, then =, then anything. Also, you might want to use + instead of *, because * also matches zero characters, whereas + requires at least one.
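As a rough illustration of the speed difference, here is a sketch that times both patterns. The synthetic temp below is only a stand-in for your real 176,861-character line, so the absolute numbers will differ:

import re
import timeit

# Synthetic stand-in for the real input line (not the asker's actual data).
temp = "x = 1; " * 2000  # ~14,000 characters; neither pattern matches

slow = re.compile(r'.*^string .*=.*')  # original: leading .* plus misplaced ^ forces heavy backtracking
fast = re.compile(r'string [^=]*=.*')  # suggested: the engine can scan for the literal 'string ' first

print("slow:", timeit.timeit(lambda: slow.search(temp), number=1))
print("fast:", timeit.timeit(lambda: fast.search(temp), number=1))

On the full 176,861-character line the gap should be far more dramatic, since the backtracking cost of the original pattern grows much faster than linearly with the input length.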

But without more information on your end, it is hard to tell exactly what is needed.
