
Speed up re.match() on a list of strings?

Suppose s is a long list of strings. I'd like to extract the indexes of the elements in the list that match a regex. When the list is very long, this can be slow. Is there a way to speed up the search?

import re

regex = re.compile('^x.*$')
result = [i for i,v in enumerate(s) if regex.match(v)]

If all you want to do is check whether the string begins with an "x", you can use startswith:

result = [i for i, v in enumerate(s) if v.startswith("x")]

$ python -m timeit -n 1000 -s 'import re; regex = re.compile("^x.*$");' '[i for i,v in enumerate(["xax", "y", "xaff"]) if regex.match(v)]'
1000 loops, best of 3: 1.62 usec per loop
$ python -m timeit -n 1000 '[i for i, v in enumerate(["xax", "y", "xaff"]) if v.startswith("x")]'
1000 loops, best of 3: 1.17 usec per loop
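
The timings above use a three-element list, so it may be worth repeating the comparison on a list closer to the real size. Below is a minimal sketch of such a check. The test data and the helper names with_regex and with_startswith are made up for illustration, and the numbers it prints will depend on the machine and Python version. Note that the two approaches agree only for strings without an embedded newline, because "." does not match a newline by default.

```python
import re
import timeit

regex = re.compile(r"^x.*$")  # same pattern as in the question

# Hypothetical test data: 30,000 short strings, two thirds of which start with "x".
s = ["xax", "y", "xaff"] * 10_000


def with_regex(strings):
    """Indexes of the strings matched by the compiled regex."""
    return [i for i, v in enumerate(strings) if regex.match(v)]


def with_startswith(strings):
    """Indexes of the strings that begin with "x"."""
    return [i for i, v in enumerate(strings) if v.startswith("x")]


# Sanity check: both approaches agree on this data (no embedded newlines).
assert with_regex(s) == with_startswith(s)

# Time each approach; take the best of 3 runs of 5 passes each.
for fn in (with_regex, with_startswith):
    seconds = min(timeit.repeat(lambda: fn(s), number=5, repeat=3))
    print(f"{fn.__name__}: {seconds / 5:.4f} s per pass")
```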

Split the list into chunks and process them in parallel with Python's multiprocessing or multithreading. Find the indexes of the matches within each chunk, then add each chunk's starting index to its matches so that the final indexes refer to positions in the full list.
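
A rough sketch of the chunking idea, using multiprocessing.Pool from the standard library. The helper names match_chunk and parallel_match_indexes, the worker count, and the chunk size are assumptions chosen for illustration and are not from the original answer. Processes are used rather than threads because CPU-bound work such as regex matching generally does not run in parallel across threads in CPython. Sending the chunks to the workers has its own cost, so for a cheap pattern this may turn out slower than the single-process version; it is worth measuring before adopting it.

```python
import re
from multiprocessing import Pool

REGEX = re.compile(r"^x.*$")  # same pattern as in the question


def match_chunk(args):
    """Return the global indexes of the matching strings in one chunk."""
    offset, chunk = args
    # Add the chunk's starting offset so indexes refer to the full list.
    return [offset + i for i, v in enumerate(chunk) if REGEX.match(v)]


def parallel_match_indexes(strings, n_workers=4, chunk_size=100_000):
    """Hypothetical helper: match chunks of `strings` in worker processes."""
    chunks = [(start, strings[start:start + chunk_size])
              for start in range(0, len(strings), chunk_size)]
    with Pool(n_workers) as pool:
        per_chunk = pool.map(match_chunk, chunks)
    # pool.map preserves input order, so the concatenation stays sorted.
    return [i for part in per_chunk for i in part]


if __name__ == "__main__":
    # Small made-up input, just to compare against the sequential result.
    s = ["xax", "y", "xaff"] * 1000
    result = parallel_match_indexes(s, n_workers=2, chunk_size=500)
    assert result == [i for i, v in enumerate(s) if REGEX.match(v)]
    print(len(result), "matches")
```

The `if __name__ == "__main__":` guard matters here: on platforms where worker processes are started by re-importing the main module, unguarded code that creates a pool would be run again in every worker.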
