Speed up re.match() on a list of strings?

Question

Suppose s is a long list of strings. I'd like to extract the indexes of the elements in the list that match the regex, but when the list is very long, the search is slow. Is there a way to speed it up?

import re

regex = re.compile('^x.*$')
result = [i for i, v in enumerate(s) if regex.match(v)]  # s is the list of strings

Answer 1

If all you want to do is check whether each string begins with "x", you can use str.startswith() instead:

result = [i for i, v in enumerate(s) if v.startswith("x")]
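A quick check that the two versions agree on the three-element list used in the benchmark below, plus one caveat: they are only equivalent when the strings contain no newline, apart from an optional single trailing one. In the pattern '^x.*$', '.' does not match '\n', and '$' matches only at the end of the string or just before a final trailing newline. The function names here are illustrative, not from the original answer.

```python
import re

regex = re.compile('^x.*$')


def indexes_regex(strings):
    # The question's original approach
    return [i for i, v in enumerate(strings) if regex.match(v)]


def indexes_startswith(strings):
    # Plain prefix test, no regex engine involved
    return [i for i, v in enumerate(strings) if v.startswith("x")]


sample = ["xax", "y", "xaff"]
print(indexes_regex(sample), indexes_startswith(sample))  # [0, 2] [0, 2]

# An embedded newline makes the two disagree: '.*' stops at '\n',
# and '$' cannot match there because the newline is not the last character.
tricky = ["x\ny"]
print(indexes_regex(tricky), indexes_startswith(tricky))  # [] [0]
```

If the strings can contain newlines and you really only care about the first character, startswith() is the version that does what you mean.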

$ python -m timeit -n 1000 -s 'import re; regex = re.compile("^x.*$");' '[i for i,v in enumerate(["xax", "y", "xaff"]) if regex.match(v)]'
1000 loops, best of 3: 1.62 usec per loop
$ python -m  timeit -n 1000 '[i for i, v in enumerate(["xax", "y", "xaff"]) if v.startswith("x")]'
1000 loops, best of 3: 1.17 usec per loop
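The timings above use a three-element list and the older "best of 3" timeit output. To see how the gap behaves on a list closer to the question's "very long list", here is a sketch using the timeit module from Python itself. The random eight-letter strings, the list size and the repeat counts are arbitrary assumptions, and no particular result is implied.

```python
import random
import re
import string
import timeit


def make_strings(n, seed=0):
    """Build a hypothetical list of n random 8-letter lowercase strings."""
    rng = random.Random(seed)
    return ["".join(rng.choices(string.ascii_lowercase, k=8)) for _ in range(n)]


def compare(strings, number=5):
    """Return the best-of-5 time, in seconds per run, for each approach."""
    regex = re.compile('^x.*$')
    candidates = {
        "regex.match": lambda: [i for i, v in enumerate(strings) if regex.match(v)],
        "str.startswith": lambda: [i for i, v in enumerate(strings) if v.startswith("x")],
    }
    # timeit.repeat accepts a callable; take the fastest repeat to reduce noise
    return {
        name: min(timeit.repeat(fn, number=number, repeat=5)) / number
        for name, fn in candidates.items()
    }


if __name__ == "__main__":
    for name, seconds in compare(make_strings(100_000)).items():
        print(f"{name}: {seconds * 1e3:.2f} ms per run")
```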

Answer 2

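Split the list into chunks and process them in parallel with Python's multiprocessing module (in standard CPython builds, multithreading is unlikely to help, because the GIL keeps regex matching from running on more than one thread at a time). Find the indexes of the matches within each chunk, then add each chunk's starting index to its matches so that the final indexes refer to positions in the full list.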

2 answers

solution1
0 2018-02-04 06:31:11

solution2
0 2018-02-04 06:36:23