Regex Too Slow Need To Optimize

Question

I am using a regex like "a.{1000000}b.{1000000}c" to pattern-match a string. However, this is way too slow. Is there a better way to do this? I am not interested in what lies between a, b and c: as long as each gap has my specified size, I don't care about its content. One can think of it as skipping n characters. Checking the indices myself doesn't serve me well either; I need to use some built-in method written in C. Any suggestions?

Thanks in advance

Answer 1

If you just need to verify that a string matches the pattern, and don't need to extract a, b, or c, then this should work:

(?=^a.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}b.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}c$)

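PCRE (regex101's default flavor) limits quantifier counts to 65535, so to skip one million characters there you have to repeat .{50000} 20 times, as I did above. Python 3's re module accepts much larger counts, so a single .{1000000} is valid there.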
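Typing twenty .{50000} chunks by hand is easy to get wrong. The sketch below generates the same chunked layout for any gap size; `gap_pattern` and `build_abc_pattern` are hypothetical helper names, and `chunk` defaults to 50000 as in the answer. It drops the (?=...) wrapper, which makes no difference for a yes/no check of an already anchored pattern.

```python
import re

PCRE_MAX_REPEAT = 65535  # PCRE's upper bound for a {n} quantifier


def gap_pattern(gap: int, chunk: int = 50000) -> str:
    """Regex fragment matching exactly `gap` characters, split into
    .{chunk} pieces so no single quantifier exceeds PCRE's limit."""
    if gap < 0:
        raise ValueError("gap must be non-negative")
    if not 0 < chunk <= PCRE_MAX_REPEAT:
        raise ValueError("chunk must be between 1 and 65535")
    full, rest = divmod(gap, chunk)
    parts = [f".{{{chunk}}}"] * full  # e.g. ".{50000}" repeated
    if rest:
        parts.append(f".{{{rest}}}")  # leftover characters, if any
    return "".join(parts)


def build_abc_pattern(gap: int, chunk: int = 50000) -> str:
    """Anchored pattern: 'a', gap chars, 'b', gap chars, 'c'."""
    g = gap_pattern(gap, chunk)
    return f"^a{g}b{g}c$"


if __name__ == "__main__":
    pattern = build_abc_pattern(1_000_000)
    print(pattern.count(".{50000}"))  # two gaps of 20 chunks each
```

For example, build_abc_pattern(1_000_000) reproduces the answer's 20-chunk-per-gap layout without the lookahead, and the resulting string can be pasted into a PCRE-based tool. Note that "." does not match newlines unless DOTALL (the s flag) is enabled.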

Then, in Python, you just need code along the lines of "if the regex matches, proceed".
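A minimal sketch of that "if it matches, proceed" step, assuming the whole string must be exactly a, the gap, b, the gap, c (as the answer's ^...$ anchors imply). It uses re.fullmatch instead of ^...$, which also rejects a trailing newline that $ would accept, and re.DOTALL so the gaps may contain newlines. `matches_abc` and the sample `text` are hypothetical.

```python
import re

GAP = 1_000_000  # gap size from the question

# Python 3's re accepts a single large repeat count, so no chunking is needed.
# re.DOTALL lets "." match "\n" too; drop it if the gaps never contain newlines.
ABC = re.compile(rf"a.{{{GAP}}}b.{{{GAP}}}c", re.DOTALL)


def matches_abc(s: str) -> bool:
    """True if s is exactly 'a', GAP chars, 'b', GAP chars, 'c'."""
    return ABC.fullmatch(s) is not None


text = "a" + "x" * GAP + "b" + "y" * GAP + "c"  # hypothetical input
if matches_abc(text):
    print("proceed")
```

Because the match is tried only at position 0, the engine does one pass over the string rather than retrying the pattern at every "a" the way an unanchored re.search would.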

On regex101 the match takes 68 ms, which I would consider fast.

https://regex101.com/r/q6RgNJ/1
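The 68 ms figure comes from regex101's engine, not from Python's re, so it may not carry over. A sketch for measuring the anchored match locally with the standard timeit module follows; `time_abc_match` is a hypothetical helper and the input string is synthetic test data, so the resulting number depends entirely on your machine and data.

```python
import re
import timeit


def time_abc_match(gap: int = 1_000_000, repeats: int = 5) -> float:
    """Best-of-`repeats` time, in seconds, for one fullmatch of the gap
    pattern against a synthetic string that is known to match."""
    s = "a" + "x" * gap + "b" + "x" * gap + "c"
    pattern = re.compile(rf"a.{{{gap}}}b.{{{gap}}}c", re.DOTALL)
    if pattern.fullmatch(s) is None:
        raise RuntimeError("synthetic test string should match")
    # number=1: each sample times a single match; min() reduces timer noise.
    samples = timeit.repeat(lambda: pattern.fullmatch(s), number=1, repeat=repeats)
    return min(samples)


if __name__ == "__main__":
    print(f"{time_abc_match() * 1000:.1f} ms")
```

Taking the minimum of several single runs is the usual timeit practice: slower samples mostly reflect interference from other processes rather than the regex itself.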

Answer 1 (accepted): 2020-11-12 14:28:13