
Regex Too Slow, Need to Optimize

I am using a regex like "a.{1000000}b.{1000000}c" to pattern-match a string. However, this is WAY too slow. Is there a better way to do this? I am not interested in the stuff between a, b and c: as long as each gap is the size I specify, I don't care about its content. One can think of it as skipping n characters. Checking the indices doesn't serve me well either; I need to use some built-in method written in C. Any suggestions?

Thanks in advance
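To make the question concrete, here is a scaled-down sketch of the setup it describes, assuming Python's `re` module and a gap of 5 characters instead of 1,000,000. The helper name `gapped_search` and the sample strings are hypothetical. `re.DOTALL` is added so that `.` also skips newlines, since the content between the letters does not matter.

```python
import re

def gapped_search(text: str, gap: int) -> bool:
    # Hypothetical helper: True if some "a" is followed, exactly `gap`
    # characters later, by "b", and `gap` characters after that by "c".
    pattern = "a.{%d}b.{%d}c" % (gap, gap)
    return re.search(pattern, text, re.DOTALL) is not None

print(gapped_search("xa12345b67890cx", 5))  # True
print(gapped_search("xa1234b67890cx", 5))   # False: only 4 chars between a and b
```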

If you just need to verify that a string matches a given pattern and do not need to extract a, b, or c, then this would work:

(?=^a.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}b.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}c$)

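In PCRE, the default flavor on regex101, a single quantifier is limited to 65535, so to skip one million characters you have to repeat .{50000} 20 times (20 × 50,000 = 1,000,000), as I did above. Python's re module accepts much larger repeat counts, so there a single .{1000000} is also legal.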
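Rather than typing `.{50000}` twenty times by hand, the same pattern can be generated in Python. This is a minimal sketch; the helper names `gap_fragment` and `build_pattern` are hypothetical. The gap is split into chunks of at most 50,000 so that no single quantifier exceeds PCRE's 65,535 limit, which keeps the generated pattern usable on regex101 as well.

```python
def gap_fragment(gap: int, chunk: int = 50000) -> str:
    # Hypothetical helper: regex text matching exactly `gap` characters,
    # written as repeated ".{chunk}" pieces plus one ".{remainder}" if needed.
    full, rest = divmod(gap, chunk)
    fragment = (".{%d}" % chunk) * full
    if rest:
        fragment += ".{%d}" % rest
    return fragment

def build_pattern(gap: int) -> str:
    # Rebuilds the answer's pattern: a lookahead anchored at both ends.
    g = gap_fragment(gap)
    return "(?=^a" + g + "b" + g + "c$)"

pattern = build_pattern(1_000_000)
print(pattern.count(".{50000}"))  # 40 (20 between a and b, 20 between b and c)
```

For gaps that are not a multiple of 50,000, the remainder becomes one extra, smaller quantifier, so the total still adds up to the requested gap.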

Now you just need Python code that says "if the regex matches, then proceed".
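A minimal sketch of that check, assuming Python's `re` module and the 1,000,000-character gap from the question; the function name `has_gapped_abc` and the test string are hypothetical. Two choices here go beyond the regex itself: `re.DOTALL` lets `.` skip newlines too, and `re.match` only tries position 0, which is where the `^` anchor requires the match to start anyway.

```python
import re

# The answer's pattern: twenty ".{50000}" chunks (1,000,000 characters)
# between "a" and "b", and again between "b" and "c".
GAP = ".{50000}" * 20
PATTERN = re.compile("(?=^a" + GAP + "b" + GAP + "c$)", re.DOTALL)

def has_gapped_abc(text: str) -> bool:
    # True only if the whole string is "a", 1,000,000 characters, "b",
    # 1,000,000 characters, "c" (the $ also tolerates one trailing "\n").
    return PATTERN.match(text) is not None

text = "a" + "x" * 1_000_000 + "b" + "y" * 1_000_000 + "c"
if has_gapped_abc(text):
    print("pattern found, proceeding")
```

If a trailing newline should not be accepted, `$` can be replaced with `\Z` in the pattern.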

Regex101 reports 68 ms for this match, so I would consider that to be "fast".

https://regex101.com/r/q6RgNJ/1
