
Regex Too Slow Need To Optimize

I am using a regex like "a.{1000000}b.{1000000}c" to match a pattern in a string. However, this is WAY too slow. Is there a better way to do this? I am not interested in what lies between a, b and c; as long as the gaps are of the size I specify, I don't care about the content within. One can think of it as skipping n characters. Checking the indices doesn't serve me well either; I need to use some built-in method written in C. Any suggestions?

Thanks in advance

If you just need to verify that a string matches a given pattern and do not care to extract the a, b, or c, then this would work:

(?=^a.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}b.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}.{50000}c$)

The limit for regex quantifiers is 65535 in engines such as PCRE, so if you need one million then you would have to repeat .{50000} 20 times (20 × 50000 = 1000000) like I did above.
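Writing out forty copies of .{50000} by hand is error-prone, so the pattern can be generated instead. The following is only a sketch: the helper name `chunked_gap` and its `chunk` parameter are hypothetical, and the chunk size of 50000 is simply the value used above.

```python
def chunked_gap(n, chunk=50000):
    """Return a pattern fragment that skips exactly n characters,
    written as a sequence of quantifiers no larger than `chunk`.
    (Hypothetical helper, not part of any library.)"""
    full, rest = divmod(n, chunk)
    parts = [".{%d}" % chunk] * full      # as many full-size chunks as fit
    if rest:
        parts.append(".{%d}" % rest)      # one smaller chunk for the remainder
    return "".join(parts)


# Rebuild the pattern shown above: a, one million characters, b,
# one million characters, c, all wrapped in the same lookahead.
gap = chunked_gap(1000000)
pattern = "(?=^a" + gap + "b" + gap + "c$)"
```

For a gap that is not a multiple of the chunk size, the last quantifier covers the remainder, so the total number of skipped characters is still exactly n.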

Now you just need Python code that says "if the regex matches, then proceed".
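A minimal sketch of that check with Python's built-in re module might look like the following. The name `matches_layout` is hypothetical, and passing re.DOTALL is an assumption on my part: without it "." does not match newlines, which seems unwanted when the content of the gaps does not matter.

```python
import re

# One million skipped characters, written as 20 chunks of 50000.
GAP = ".{50000}" * 20

# Same pattern as above. re.DOTALL is an assumption: it lets "." match
# newline characters too, so the gaps may contain anything.
PATTERN = re.compile("(?=^a" + GAP + "b" + GAP + "c$)", re.DOTALL)


def matches_layout(text):
    """Return True if text is 'a', 1000000 chars, 'b', 1000000 chars, 'c'."""
    return PATTERN.match(text) is not None


sample = "a" + "x" * 1000000 + "b" + "y" * 1000000 + "c"

if matches_layout(sample):
    print("regex matched, proceed")
```

The pattern is compiled once at import time so that repeated checks do not pay the compilation cost again. Note that in Python "$" also matches just before a single trailing newline, so a string with one extra "\n" at the end would be accepted as well.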

Regex101 takes 68ms, so I would consider that to be "fast".

https://regex101.com/r/q6RgNJ/1
