
Regular expressions for finding patterns in Python

I am having trouble adjusting a regular expression to do what I want.

I want to find a pattern:

AxxHxxxAxxHxxxbbbbbAxxHxxxAxxHxxx

But not:

AxxHxxxAxxHxxxAxxHxxxAxxHxxxAxxHxxxAxxHxxxAxxHxxxAxxHxxx

If I use:

"(A\w{2}H\w{3}){2,4}.+(A\w{2}H\w{3}){2,4}"

It will find both. I've tried excluding long stretches of AxxHxxx repeats with:

"(?!(A\w{2}H\w{3}){8})(A\w{2}H\w{3}){2,4}.+(A\w{2}H\w{3}){2,4}"

But it doesn't actually work. Do any of you have any ideas how to solve this problem? Since I will be working with a large data set, I would prefer to avoid loop hell with string slicing and so on.
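A small sketch, assuming Python's built-in `re` module, that reproduces the behaviour described above and suggests why the lookahead does not help. The lookahead only rejects a match that starts at offset 0; `search` then retries one repeat later, where fewer than eight repeats remain, so the guard no longer fires. The variable names (`plain`, `guarded`, `wanted`, `unwanted`) are illustrative only.

```python
import re

# Both patterns are copied verbatim from the question.
plain = re.compile(r"(A\w{2}H\w{3}){2,4}.+(A\w{2}H\w{3}){2,4}")
guarded = re.compile(
    r"(?!(A\w{2}H\w{3}){8})(A\w{2}H\w{3}){2,4}.+(A\w{2}H\w{3}){2,4}"
)

wanted = "AxxHxxxAxxHxxxbbbbbAxxHxxxAxxHxxx"  # two runs separated by a gap
unwanted = "AxxHxxx" * 8                       # one continuous run

for name, pattern in [("plain", plain), ("guarded", guarded)]:
    for label, text in [("wanted", wanted), ("unwanted", unwanted)]:
        m = pattern.search(text)
        # Print where the match starts, or None when there is no match.
        print(name, label, m.start() if m else None)

# The guarded pattern still matches the continuous run, starting at
# offset 7 (the second repeat) instead of offset 0.
```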

Thanks in advance!

Edit:

Since there was interest in another set of examples, I will try to explain in more detail what I am trying to accomplish.

I want to find, in a single string, a fragment that consists of two runs of repeats with any characters in between, such as:

.......A..H...A..H...A..H...............A..H...A..H...A..H.......................

where "." is any character and there are two repetitive modules of "A..H...". The only problem is that I do not want to match a continuous stretch of repeats such as this one:

.......A..H...A..H...A..H...A..H...A..H...A..H......................

This seems to produce the correct result with your test cases, but it is very ugly.

/^(?:A.{2}H.{3})+(?:(?!A.{2}H.{3}).)+(?:A.{2}H.{3})+$/

http://regex101.com/r/sW5vG2

Is this what you want?

The running time will be directly proportional to the length of the string.
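For reference, here is a minimal sketch of how the pattern above could be tried in Python with the built-in `re` module. `ANCHORED` is the pattern as posted, only assembled from a `MODULE` fragment for readability. `UNANCHORED` is an assumption of mine rather than part of the posted solution: it drops `^` and `$` so the same idea can be searched for inside a longer string, as in the question's edit. All names are illustrative.

```python
import re

MODULE = r"A.{2}H.{3}"  # one "AxxHxxx" unit

# The pattern as posted: repeats, then a gap in which no unit may start,
# then repeats again, covering the whole string.
ANCHORED = re.compile(rf"^(?:{MODULE})+(?:(?!{MODULE}).)+(?:{MODULE})+$")

# Assumption (not from the post): same pattern without the anchors.
UNANCHORED = re.compile(rf"(?:{MODULE})+(?:(?!{MODULE}).)+(?:{MODULE})+")

wanted = "AxxHxxxAxxHxxxbbbbbAxxHxxxAxxHxxx"
unwanted = "AxxHxxx" * 8

print(bool(ANCHORED.search(wanted)))    # True
print(bool(ANCHORED.search(unwanted)))  # False

# The two longer examples from the question's edit (dots are literal here).
gapped = (".......A..H...A..H...A..H..............."
          "A..H...A..H...A..H.......................")
continuous = (".......A..H...A..H...A..H...A..H...A..H...A..H..."
              "...................")

print(bool(UNANCHORED.search(gapped)))      # True
print(bool(UNANCHORED.search(continuous)))  # False
```

The gap uses `(?:(?!MODULE).)+`, which consumes one character at a time only while no unit starts at that position, so a continuous run leaves no valid place for the gap to begin.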

Edit: Improved solution using PCRE: http://regex101.com/r/wX8uK2

I'm not sure how well I understood your problem.

To solve your specific problem, I think you can use something like this:

^(.*)\1((?!\1).)+\1{2}$

Check DEMO
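A quick sketch of how the backreference pattern above behaves on the question's two test strings, assuming Python's built-in `re` module (the variable names are illustrative). Note that the pattern never mentions `A` or `H`: group 1 captures whatever prefix is doubled at the start, and the pattern then requires exactly that same text doubled again at the end.

```python
import re

# Pattern copied verbatim from the answer.
pattern = re.compile(r"^(.*)\1((?!\1).)+\1{2}$")

wanted = "AxxHxxxAxxHxxxbbbbbAxxHxxxAxxHxxx"
unwanted = "AxxHxxx" * 8

m = pattern.search(wanted)
print(m.group(1))                # AxxHxxx
print(pattern.search(unwanted))  # None

# Because the repeated unit is not spelled out, unrelated strings with the
# same shape match too; here group 1 is "ab".
print(pattern.search("ababXabab").group(1))  # ab
```

So this works for inputs with exactly two repeats on each side of the gap, but it may be too permissive if only `A..H...` units should count.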
