
Regular expression to match a digit present n or more times before a character and m or fewer times after it

I need a regex to match a string based on two conditions using Python:

  1. A digit is present at least n times before the comma.
  2. The digit matched by condition 1 is present at most m times after the comma.

Note: there is only one comma.

For example:

111,222 with n = 3 and m = 0 should return true because 1 is present 3 or more times before the comma and 0 times after it.

111,212 with n = 3 and m = 0 should return false because, although 1 is present 3 or more times before the comma, it is present more than 0 times after it.

111,212 with n = 3 and m = 1 should return true because 1 is present 3 or more times before the comma and only 1 time after it.

I use (\d+)\1{n,} to capture the digit and check the first condition, but I am having trouble with the second condition. I tried (\d+)\1{n,},\d*((?!\1)){0,m}\d* but it is not working.

I assume that the \d after the comma in the regular expression matches the captured digit that should not appear. Any idea?

Code

You're better off doing this in code without regex: split on the comma, then count how many times a given digit occurs in each part. In Python, it would be something like this:

See code in use here - change the values of n and m

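ss = ['111,222', '111,212']
n, m = 3, 1
for s in ss:
    # x is the part before the comma, y the part after it
    x, y = s.split(',')
    for c in x:
        # c qualifies if it occurs at least n times in x and at most m times in y
        if x.count(c) >= n and y.count(c) <= m:
            print(s)
            break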
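
The same check can be wrapped in a function and run against the three examples from the question. This is only a sketch: the name meets_conditions is hypothetical, and it assumes the input contains exactly one comma, as the question states.

```python
def meets_conditions(s: str, n: int, m: int) -> bool:
    """True if some digit occurs >= n times before the comma and <= m times after it."""
    # Assumption: exactly one comma, so split() yields exactly two parts.
    before, after = s.split(",")
    return any(
        before.count(d) >= n and after.count(d) <= m
        for d in set(before)  # each distinct character only needs checking once
    )


print(meets_conditions("111,222", 3, 0))  # True
print(meets_conditions("111,212", 3, 0))  # False
print(meets_conditions("111,212", 3, 1))  # True
```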

Regex

In regex, it can be accomplished with something like the following, but it's really not ideal:

See regex in use here

(\d)(?:(?:(?!\1)\d)*\1){2}\d*,(?:(?!\1)\d)*(?:\1(?:(?!\1)\d)*){0,1}$
#                       ^ n-1                                    ^ m

Since you only care that it meets the minimum requirement of n, there is no need for {2,}: the \d* that follows already absorbs any further digits before the comma.
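
As a rough sketch, the pattern above could be generated for other values of n and m as shown here. The helper name build_pattern is hypothetical, and n >= 1 is assumed so that the n-1 quantifier is never negative.

```python
import re


def build_pattern(n: int, m: int) -> re.Pattern:
    # A run of digits that differ from the captured digit in group 1.
    other = r"(?:(?!\1)\d)*"
    return re.compile(
        r"(\d)"                                       # capture the candidate digit
        + r"(?:" + other + r"\1){" + str(n - 1) + r"}"  # n-1 further occurrences
        + r"\d*,"                                     # rest of the first part, then the comma
        + other
        + r"(?:\1" + other + r"){0," + str(m) + r"}$"   # at most m occurrences after the comma
    )


for s, n, m in [("111,222", 3, 0), ("111,212", 3, 0), ("111,212", 3, 1)]:
    print(s, n, m, bool(build_pattern(n, m).search(s)))
```

The pattern is not anchored at the start, so search() is used rather than match(); that lets the candidate digit begin anywhere in the part before the comma.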

In this part of the pattern, (\d+)\1{n,}, if n = 3 you repeat what you have already captured 3 more times, so it will try to match 4 digits instead of 3.
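
A quick way to see the off-by-one, using the single-digit group (\d) instead of (\d+) to keep the illustration simple (the variable names are only for this sketch):

```python
import re

# (\d) consumes one digit, then \1{3,} demands three more copies: four in total.
needs_four = bool(re.search(r"(\d)\1{3,}", "111,222"))

# (\d) plus \1{2,} is one captured digit and two repeats: three in total.
needs_three = bool(re.search(r"(\d)\1{2,}", "111,222"))

print(needs_four)   # False: no digit appears four times in a row
print(needs_three)  # True: "111" satisfies it
```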

I would suggest not using {0,m}, but matching an exact number of times such as {1} or {2}, and, after you have matched the backreference to group 1, asserting with a negative lookahead that no more occurrences follow.

^(\d)\1{2,},(?=((?:\d*?\1){1}))\2(?!\d*\1)\d*
  • ^ Start of string
  • (\d)\1{2,}, Capture group 1, match a digit and repeat the backreference 2 or more times, then match the comma
  • (?= Positive lookahead
    • ( Capture group 2
      • (?:\d*?\1){1} Repeat matching the backreference to group 1 m times. Here m = 1
    • ) Close group
  • ) Close lookahead
  • \2 Match what is captured in group 2 to prevent backtracking
  • (?!\d*\1) Negative lookahead, assert that no more occurrences of group 1 follow
  • \d* Match 0+ digits

For example:

import re

regex = r"^(\d)\1{2,},(?=((?:\d*?\1){1}))\2(?!\d*\1)\d*"
test_str = ("111,222\n"
    "111,212")

# re.MULTILINE lets ^ match at the start of each line of test_str
matches = re.finditer(regex, test_str, re.MULTILINE)

for match in matches:
    print(match.group())

Output

111,212
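
Only 111,212 is printed because the pattern above asks for exactly one occurrence after the comma, so 111,222 (zero occurrences) is not matched. One possible way to get "at most m" with this approach is to try each exact count from 0 to m. The sketch below assumes that is acceptable; the names exact_pattern and at_most are hypothetical. Like the pattern above, it requires the n copies before the comma to be consecutive and to start the string.

```python
import re


def exact_pattern(n: int, k: int) -> str:
    # Same shape as the pattern above: at least n consecutive copies of a digit,
    # the comma, then exactly k occurrences of that digit after it.
    return (
        r"^(\d)\1{" + str(n - 1) + r",},"
        + r"(?=((?:\d*?\1){" + str(k) + r"}))\2(?!\d*\1)\d*"
    )


def at_most(s: str, n: int, m: int) -> bool:
    # "At most m" is treated as "exactly k for some k in 0..m".
    return any(re.match(exact_pattern(n, k), s) for k in range(m + 1))


for s, n, m in [("111,222", 3, 0), ("111,212", 3, 0), ("111,212", 3, 1)]:
    print(s, n, m, at_most(s, n, m))
```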
