Regex to match a digit that occurs n or more times before a character and m or fewer times after it

Question

I need a Python regex that matches a string when two conditions hold:

1. A digit is present at least n times before the comma.
2. The digit matched by condition 1 is present at most m times after the comma.

Note: there is only one comma.

For example:

111,222 with n = 3 and m = 0 should return true, because 1 is present 3 or more times before the comma and 0 times after it.

111,212 with n = 3 and m = 0 should return false: although 1 is present 3 or more times before the comma, it is present more than 0 times after it.

111,212 with n = 3 and m = 1 should return true, because 1 is present 3 or more times before the comma and only 1 time after it.

I use (\d+)\1{n,} to capture the digit and check the first condition, but I am having trouble with the second. I tried (\d+)\1{n,},\d*((?!\1)){0,m}\d* but it does not work.

I assume that the \d after the comma in the regular expression matches the captured digit that should not appear. Any ideas?
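
A quick check seems to support that suspicion. The sketch below assumes Python's re module and plugs in concrete quantifiers ({2,} and {0,1}) purely for illustration. The greedy \d* after the comma consumes every remaining digit, so the zero-width (?!\1) is only tested at the end of the string, where it trivially succeeds:

```python
import re

# The attempted pattern with illustrative quantifiers: {2,} before the comma, {0,1} after it.
attempt = re.compile(r"(\d+)\1{2,},\d*((?!\1)){0,1}\d*")

# "1" occurs three times after the comma, so a limit of one should reject this string,
# yet the attempted pattern still matches it in full.
accepted = attempt.fullmatch("111,111") is not None
print(accepted)
```

Repeating a lookahead does not count occurrences, because a lookahead consumes no characters.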

Answer 1

Code

You're better off doing this in code without regex: split on the comma, then count how many times a digit occurs in each part. In Python, it would be something like this:

Change the values of n and m to try other cases.

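ss = ['111,222', '111,212']
n, m = 3, 1
for s in ss:
    # Split into the parts before and after the single comma
    x, y = s.split(',')
    for c in x:
        # c must occur at least n times before the comma and at most m times after it
        if x.count(c) >= n and y.count(c) <= m:
            print(s)
            break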
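
The same split-and-count idea could be wrapped in a function that returns a boolean, which is closer to the true/false results the question asks for. This is only a sketch: the name satisfies is hypothetical, and it assumes the input contains exactly one comma with digits on both sides.

```python
def satisfies(s: str, n: int, m: int) -> bool:
    """Return True if some digit occurs >= n times before the comma and <= m times after it."""
    before, after = s.split(",")  # assumes exactly one comma, as the question states
    return any(
        before.count(d) >= n and after.count(d) <= m
        for d in set(before)  # each distinct character before the comma is a candidate
    )


print(satisfies("111,222", 3, 0))  # True
print(satisfies("111,212", 3, 0))  # False
print(satisfies("111,212", 3, 1))  # True
```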

Regex

In regex, it can be accomplished with something like the following, but it's really not ideal:

(\d)(?:(?:(?!\1)\d)*\1){2}\d*,(?:(?!\1)\d)*(?:\1(?:(?!\1)\d)*){0,1}$
#                       ^ n-1                                    ^ m

Since you only care that it meets the minimum requirement of n, we don't need {2,}: {2} is enough, and the \d* that follows absorbs any remaining digits before the comma.
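
To avoid editing the quantifiers by hand, the pattern could be generated from n and m. The helper below is a hypothetical sketch (build_pattern is not part of the answer) and assumes n >= 1 and m >= 0; it uses re.search because the pattern has no ^ anchor.

```python
import re


def build_pattern(n: int, m: int) -> str:
    """Build the pattern above for a given n (minimum before) and m (maximum after)."""
    # {n - 1}: the capture group itself already accounts for the first occurrence.
    return (
        rf"(\d)(?:(?:(?!\1)\d)*\1){{{n - 1}}}\d*,"
        rf"(?:(?!\1)\d)*(?:\1(?:(?!\1)\d)*){{0,{m}}}$"
    )


for s, n, m in [("111,222", 3, 0), ("111,212", 3, 0), ("111,212", 3, 1)]:
    print(s, n, m, re.search(build_pattern(n, m), s) is not None)
```

Unlike a plain (\d)\1{2,}, this pattern does not require the n occurrences to be consecutive.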

Answer 2

In this part of the pattern, (\d+)\1{n,}, the group itself already matches the digit once. With n=3 the backreference then repeats what was captured 3 more times, so the pattern tries to match 4 digits instead of 3.
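
A minimal check of this off-by-one, assuming Python's re module and a single-digit group for simplicity:

```python
import re

# The group consumes the first digit, so \1{3,} demands three more copies: four digits in total.
needs_four = re.fullmatch(r"(\d)\1{3,}", "111") is None

# Using n - 1 = 2 repetitions of the backreference accepts exactly three digits.
three_ok = re.fullmatch(r"(\d)\1{2,}", "111") is not None

print(needs_four, three_ok)
```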

I would suggest not using {0,m}, but matching an exact count such as {1} or {2}. After you have matched the backreference to group 1 that many times, use a negative lookahead to assert that no more occurrences follow.

^(\d)\1{2,},(?=((?:\d*?\1){1}))\2(?!\d*\1)\d*

^ Start of string
(\d)\1{2,}, Capture group 1: match a digit, repeat the backreference 2 or more times, then match the comma
(?= Positive lookahead
- ( Capture group 2
  - (?:\d*?\1){1} Match as few digits as possible followed by the backreference to group 1, repeated m times. Here m = 1
- ) Close group
) Close lookahead
\2 Match what is captured in group 2 to prevent backtracking
(?!\d*\1) Negative lookahead: assert that no more occurrences of group 1 follow
\d* Match 0+ digits

For example

import re

regex = r"^(\d)\1{2,},(?=((?:\d*?\1){1}))\2(?!\d*\1)\d*"
test_str = ("111,222\n"
    "111,212")

# re.MULTILINE lets ^ match at the start of each line of test_str
matches = re.finditer(regex, test_str, re.MULTILINE)

for match in matches:
    print(match.group())

Output

111,212
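
Only 111,212 is printed because this pattern requires exactly one occurrence after the comma, while 111,222 has none. One way to get back the question's "at most m" behaviour, offered here only as a sketch, is to try every exact count from 0 to m. The helpers exact_pattern and at_most are hypothetical names, and like the pattern above they require the digit to be repeated consecutively at the start of the string.

```python
import re


def exact_pattern(n: int, k: int) -> str:
    """The pattern above, with n - 1 backreference repeats and exactly k occurrences after the comma."""
    return rf"^(\d)\1{{{n - 1},}},(?=((?:\d*?\1){{{k}}}))\2(?!\d*\1)\d*"


def at_most(s: str, n: int, m: int) -> bool:
    """Return True if the exact-count pattern matches for some count k in 0..m."""
    return any(re.match(exact_pattern(n, k), s) for k in range(m + 1))


for s, n, m in [("111,222", 3, 0), ("111,212", 3, 0), ("111,212", 3, 1), ("111,222", 3, 1)]:
    print(s, n, m, at_most(s, n, m))
```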
