Regular expression to match a digit present n or more times before a character and m or fewer times after it
I need a regex to match a string based on two conditions using Python:
a digit is present n or more times before ,
the same digit is present m or fewer times after ,
Note: there is only one comma.
For example:
111,222 with n = 3 and m = 0 should return true because 1 is present 3 or more times before , and 0 times after ,
111,212 with n = 3 and m = 0 should return false because although 1 is present 3 or more times before , it is present more than 0 times after ,
111,212 with n = 3 and m = 1 should return true because 1 is present 3 or more times before , and only 1 time after ,
I use (\d+)\1{n,} to capture the digit and check the first condition, but I am having trouble with the second condition. I tried
(\d+)\1{n,},\d*((?!\1)){0,m}\d*
but it is not working. I assume that the \d after the , in the regular expression matches the captured digit that should not appear. Any idea?
You're better off doing this in code without regex: split on , and then count the occurrences of a digit in each part. In Python, it would be something like this (change the values of n and m to try other cases):
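ss = ['111,222', '111,212']
n, m = 3, 1
for s in ss:
    # split into the parts before and after the single comma
    x, y = s.split(',')
    for c in x:
        # c occurs at least n times before and at most m times after the comma
        if x.count(c) >= n and y.count(c) <= m:
            print(s)
            break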
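The same split-and-count idea can be wrapped in a function and checked against the three examples from the question. This is only a sketch: the name meets_conditions is made up for illustration, and, like the loop above, it counts occurrences anywhere before the comma rather than requiring them to be consecutive.

```python
def meets_conditions(s: str, n: int, m: int) -> bool:
    """Hypothetical helper: True if some digit occurs >= n times before
    the comma and <= m times after it (assumes exactly one comma)."""
    before, after = s.split(",")
    return any(
        before.count(c) >= n and after.count(c) <= m
        for c in set(before)  # each distinct character only once
    )


print(meets_conditions("111,222", 3, 0))  # True
print(meets_conditions("111,212", 3, 0))  # False
print(meets_conditions("111,212", 3, 1))  # True
```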
In regex, it can be accomplished with something like the following, but it's really not ideal:
(\d)(?:(?:(?!\1)\d)*\1){2}\d*,(?:(?!\1)\d)*(?:\1(?:(?!\1)\d)*){0,1}$
Here {2} is n-1 and the upper bound in {0,1} is m.
Since you only care that it meets the minimum requirement of n, we don't need to use {2,}.
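If n and m vary, the pattern above can be assembled from them instead of being edited by hand. The following is a minimal sketch under that assumption; build_pattern is a hypothetical helper name, and the sub-pattern it repeats is copied from the regex in this answer.

```python
import re


def build_pattern(n: int, m: int) -> re.Pattern:
    """Hypothetical helper: fill n-1 and m into the regex from this answer."""
    if n < 1 or m < 0:
        raise ValueError("need n >= 1 and m >= 0")
    other = r"(?:(?!\1)\d)*"  # any run of digits that are not the captured one
    return re.compile(
        rf"(\d)(?:{other}\1){{{n - 1}}}\d*,{other}(?:\1{other}){{0,{m}}}$"
    )


print(bool(build_pattern(3, 0).search("111,222")))  # True
print(bool(build_pattern(3, 0).search("111,212")))  # False
print(bool(build_pattern(3, 1).search("111,212")))  # True
```

For n = 3 and m = 1 this produces exactly the pattern shown above.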
In the (\d+)\1{n,} part of your pattern, if n=3 the backreference repeats what you have already captured 3 more times, so it tries to match at least 4 digits instead of 3.
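The off-by-one is easy to confirm in an interpreter. The snippet below is only an illustration; the variable names are made up, and the last check adds an observation that is not in the original post: with \d+ the group can also capture more than one digit.

```python
import re

too_strict = re.compile(r"(\d)\1{3,}")  # captured digit + 3 copies = 4 digits
adjusted = re.compile(r"(\d)\1{2,}")    # captured digit + 2 copies = 3 digits

print(bool(too_strict.match("111")))   # False: only 3 digits available
print(bool(too_strict.match("1111")))  # True
print(bool(adjusted.match("111")))     # True

# With \d+ the group may hold several digits, so a repeated pair matches too.
multi = re.match(r"(\d+)\1{3,}", "12121212")
print(multi.group(1))  # 12
```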
I would suggest not using {0,m} but matching an exact count such as {1} or {2}, and, after you have matched the backreferences to group 1, asserting with a negative lookahead that no more occurrences follow.
^(\d)\1{2,},(?=((?:\d*?\1){1}))\2(?!\d*\1)\d*
^ - Start of string
(\d)\1{2,}, - Capture group 1, match a digit and repeat the backreference 2 or more times, then match the comma
(?= - Positive lookahead
( - Capture group 2
(?:\d*?\1){1} - Repeat matching the backreference to group 1 m times. Here m = 1
) - Close group
) - Close lookahead
\2 - Match what is captured in group 2 to prevent backtracking
(?!\d*\1) - Negative lookahead, assert that no more occurrences of group 1 follow
\d* - Match 0+ digits
For example
import re

regex = r"^(\d)\1{2,},(?=((?:\d*?\1){1}))\2(?!\d*\1)\d*"
test_str = ("111,222\n"
            "111,212")
# re.MULTILINE lets ^ match at the start of each line
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print(match.group())
Output
111,212
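To try other values, the pattern can be generated from n and m. This is a sketch only, and build_exact_pattern is a hypothetical name. As written in this answer, the pattern requires exactly m occurrences after the comma, which is why 111,222 is not printed above for m = 1.

```python
import re


def build_exact_pattern(n: int, m: int) -> re.Pattern:
    """Hypothetical helper: the regex from this answer with n and m filled in.
    Requires n consecutive leading digits and exactly m occurrences after ','."""
    if n < 1 or m < 0:
        raise ValueError("need n >= 1 and m >= 0")
    return re.compile(
        rf"^(\d)\1{{{n - 1},}},(?=((?:\d*?\1){{{m}}}))\2(?!\d*\1)\d*"
    )


print(bool(build_exact_pattern(3, 1).match("111,212")))  # True
print(bool(build_exact_pattern(3, 1).match("111,222")))  # False: 0 occurrences, not 1
print(bool(build_exact_pattern(3, 0).match("111,222")))  # True
print(bool(build_exact_pattern(3, 0).match("111,212")))  # False
```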