Regex: How to capture a sequence of 6-12 digits that may be separated by spaces without capturing any trailing space
I am attempting to capture a sequence of 6-12 digits that may be separated by spaces, like the ones below. (The letter D at the end is just an example. The string may end right after the digits, or the digits may be followed by some kind of punctuation or a letter.)
123 345 4567 89 D
123 345456789 D
My current attempts are as follows:
Attempt 1: with the lazy quantifier *?:
"\b(?:\d *?){6,12}\b"
With this, the regex successfully returns all the digits in the string 123 345456789 D, but it fails to fully capture the digits in 123 345 4567 89 D (only the first two groups are captured). I assume this is because the first two groups of digits (i.e., 123 345) already fulfill the minimum requirement of 6 digits, so with the lazy quantifier the regex stops once that minimum is met.
Attempt 2: without the lazy quantifier (just using *):
"\b(?:\d *){6,12}\b"
With this, all the groups of digits in the examples above are captured. However, this regex also captures the trailing space between the last digit and the letter D.
So I wonder if there is a way to capture all the digits without including the trailing space. I am doing this in Python, so one thought was to use the second regex and strip any trailing space after a match is returned, but that seems really inelegant.
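The behaviour described for both attempts can be reproduced with a short script. This is only a sketch for checking the two patterns from the question; the helper name first_match is hypothetical and not part of the original post.

```python
import re

# The two patterns from the question, written as raw strings so that
# \b is a word boundary and not a backspace character.
lazy = re.compile(r"\b(?:\d *?){6,12}\b")    # attempt 1
greedy = re.compile(r"\b(?:\d *){6,12}\b")   # attempt 2

def first_match(pattern, text):
    """Return the text of the first match, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None

for text in ["123 345 4567 89 D", "123 345456789 D"]:
    # repr() makes a trailing space visible in the output.
    print(repr(first_match(lazy, text)), repr(first_match(greedy, text)))
# prints:
# '123 345' '123 345 4567 89 '
# '123 345456789' '123 345456789 '
```

The lazy version stops after six digits in the first string, while the greedy version keeps the space before the D in both strings.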
This will do it: ((?:\d\s*){5,11}\d?)
See: https://regex101.com/r/qcRbip/1
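A quick way to check this pattern in Python is sketched below. The helper captured and the third and fourth inputs are assumptions added for illustration. On the two 12-digit examples the capture has no trailing space, but for a shorter run the \s* inside the repeated group appears to pick up the trailing whitespace, and a run of only 5 digits also matches because the final \d? is optional.

```python
import re

# Pattern from the answer above, as a raw string.
pattern = re.compile(r"((?:\d\s*){5,11}\d?)")

def captured(text):
    """Return capture group 1 of the first match, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None

print(repr(captured("123 345 4567 89 D")))  # '123 345 4567 89'
print(repr(captured("123 345456789 D")))    # '123 345456789'
print(repr(captured("123 456 D")))          # '123 456 ' (trailing space kept)
print(repr(captured("12345")))              # '12345' (only 5 digits)
```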
The quantifier in (?:\d *) is greedy: it matches a space whenever one is present, including the one at the end.
In (?:\d *?) the quantifier for the space is non-greedy, so the match ends as soon as the minimum of 6 repetitions is satisfied.
Instead, you could match a digit first and then repeat optional spaces followed by a digit:
\b\d(?: *\d){5,11}\b
\b : a word boundary
\d : a single digit
(?: *\d){5,11} : optional spaces followed by a digit, repeated 5 to 11 times
\b : a word boundary
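As a rough check of \b\d(?: *\d){5,11}\b in Python, the sketch below runs it over the two strings from the question plus two extra inputs that are assumptions added here (a 6-digit run followed by punctuation, and a 5-digit run that should be rejected). The names DIGIT_RUN and find_digit_runs are hypothetical.

```python
import re

# Every repetition ends on a digit, so a match can never end with a space.
DIGIT_RUN = re.compile(r"\b\d(?: *\d){5,11}\b")

def find_digit_runs(text):
    """Return all runs of 6-12 digits, with optional spaces between digits."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return DIGIT_RUN.findall(text)

for text in ["123 345 4567 89 D", "123 345456789 D", "123 456.", "12345 D"]:
    print(repr(text), "->", find_digit_runs(text))
# prints:
# '123 345 4567 89 D' -> ['123 345 4567 89']
# '123 345456789 D' -> ['123 345456789']
# '123 456.' -> ['123 456']
# '12345 D' -> []
```

Because the space sits before the digit inside the group, no stripping is needed after the match.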
I cannot reproduce your problem. Your attempt 2 works fine for me. Here is my code:
import re

s = "123 345 4567 89 D"
re.findall(r"(?:\d *?){6,12}", s)  # raw string keeps \d intact
# returns ['123 345', '4567 89']
d = "123 345456789 D"
re.findall(r"(?:\d *?){6,12}", d)
# returns ['123 345456789']
"\b\d(?: *\d){5,11}\b"