
Regex exactly n OR m times

Consider the following regular expression, where X is any regex.

X{n}|X{m}

This regex would test for X occurring exactly n or m times.

Is there a regex quantifier that can test for X occurring exactly n or m times?

There is no single quantifier that means "exactly m or n times". The way you are doing it is fine.

An alternative is:

X{m}(X{k})?

where m < n and k is the value of n - m.
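As a quick sanity check of that rewrite, here is a minimal Python sketch. The values X = "ab", m = 2 and n = 5 are hypothetical, chosen only for illustration, and accepted_counts is a helper made up for this example. It lists which repeat counts each pattern accepts as a full match.

```python
import re

# Hypothetical values for illustration: X = "ab", m = 2, n = 5, so k = n - m = 3.
X, m, n = "ab", 2, 5
k = n - m

# X{n}|X{m}, with X wrapped in a non-capturing group so the quantifier covers all of X.
alternation = re.compile(rf"(?:{X}){{{n}}}|(?:{X}){{{m}}}")
# X{m}(X{k})?, the alternative form from the answer.
optional_tail = re.compile(rf"(?:{X}){{{m}}}(?:(?:{X}){{{k}}})?")

def accepted_counts(pattern, limit=8):
    """Return the repeat counts c for which pattern matches X * c in full."""
    return [c for c in range(limit) if pattern.fullmatch(X * c)]

print(accepted_counts(alternation))    # [2, 5]
print(accepted_counts(optional_tail))  # [2, 5]
```

Both forms accept the same counts here; the second one only has to match the first m copies of X once.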

Here is the complete list of quantifiers (ref. http://www.regular-expressions.info/reference.html ):

  • ? , ?? - 0 or 1 occurrences (?? is lazy, ? is greedy)
  • * , *? - any number of occurrences
  • + , +? - at least one occurrence
  • {n} - exactly n occurrences
  • {n,m} - n to m occurrences, inclusive
  • {n,m}? - n to m occurrences, lazy
  • {n,} , {n,}? - at least n occurrences
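The greedy/lazy distinction in the list can be seen with a small Python sketch. The input string and the bounds 2 and 4 are arbitrary, picked only for illustration.

```python
import re

text = "xxxxx"

# Greedy {n,m} takes as many repetitions as it can, up to m.
greedy = re.match(r"x{2,4}", text).group()
# Lazy {n,m}? stops as soon as the minimum n is satisfied.
lazy = re.match(r"x{2,4}?", text).group()

print(greedy)  # xxxx
print(lazy)    # xx
```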

To get "exactly n or m", you need to write the quantified regex twice, unless m and n are related in a special way:

  • X{n,m} if m = n+1
  • (?:X{n}){1,2} if m = 2n
  • ...
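The two special cases above can be checked with a short Python sketch. Here X is the single character "x" and n = 3; both are hypothetical choices, and accepted_counts is a helper written just for this example.

```python
import re

def accepted_counts(pattern, unit="x", limit=10):
    """Return the repeat counts c for which pattern matches unit * c in full."""
    return [c for c in range(limit) if re.fullmatch(pattern, unit * c)]

# m = n + 1: a plain range quantifier covers both counts (n = 3, m = 4).
consecutive = accepted_counts(r"x{3,4}")
# m = 2n: repeat a group of n copies once or twice (n = 3, m = 6).
doubled = accepted_counts(r"(?:x{3}){1,2}")

print(consecutive)  # [3, 4]
print(doubled)      # [3, 6]
```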

No, there is no such quantifier. But I'd restructure it to /X{m}(X{n-m})?/ (with m < n) to prevent problems in backtracking.

Very old post, but I'd like to contribute something that might be of help. I've tried it exactly the way stated in the question and it does work, but there's a catch: the order of the quantities matters. Consider this:

#[a-f0-9]{6}|#[a-f0-9]{3}

This will find all occurrences of hex colour codes (they're either 3 or 6 digits long). But when I flip it around like this:

#[a-f0-9]{3}|#[a-f0-9]{6}

it will only find the 3-digit ones or the first 3 digits of the 6-digit ones. This does make sense and a regex pro might spot it right away, but for many this might be peculiar behaviour. There are some advanced regex features that might avoid this trap regardless of the order, but not everyone is knee-deep into regex patterns.
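The ordering effect is easy to reproduce. The following Python sketch uses a made-up sample string; the last pattern, with a negative lookahead, is just one possible order-independent variant and is not something the answer itself proposes.

```python
import re

text = "#abc #a1b2c3"  # sample input, invented for this example

# 6-digit alternative first: the longer code gets its chance before the shorter one.
six_first = re.findall(r"#[a-f0-9]{6}|#[a-f0-9]{3}", text)
# 3-digit alternative first: it succeeds immediately, so the 6-digit code is cut short.
three_first = re.findall(r"#[a-f0-9]{3}|#[a-f0-9]{6}", text)
# Assumed variant: forbid a further hex digit after the match, so a 3-digit
# match in the middle of a 6-digit code is rejected and the engine backtracks.
guarded = re.findall(r"#(?:[a-f0-9]{3}|[a-f0-9]{6})(?![a-f0-9])", text)

print(six_first)    # ['#abc', '#a1b2c3']
print(three_first)  # ['#abc', '#a1b']
print(guarded)      # ['#abc', '#a1b2c3']
```

Alternation is tried left to right and the first alternative that succeeds wins, which is why the shorter count has to come last unless something after the group forces backtracking.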

TL;DR: (?<=[^x]|^)(x{n}|x{m})(?:[^x]|$)

Looks like you want "x n times" or "x m times"; I think a literal translation to regex would be (x{n}|x{m}). Like this: https://regex101.com/r/vH7yL5/1

Or, in a case where you can have a sequence of more than m "x"s (assuming m > n), you can add 'following no "x"' and 'followed by no "x"', translating to [^x](x{n}|x{m})[^x], but that would assume that there is always a character before and after your "x"s. As you can see here: https://regex101.com/r/bB2vH2/1

You can change it to (?:[^x]|^)(x{n}|x{m})(?:[^x]|$), translating to "following no 'x' or following line start" and "followed by no 'x' or followed by line end". But still, it won't match two sequences with only one character between them (because the first match consumes the character after it, which the second match needs as its character before), as you can see here: https://regex101.com/r/oC5oJ4/1

Finally, to match sequences separated by a single character, you can add a positive lookahead (?=) on the "no 'x' after" or a positive lookbehind (?<=) on the "no 'x' before", like this: https://regex101.com/r/mC4uX3/1

(?<=[^x]|^)(x{n}|x{m})(?:[^x]|$)

This way you will match only the exact number of 'x's you want.
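For readers trying this in Python, here is a hedged sketch with n = 2 and m = 4 (hypothetical values). As far as I know, Python's built-in re module rejects a lookbehind such as (?<=[^x]|^) because its alternatives have different widths, so the sketch swaps in negative lookarounds, (?<!x) and (?!x). That is a different technique from the expression above, aimed at the same result.

```python
import re

# Consuming version from the answer, with n = 2 and m = 4 (hypothetical values).
consuming = re.compile(r"(?:[^x]|^)(x{2}|x{4})(?:[^x]|$)")
# Swapped-in variant: zero-width negative lookarounds, which consume nothing.
lookaround = re.compile(r"(?<!x)(?:x{2}|x{4})(?!x)")

# Two runs separated by a single character.
close_runs_consuming = consuming.findall("xx-xx")
close_runs_lookaround = lookaround.findall("xx-xx")
# Runs of length 2, 3, 4, 5 and 2: only lengths 2 and 4 should be reported.
mixed = lookaround.findall("xx-xxx-xxxx-xxxxx xx")

print(close_runs_consuming)   # ['xx']
print(close_runs_lookaround)  # ['xx', 'xx']
print(mixed)                  # ['xx', 'xxxx', 'xx']
```

The consuming version loses the second run because the "-" has already been swallowed by the first match; the lookarounds only inspect the neighbouring characters.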

Taking a look at Enhardened's answer, they state that their penultimate expression won't match sequences with only one character between them. There is an easy way to fix this without using lookahead/lookbehind, and that's to replace the start/end anchors (^ and $) with the word-boundary assertion (\b). This lets you match against word boundaries, which include the start and end. As such, the appropriate expression should be:

(?:[^x]|\b)(x{n}|x{m})(?:[^x]|\b)

As you can see here: https://regex101.com/r/oC5oJ4/2
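A small Python sketch of the word-boundary version, again with hypothetical n = 2 and m = 4. The last case is an added caveat, not a claim from the answer: \b only helps when x is a word character, so a symbol such as "#" at the start of the string is not found.

```python
import re

# Word-boundary version with n = 2 and m = 4 (hypothetical values).
boundary = re.compile(r"(?:[^x]|\b)(x{2}|x{4})(?:[^x]|\b)")

# Two runs separated by a single character are both found.
close_runs = boundary.findall("xx-xx")
# A run of six is rejected: there is no boundary inside a run of word characters.
too_long = boundary.findall("xxxxxx")

# Caveat (assumption-labelled example): "#" is not a word character, so there is
# no \b between the string edge and "#", and nothing matches.
hash_boundary = re.compile(r"(?:[^#]|\b)(#{2})(?:[^#]|\b)")
symbol_runs = hash_boundary.findall("##")

print(close_runs)   # ['xx', 'xx']
print(too_long)     # []
print(symbol_runs)  # []
```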

