
Python regex match on length A or B?

Normally in a regex you can write [regex]{n} to say that the preceding pattern should repeat exactly n times. Or you can write {n,m} to mean n through m repetitions.

What about a set of individual lengths? For example, what if I wanted {4 or 8 or 12}?

Alternation will do the job:

A{4}|A{8}|A{12}

But if A is a big regex you will be duplicating a lot, which is not good. Don't some regex engines allow you to define a sub-regex and reuse it later? I'd be interested to know if this exists, but I use .NET, which does not support it inside the regex.

Of course, nothing stops you from building the regex in the host language and embedding the same string variable in it several times.
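Since the question is about Python, here is a minimal sketch of that idea. The helper name `repeat_any` and the sub-pattern `[0-9a-f]` are assumptions made for illustration; they are not from the original answer. The sub-pattern is written once and the alternation is generated from a list of lengths, longest first:

```python
import re

def repeat_any(sub, lengths):
    """Build an alternation that repeats `sub` exactly n times for each n in `lengths`.

    Hypothetical helper for illustration. Counts are sorted longest first so
    that the alternation prefers the longest repetition.
    """
    counts = sorted(set(lengths), reverse=True)
    # (?:...) groups the sub-pattern without capturing it.
    return "|".join(f"(?:{sub}){{{n}}}" for n in counts)

A = r"[0-9a-f]"  # illustrative sub-pattern standing in for a "big" regex
pattern = re.compile(repeat_any(A, [4, 8, 12]))

print(pattern.pattern)
print(bool(pattern.fullmatch("deadbeef")))  # input of length 8
print(bool(pattern.fullmatch("deadbe")))    # input of length 6
```

Wrapping the sub-pattern in a non-capturing group keeps the quantifier applied to the whole sub-pattern even when it contains its own alternation.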

Update 1

A{12}|A{8}|A{4} 

can match something different from

A{4}|A{8}|A{12}

The former can be described as greedy, the latter as lazy.

The latter will match only the first 4 A's in AAAAAAAA, while the former will match all 8.
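A short sketch of this difference using Python's `re` module, which tries alternatives from left to right. The variable names are illustrative only:

```python
import re

text = "A" * 8  # AAAAAAAA

longest_first = re.compile(r"A{12}|A{8}|A{4}")
shortest_first = re.compile(r"A{4}|A{8}|A{12}")

# A{12} fails on 8 characters, so the engine falls through to A{8}.
print(len(longest_first.match(text).group()))   # 8

# A{4} succeeds immediately, so the later alternatives are never tried.
print(len(shortest_first.match(text).group()))  # 4
```

With `fullmatch` the order does not matter, because the engine backtracks into the other alternatives until the whole string is consumed.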

The default behavior of a quantifier is greedy, but since you can't make this hand-made construct lazy with a ?, the choice between the two simply depends on what you need. If you embed it in a larger regex you sometimes want the lazy behavior. When it is not embedded, the former is more than likely what you intended.

{m,n} is just shorthand for repeated alternation. That is, A{4,5} is just short for AAAA|AAAAA. As Kevin points out in a comment, you may be able to represent some sets of lengths as a continuous range of concatenations, but in general that's not possible. For example, any finite set of prime numbers (in unary notation) could be matched by a regular expression, but only by spelling out each length as its own alternative:

11|111|11111|1111111|11111111111   # Your hypothetical 1{2 or 3 or 5 or 7 or 11}
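To make the contrast concrete, here is a small Python sketch. The pattern `(?:A{4}){1,3}` is an assumption of what such a range of concatenations could look like for the lengths 4, 8 and 12; it works only because those lengths are evenly spaced multiples of 4. The prime lengths have no such regularity, so they stay an explicit alternation:

```python
import re

# 4, 8 and 12 are 1, 2 and 3 copies of a 4-character block,
# so a range quantifier over a group covers exactly these lengths.
multiples_of_four = re.compile(r"(?:A{4}){1,3}")
print([n for n in range(1, 14) if multiples_of_four.fullmatch("A" * n)])
# [4, 8, 12]

# The prime lengths have to be listed one by one.
primes = re.compile(r"11|111|11111|1111111|11111111111")
print([n for n in range(1, 14) if primes.fullmatch("1" * n)])
# [2, 3, 5, 7, 11]
```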
