Python regex match on length A or B?

Normally in a regex you can write [regex]{n} to say that the regex should repeat exactly n times, or {n,m} to mean anywhere from n through m times.

What about a set of individual lengths? For example, what if I wanted {4 or 8 or 12}?

Alternation will do the job:

A{4}|A{8}|A{12}

But if A is a big regex, you will be duplicating a lot of it, which is not good. Don't some regex engines let you define a sub-regex and reuse it later? I'm interested in whether this exists, but I use .NET, which does not support it inside the regex.

Of course, nothing stops you from embedding a string variable from the host language into the regex a few times.
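As a minimal sketch of that idea in Python (the standard `re` module has no syntax for reusing a sub-pattern by reference either), the alternation can be generated from the sub-pattern and a list of counts. The helper name `repeat_any_of` and the `HEX_DIGIT` sub-pattern are hypothetical, chosen only for illustration:

```python
import re

def repeat_any_of(sub_pattern, counts):
    """Build an alternation repeating sub_pattern exactly n times, for each n in counts."""
    # Longest count first, so an unanchored search prefers the longest repetition.
    alternatives = [
        "(?:%s){%d}" % (sub_pattern, n) for n in sorted(counts, reverse=True)
    ]
    return "(?:" + "|".join(alternatives) + ")"

HEX_DIGIT = r"[0-9a-fA-F]"  # hypothetical stand-in for the big regex "A"
pattern = re.compile(repeat_any_of(HEX_DIGIT, [4, 8, 12]))

print(bool(pattern.fullmatch("dead")))      # True  (4 repetitions)
print(bool(pattern.fullmatch("deadbeef")))  # True  (8 repetitions)
print(bool(pattern.fullmatch("deadbe")))    # False (6 repetitions)
```

The sub-pattern is written once in the source, even though the compiled regex still contains it once per count.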

Update 1

A{12}|A{8}|A{4} 

can match something different from

A{4}|A{8}|A{12}

The former can be described as greedy and the latter as lazy.

The latter will match only the first 4 A's in AAAAAAAA, while the former will match all 8.

Quantifiers are greedy by default, but you can't make this hand-made construct lazy by appending a ?, so which of the two orderings you choose depends on what you need. When the construct is embedded in a larger regex, you sometimes want the lazy behavior. When it stands on its own, the former is more than likely what you intended.
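The difference between the two orderings can be checked quickly in Python. This is only a sketch using the literal character A as the sub-pattern; the variable names are made up for the example:

```python
import re

text = "A" * 8

longest_first = re.compile(r"A{12}|A{8}|A{4}")
shortest_first = re.compile(r"A{4}|A{8}|A{12}")

# Unanchored at the end: the first alternative that succeeds wins.
longest_len = len(longest_first.match(text).group())
shortest_len = len(shortest_first.match(text).group())
print(longest_len)   # 8
print(shortest_len)  # 4

# When the whole string must match, the engine backtracks into the
# alternation, so the ordering no longer changes the result.
full_len = len(shortest_first.fullmatch(text).group())
print(full_len)      # 8
```

So the ordering only matters when something after the construct does not force a particular length.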

{m,n} is just shorthand for repeated alternation. That is, A{4,5} is just short for AAAA|AAAAA. As Kevin points out in a comment, you may be able to represent a particular set of lengths as a continuous range of concatenations (for instance, (A{4}){1,3} covers 4, 8, or 12), but in general that's not possible. For example, a finite set of prime lengths (in unary notation) has no such regular spacing, so it can only be matched by spelling out the alternation:

11|111|11111|1111111|11111111111   # Your hypothetical 1{2 or 3 or 5 or 7 or 11}
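Writing such an alternation by hand gets tedious, so here is a small sketch that generates it from a list of lengths in Python. The names `PRIMES` and `matched_lengths` are illustrative only:

```python
import re

PRIMES = [2, 3, 5, 7, 11]

# Spell out each length as a literal run of 1s, longest first.
pattern = re.compile("|".join("1" * p for p in sorted(PRIMES, reverse=True)))

# Check which unary strings of length 1 through 12 match in full.
matched_lengths = [n for n in range(1, 13) if pattern.fullmatch("1" * n)]
print(matched_lengths)  # [2, 3, 5, 7, 11]
```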
