
Why is this regular expression not working ({m, n})?

I am trying to understand regular expressions and am on the repetition part: {m, n}.

I have this code:

>>> import re
>>> p = re.compile('a{1}b{1, 3}')
>>> p.match('ab')
>>> p.match('abbb')

As you can see, neither string matches the pattern (both calls return None). Why is this happening?

There should be no space after the comma, and the {1} is redundant.

Try

p = re.compile('a{1}b{1,3}')

...and mind the space.

Remove the extra whitespace in the quantifier on b.

Change:

p = re.compile('a{1}b{1, 3}')

to:

p = re.compile('a{1}b{1,3}')
                        ^   # no whitespace

and all should be well.
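As a quick check of the fix above, here is a minimal sketch comparing the spaced and unspaced patterns. The variable names `spaced`, `fixed`, and `simpler` are only illustrative; `simpler` drops the redundant `{1}`.

```python
import re

spaced = re.compile(r"a{1}b{1, 3}")   # space after the comma: not parsed as a quantifier
fixed = re.compile(r"a{1}b{1,3}")     # no space: b{1,3} means one to three b characters
simpler = re.compile(r"ab{1,3}")      # equivalent to fixed, without the redundant {1}

# The spaced pattern matches neither input from the question.
assert spaced.match("ab") is None
assert spaced.match("abbb") is None

# The fixed pattern matches both.
assert fixed.match("ab").group() == "ab"
assert fixed.match("abbb").group() == "abbb"

# The quantifier stops after three b characters.
assert fixed.match("abbbb").group() == "abbb"
assert simpler.match("abbb").group() == "abbb"
```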

You are seeing some re behaviour that is very "dark corner", nigh on a bug (or two).

# Python 2.7.1
>>> import re
>>> pat = r"b{1, 3}\Z"
>>> bool(re.match(pat, "bb"))
False
>>> bool(re.match(pat, "b{1, 3}"))
True
>>> bool(re.match(pat, "bb", re.VERBOSE))
False
>>> bool(re.match(pat, "b{1, 3}", re.VERBOSE))
False
>>> bool(re.match(pat, "b{1,3}", re.VERBOSE))
True
>>>

In other words, the pattern "b{1, 3}" matches the literal text "b{1, 3}" in normal mode, and the literal text "b{1,3}" in VERBOSE mode.

The "Law of Least Astonishment" would suggest either (1) the space in front of the 3 was ignored and it matched "b" , "bb" , or "bbb" as appropriate [preferable] or (2) an exception at compile time.

Looking at it another way, there are two possibilities: (a) the person who writes "{1, 3}" is imbued with the spirit of PEP8 and believes it is prescriptive and applies everywhere; (b) the person who writes that has tested re's undocumented behaviour and actually wants to match the literal text "b{1, 3}", and perversely wants to use r"b{1, 3}" instead of explicitly escaping: r"b\{1, 3}" . It seems to me that (a) is much more probable than (b), and re should act accordingly.

Yet another perspective: by the time the space is reached, the parser has already consumed { , a string of digits, and a comma, i.e. it is well into the {m,n} "operator". To silently ignore an unexpected character and treat it as though it were literal text is mind-boggling, perlish, etc.
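Until re itself rejects such patterns, option (2) above (an exception at compile time) can be approximated in user code. The following is only a sketch: `strict_compile` and `_BRACES` are hypothetical names, not part of the re module, and the check is a heuristic that does not understand character classes or an escaped backslash directly before the brace.

```python
import re

# Heuristic: an unescaped "{" followed only by digits, commas and whitespace, then "}".
_BRACES = re.compile(r"(?<!\\)\{([\d,\s]*)\}")

def strict_compile(pattern, flags=0):
    """Compile pattern, but raise ValueError for quantifiers like {1, 3}.

    Hypothetical helper: it rejects brace groups that look like a
    quantifier (they contain a digit) but also contain whitespace.
    """
    for m in _BRACES.finditer(pattern):
        body = m.group(1)
        has_space = any(ch.isspace() for ch in body)
        has_digit = any(ch.isdigit() for ch in body)
        if has_space and has_digit:
            raise ValueError(
                "whitespace inside quantifier %r at position %d"
                % (m.group(0), m.start())
            )
    return re.compile(pattern, flags)

# The correctly written pattern compiles and behaves as usual.
p = strict_compile(r"a{1}b{1,3}")
assert p.match("abbb").group() == "abbb"

# The pattern from the question is rejected instead of silently matching nothing.
try:
    strict_compile(r"a{1}b{1, 3}")
except ValueError as exc:
    print("rejected:", exc)
```

Someone who really does want the literal text can still write r"b\{1, 3\}", which this check lets through.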

Update: Bug report lodged.

Do not put spaces between the braces {}.

p = re.compile('a{1}b{1,3}')

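You can compile the regex with the VERBOSE flag, which means most whitespace in the regex is ignored. I think this is very good practice for describing complex regular expressions in a more readable manner. Note, however, that as the Python 2.7.1 session above shows, VERBOSE does not turn "{1, 3}" into a quantifier, so the space after the comma still has to go.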
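A minimal sketch of the VERBOSE style, using the pattern from the question. The names `readable` and `still_broken` are only illustrative. Whitespace and comments between tokens are ignored, but the quantifier itself is still written without a space.

```python
import re

# VERBOSE allows the pattern to be spread over lines with comments.
readable = re.compile(r"""
    a{1}      # exactly one a (the {1} is redundant)
    b{1,3}    # one to three b characters; no space inside the braces
""", re.VERBOSE)

assert readable.match("ab").group() == "ab"
assert readable.match("abbb").group() == "abbb"

# A space inside the braces is still not accepted as a quantifier,
# even with VERBOSE, so the inputs from the question do not match.
still_broken = re.compile(r"a{1} b{1, 3}", re.VERBOSE)
assert still_broken.match("ab") is None
assert still_broken.match("abbb") is None
```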

See here for details...

Hope this helps...

