I am confused about repetition in Python regular expressions. The documentation says that '*' means repeating zero or more times. Suppose I have the string abc123def. I want to find the position of the substring made of numeric characters, so I use the following code:
import re
p = re.compile(r'[\d]*')
p.search('abc123def').span()
It outputs (0, 0). If I change the regex to [\d]+, it outputs (3, 6).
Why doesn't the regex r'[\d]*' work? Thanks.
It does work. [\d]* (BTW, the brackets are unnecessary; \d* does exactly the same) matches any sequence of digits, including zero digits, i.e. an empty string. An empty string can be matched anywhere, in particular at the beginning of the string, and search() returns the first match it finds scanning from left to right. If you want a non-empty sequence of digits, use \d+ as you already did.
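As a minimal sketch of that difference, using the string from the question (the variable names text, star and plus are only for illustration):

```python
import re

text = 'abc123def'

# \d* is allowed to match zero digits, so search() succeeds at index 0
star = re.search(r'\d*', text)
# \d+ needs at least one digit, so search() keeps scanning until index 3
plus = re.search(r'\d+', text)

print(star.span(), repr(star.group()))  # (0, 0) ''
print(plus.span(), repr(plus.group()))  # (3, 6) '123'
```

Both patterns match; the first one simply matches an empty string before it ever reaches the digits.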
It does work: it finds a zero-length string at the beginning of the string.
Another way to see what is happening is to use findall:
>>> re.findall(r'\d*', 'abc123def')
['', '', '', '123', '', '', '', '']
vs
>>> re.findall(r'\d+', 'abc123def')
['123']
Or visually with regex101
The * means 'zero or more', and the match is taken at the first opportunity. You have zero digits at the start of the string. A match! The same zero-length match also occurs at every other non-digit position in the string, and once more at the very end.
Use + if you want to match a non-empty substring of digits.
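To see where each of the matches in the findall output sits, here is a small sketch with re.finditer (the names text, matches and non_empty are illustrative, not from the original posts):

```python
import re

text = 'abc123def'

# Record (start, end, matched text) for every match of \d*
matches = [(m.start(), m.end(), m.group()) for m in re.finditer(r'\d*', text)]
for start, end, matched in matches:
    print(start, end, repr(matched))

# Discard the zero-length matches, keeping only real digit runs
non_empty = [m for m in matches if m[2]]
print(non_empty)
```

Filtering out the empty matches leaves only the digit run at span (3, 6), which is what \d+ gives you directly.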