Confusion about repeating patterns in Python regex
I am confused about repetition in Python regular expressions. I read in the documentation that '*' means repeating zero or more times. Suppose I have the string abc123def. I want to find the position of the substring containing numeric characters, so I use the following code:
import re

p = re.compile(r'[\d]*')
p.search('abc123def').span()
It outputs (0, 0). If I change the regex to [\d]+, it outputs (3, 6).
Why doesn't the regex r'[\d]*' work? Thanks.
It does work. [\d]* (BTW, the brackets are unnecessary; \d* does exactly the same) matches any sequence of digits, including zero digits, i.e. an empty string. An empty string matches anywhere, in particular at the beginning of the string. If you want a non-empty sequence of digits, use \d+ like you already did.
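To make the difference concrete, here is a minimal sketch that runs both patterns against the string from the question. The variable names (text, star, plus) are only illustrative.

```python
import re

text = 'abc123def'

# \d* may match zero digits, so search() succeeds right away at index 0
# with an empty match and never looks any further.
star = re.search(r'\d*', text)
print(star.span(), repr(star.group()))

# [\d]* behaves the same as \d*: the class contains a single item.
bracketed = re.search(r'[\d]*', text)
print(bracketed.span() == star.span())

# \d+ needs at least one digit, so search() keeps scanning until index 3.
plus = re.search(r'\d+', text)
print(plus.span(), plus.group())
```

The first print shows the empty match at (0, 0), and the last one shows the digit run at (3, 6), matching the outputs reported in the question.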
It does work: it found a zero-length string at the beginning of the string.
Another way to see what is happening is to use findall:
>>> re.findall(r'\d*', 'abc123def')
['', '', '', '123', '', '', '', '']
vs.
>>> re.findall(r'\d+', 'abc123def')
['123']
Or visualize it with regex101.
The * means 'zero or more', and it matches at the first opportunity. You have zero digits at the start of the string. A match! With findall, the same zero-length match is then found at every other position where there is no digit to consume, including the end of the string.
Use + if you want to match a non-empty substring.
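If it helps to see where each match sits, the following sketch lists the start, end, and text of every \d* match with re.finditer, then does the same for \d+. The names text, star_matches, and plus_spans are only illustrative, and the behaviour described assumes a recent Python 3.

```python
import re

text = 'abc123def'

# Record (start, end, matched_text) for every match of \d*.
# Most of these are zero-length matches at positions with no digit.
star_matches = [(m.start(), m.end(), m.group())
                for m in re.finditer(r'\d*', text)]
for start, end, value in star_matches:
    print(start, end, repr(value))

# With \d+ only the non-empty run of digits is reported.
plus_spans = [m.span() for m in re.finditer(r'\d+', text)]
print(plus_spans)
```

The matched texts in star_matches are the same list that findall returns, just with their positions attached.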