
Why doesn't '[0-9]*' match 'abc' in my Python regular expression since there are zero or more digits in the string?

Why does this regex:

>>> import re
>>> r = re.compile("[0-9]*", re.DEBUG)

match like this:

>>> m = r.search("abc")
>>> m.group()
''

I was hoping that it would match the entire string 'abc', since 'a' fulfills the condition (it matches 0 digits), and the greedy match would then include the string 'abc' in its entirety.

You searched for 0 or more digits. It found 0 or more digits. The exact number of digits that it found was 0. Hence, the empty string.
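To see where that empty match sits, one can inspect the match object's span. This is only a small sketch; the variable names are made up for illustration:

```python
import re

m = re.search("[0-9]*", "abc")

# The match exists, but it has zero length and sits at the very start
# of the string, before the 'a'.
empty_span = m.span()
matched_text = m.group()

print(empty_span, repr(matched_text))
```

A span whose start and end are equal means the regex engine succeeded without consuming any characters.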

Use a Negated Character Class

In your comment above, you say you expect [0-9]* to match abc because:

"abc" contains 0 digits.

You're misunderstanding what a character class is: it matches a single character out of the atoms it contains. Yours is not currently a negated class, so it does not mean "anything that isn't a digit".

You could get a match with [^0-9]* instead (whether or not you precompile it makes no difference). For example:

>>> import re
>>> re.search("[^0-9]*", "abc").group()
'abc'

This would perhaps fit your mental map, but thinking of negated character classes as "not containing a range" as opposed to "not containing any of the included characters" is probably going to lead you astray in the future. YMMV.
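As a rough illustration of that distinction, [^0-9]* tests each character on its own and keeps going only while the current character is not a digit. The input string below is invented for the example:

```python
import re

# 'a', 'b' and 'c' are each "not one of 0-9", so they are consumed;
# the class fails on '1', and the match ends there.
non_digit_run = re.search("[^0-9]*", "abc123").group()

print(repr(non_digit_run))  # 'abc'
```

The pattern never reasons about the string as a whole; it only asks, one character at a time, whether that character is outside the listed set.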

You asked "find me zero or more digits", so it found you zero or more digits (zero; empty string).

If you wanted "find me zero or more digits followed by zero or more other characters", you need to say that (with the .* pattern). '[0-9]*' does not match 'abc', because 'abc' includes characters (letters) not included in the requested expression.

>>> import re
>>> r = re.compile('[0-9]*.*')  # Note the very important ".*" that matches everything!
>>> r.search('abc').group()
'abc'

The point is the word "match". If your expression does not contain [a representation of] a certain character (such as "a"), then it cannot possibly match a string that contains that character! Your given expression matches only strings consisting of zero or more digits and nothing else. Therefore it clearly doesn't match 'abc'.
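One way to check "matches the whole string" directly is re.fullmatch() (Python 3.4+), used here in place of the search() call from the question. A minimal sketch, with sample strings chosen only for illustration:

```python
import re

pattern = re.compile("[0-9]*")

# fullmatch() succeeds only if the entire string fits the pattern.
whole_abc = pattern.fullmatch("abc")      # None: letters are not digits
whole_digits = pattern.fullmatch("2024")  # matches: four digits, nothing else
whole_empty = pattern.fullmatch("")       # matches: zero digits, nothing else

print(whole_abc, whole_digits.group(), repr(whole_empty.group()))
```

With search(), by contrast, the pattern only has to fit somewhere inside the string, which is why the original call still returned an (empty) match.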


As Tigerhawk has mentioned in the comments, if the * in regular expressions meant "zero or more of the preceding pattern, or anything else", it would be extraordinarily useless, as any pattern with a * in it would match all strings, all the time!

Because your regex looks only for digits, and abc doesn't have any digits in it.

In short, your regex matches any run of digits, as well as the empty string.

From the documentation, search() does the following:

Scan through string looking for a location where this regular expression produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

Thus, the fact that m is not None indicates that it found a match. The fact that m.group() returns '' shows what it matched.
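The difference between a zero-length match and no match shows up when * is swapped for +. The following is only a sketch, and the string "abc123" is a hypothetical input, not one from the question:

```python
import re

text = "abc123"  # hypothetical input with digits later in the string

# '*' is satisfied immediately at position 0 by matching zero digits,
# so search() never scans far enough to reach the digits.
star = re.search("[0-9]*", text)

# '+' needs at least one digit, so the scan continues until it reaches '1'.
plus = re.search("[0-9]+", text)

# With '+', a string that has no digits yields no match at all.
no_match = re.search("[0-9]+", "abc")

print(star.span(), plus.span(), plus.group(), no_match)
```

If the intent is "find the digits, if there are any", + is usually the quantifier to reach for, with a None check on the result.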
