Why doesn't '[0-9]*' match 'abc' in my Python regex, given that the string contains zero or more digits?

Question

Why does this regex:

>>> import re
>>> r = re.compile("[0-9]*", re.DEBUG)

match like this:

>>> m = r.search("abc")
>>> m.group()
''

I was hoping that it would match the entire string 'abc': 'a' fulfills the condition (it matches 0 digits), and the greedy match would then include the string 'abc' in its entirety.

Answer 1

You searched for 0 or more digits. It found 0 or more digits. The exact number of digits that it found was 0. Hence, the empty string.
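To make that concrete, here is a small sketch of where search() stops. The sample strings 'abc123' and '123abc' are illustrative additions, not from the question. The leftmost position where the pattern can match wins, so digits later in the string are never reached once an empty match succeeds at index 0.

```python
import re

pattern = re.compile("[0-9]*")
samples = ("abc", "abc123", "123abc")  # hypothetical inputs for illustration

# search() tries each start position from left to right and returns the
# first one where the pattern matches. Zero digits is a valid match, so
# any string that starts with a non-digit yields an empty match at index 0.
spans = {s: pattern.search(s).span() for s in samples}
groups = {s: pattern.search(s).group() for s in samples}

for s in samples:
    print(repr(s), spans[s], repr(groups[s]))
```

Only '123abc' produces a non-empty match, because its digits sit at the very first position that search() tries.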

Answer 2

Use a Negated Character Class

In your comment above, you say you expect [0-9]* to match abc because:

"abc" contains 0 digits.

You're misunderstanding what a character class is: it lists the atoms (characters) that are allowed to match at that position. Yours is not currently a negated class, so it can only ever consume digits.

You could get a match with [^0-9]* instead; precompiling the pattern makes no difference to the result. For example:

>>> import re
>>> re.search("[^0-9]*", "abc").group()
'abc'

This would perhaps fit your mental map, but thinking of negated character classes as "not containing a range" as opposed to "not containing any of the included characters" is probably going to lead you astray in the future. YMMV.
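As a rough side-by-side, the sketch below runs the plain and the negated class against a mixed string. The input 'abc123' is an assumption chosen for illustration. It shows that the negated class matches characters that are not digits, one at a time, and stops at the first digit.

```python
import re

text = "abc123"  # hypothetical input mixing letters and digits

# Plain class: only digits may be consumed, so the match at index 0 is empty.
digits = re.search("[0-9]*", text).group()

# Negated class: any character that is NOT a digit may be consumed,
# so the match runs until the first digit is reached.
non_digits = re.search("[^0-9]*", text).group()

# Precompiling the negated pattern gives the same result.
compiled = re.compile("[^0-9]*").search(text).group()

print(repr(digits), repr(non_digits), repr(compiled))
```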

Answer 3

You asked "find me zero or more digits", so it found you zero or more digits (zero; empty string).

If you wanted "find me zero or more digits followed by zero or more other characters", you need to say so (with the .* pattern). '[0-9]*' does not match the whole of 'abc', because 'abc' includes characters (letters) that the requested expression does not cover.

>>> r = re.compile('[0-9]*.*')  # Note the very important ".*" that matches everything!
>>> r.search('abc').group()
'abc'

The point is the word "match". If your expression does not contain [a representation of] a certain character (such as "a"), then it cannot possibly match a string that contains that character! Taken as a whole-string match, your given expression matches only strings consisting of zero or more digits and nothing else. Therefore it clearly doesn't match 'abc'.

As Tigerhawk has mentioned in the comments, if the * in regular expressions meant "zero or more of the preceding pattern, or anything else", it would be extraordinarily useless, as any pattern with a * in it would match all strings, all the time!
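One way to check the "whole string" reading of "match" is re.fullmatch(), available since Python 3.4. The sketch below uses a few illustrative inputs that are not part of the original answer, and it also shows that adding .* is what lets the pattern cover the letters.

```python
import re

digits_only = re.compile("[0-9]*")
digits_then_anything = re.compile("[0-9]*.*")

samples = ("", "123", "abc", "12c")  # hypothetical inputs for illustration

# fullmatch() succeeds only if the pattern accounts for the entire string.
whole_digits = {s: digits_only.fullmatch(s) is not None for s in samples}
whole_anything = {s: digits_then_anything.fullmatch(s) is not None for s in samples}

print(whole_digits)
print(whole_anything)
```

The digits-only pattern accepts the empty string and '123' but rejects anything containing a letter, while the version with .* accepts all four samples.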

Answer 4

Because your regex looks only for digits, and abc doesn't have any digits in it.

In short, your regex matches any run of digits, and the empty string.

Answer 5

From the documentation, search() does the following:

Scan through string looking for a location where this regular expression produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

Thus, the fact that m is not None indicates that it found a match. The fact that m.group() returns '' shows what it matched.
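The difference between "no match" and "a zero-length match" can be sketched by swapping the quantifier. The [0-9]+ pattern is added here purely for contrast and does not appear in the question.

```python
import re

# '+' requires at least one digit, so no position in 'abc' matches: None.
no_match = re.search("[0-9]+", "abc")

# '*' allows zero digits, so a match object is returned for the empty
# string found at index 0.
empty_match = re.search("[0-9]*", "abc")

print(no_match)
print(empty_match.span(), repr(empty_match.group()))
```

If the intent is "find digits, or tell me there are none", the + quantifier together with a check for None is usually the more direct fit.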

5 answers

Answer 1
5 2015-10-24 03:29:47

Answer 2
4 2015-10-24 03:41:07

Answer 3
3 (accepted) 2015-10-24 03:34:44

Answer 4
2 2015-10-24 03:20:22

Answer 5
1 2015-10-24 03:26:49
