

Regex findall produces strange results in python3

I want to find all the docblocks in a string using Python. My first attempt was this:

import re

b = re.compile(r'/\*(.)*?\*/', re.M | re.S)
match = b.search(string)  # `string` holds the text to scan
print(match.group(0))

That worked, but as you'll notice, it only prints one docblock, not all of them.
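As a sketch of why only one docblock comes back: search() stops at the first match, whereas finditer() (a different function from findall()) yields a match object for every non-overlapping match, so group(0) gives each full docblock. The sample string `source` is made up for illustration, and the pattern drops the group because group(0) doesn't need one.

```python
import re

# Hypothetical sample input: two C-style docblocks
source = "/** first */\nint x;\n/** second */\nint y;\n"

pattern = re.compile(r'/\*.*?\*/', re.S)

# search() stops at the first match
first = pattern.search(source)
print(first.group(0))  # /** first */

# finditer() yields a match object for every non-overlapping match
for m in pattern.finditer(source):
    print(m.group(0))
```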

So I tried the findall function, which according to the documentation returns all the matches, like this:

b = re.compile(r'/\*(.)*?\*/', re.M | re.S)
matches = b.findall(string)
print(matches)

But I never get anything useful, only lists like this:

[' ', ' ', ' ', '\t', ' ', ' ', ' ', ' ', ' ', '\t', ' ', ' ', ' ']

The documentation does say it can return empty strings, but I don't see how that is useful.

You need to move the quantifier inside the capturing group:

b = re.compile(r'/\*(.*?)\*/', re.M | re.S)
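A minimal sketch of the corrected pattern on a made-up sample (the `source` string is hypothetical). It also shows both sides of findall()'s behavior: with one capturing group it returns the group's text, and with no group it returns the whole match. re.M is left out here because it only changes how ^ and $ behave, and this pattern uses neither.

```python
import re

# Hypothetical sample input containing two docblocks
source = """/**
 * Adds two numbers.
 */
int add(int a, int b);

/**
 * Subtracts b from a.
 */
int sub(int a, int b);
"""

# One capturing group: findall() returns what the group captured,
# i.e. the text between the opening /* and the closing */
bodies = re.findall(r'/\*(.*?)\*/', source, re.S)

# No capturing group: findall() returns the whole match, delimiters included
whole = re.findall(r'/\*.*?\*/', source, re.S)

print(bodies)
print(whole)
```

Each captured body still starts with the second * of /** and keeps the surrounding whitespace, so you may want to strip those before using the text.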

To expand a bit on Rohit Jain's (correct) answer: with the quantifier outside the parentheses, you're saying "match (non-greedily) any number of single characters, capturing each one into the group as you go". A capturing group that is repeated only keeps its last repetition, so once the whole docblock has matched, the group holds just the final character before the closing */. And because findall returns the group's contents instead of the whole match whenever the pattern contains a group, you get a list of those single characters: here, the space or tab that sits before each */. By moving the quantifier inside the parentheses (that is, (.*?) instead of what you had before), you're now saying "match any number of characters, and capture all of them".
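The "last repetition wins" behavior is easy to check directly. This sketch uses re.fullmatch on a toy string, then runs the question's original pattern on a hypothetical two-docblock sample; the space and tab in the result mirror the kind of output shown in the question.

```python
import re

# A capturing group under a quantifier is overwritten on every repetition,
# so it ends up holding only the last character it matched.
m = re.fullmatch(r'(.)*', 'abcde')
print(m.group(0))  # abcde  (the whole match)
print(m.group(1))  # e      (last repetition only)

# The question's pattern: group 1 holds the last character before '*/',
# here the space or tab that precedes each closing delimiter.
source = "/**\n * Docs here.\n */\n/**\n\t* More docs.\n\t*/"
print(re.findall(r'/\*(.)*?\*/', source, re.S))  # [' ', '\t']
```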

I hope this helps you understand what's going on a bit better.
