Python regular expressions: multiline and non-greedy

Question

I have some text like this:

cc.Action = {
};

cc.FiniteTimeAction = {

};

cc.Speed = {

};

And the result (list) I want is:

['cc.Action = {}', 'cc.FiniteTimeAction = {}', 'cc.Speed = {}']

And here's what I have tried:

import codecs
import re

with codecs.open(self.input_file, "r", "utf-8") as f:
    content = f.read()
result = re.findall(r'cc\..*= {.*};', content, re.S)
for r in result:
    print(r)
    print('---------------')

And the result is:

[
'cc.Action = {
};

cc.FiniteTimeAction = {

};

cc.Speed = {

};'
]

Any suggestion will be appreciated, thanks :)

Answer 1

The match seems to begin with cc. and end with ; so we can use the pattern:

'cc\.[^;]+'

Meaning, we match cc. and then match every character that is not ; ([] encloses a character class, ^ negates the class).

You could also use the non-greedy repeat *?, but in this case I would say it's overkill. The simpler the regex, the better.

To get the desired output you would also have to get rid of the newlines. Altogether I would propose:

result = re.findall(r'cc\.[^;]+', content.replace('\n', ''))
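
As a minimal Python 3 sketch of the negated-character-class idea, the snippet below inlines the question's text as `content` (an assumption; the question reads it from a file). It shows that [^;] also matches newlines, so the re.S flag is not needed and the newlines can be stripped after matching instead of before.

```python
import re

# Sample text copied from the question (inlined here for illustration).
content = "cc.Action = {\n};\n\ncc.FiniteTimeAction = {\n\n};\n\ncc.Speed = {\n\n};"

# [^;]+ consumes everything up to, but not including, the next semicolon.
# A negated class matches newlines too, so no re.S flag is required.
raw = re.findall(r"cc\.[^;]+", content)

# Remove the newlines from each match to get the one-line form.
result = [m.replace("\n", "") for m in raw]
print(result)
```

Stripping newlines per match rather than on the whole input leaves `content` untouched, which may matter if the original line structure is needed elsewhere.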

Answer 2

The problem is that you are using a greedy search. You need a non-greedy search, which you get by adding ? after the quantifier:

import re
# data holds the text from the question
print([i.replace("\n", "") for i in re.findall(r"cc\..*?{.*?}", data, re.DOTALL)])
# ['cc.Action = {}', 'cc.FiniteTimeAction = {}', 'cc.Speed = {}']

If you don't use .*?, then .*{ will match up to the last { in the string, so all the entries are treated as a single match. With a non-greedy match, it stops at the first { after the current position.
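
To see the difference side by side, here is a small Python 3 sketch that runs the greedy and the non-greedy form of the pattern over the same text. The variable `data` is assumed to hold the text from the question.

```python
import re

# Sample text copied from the question (inlined here for illustration).
data = "cc.Action = {\n};\n\ncc.FiniteTimeAction = {\n\n};\n\ncc.Speed = {\n\n};"

# Greedy: each .* grabs as much as possible, so the single match runs
# from the first "cc." to the last "}" in the whole string.
greedy = re.findall(r"cc\..*{.*}", data, re.DOTALL)

# Non-greedy: each .*? stops as early as possible, giving one match per block.
lazy = re.findall(r"cc\..*?{.*?}", data, re.DOTALL)

print(len(greedy))  # 1
print(len(lazy))    # 3
```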

Also, this can be done without using a regex, like this:

print([item.replace("\n", "") for item in data.split(";") if item])
# ['cc.Action = {}', 'cc.FiniteTimeAction = {}', 'cc.Speed = {}']

Just split the string on ; and, if the resulting piece is not empty, replace all the \n (newline characters) with empty strings.
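
One caveat with the split approach: if the text ends with a newline after the last ; (common when it is read from a file), the final piece is "\n", which is not empty and would survive an emptiness check made before the newlines are removed. A hedged Python 3 sketch that removes the newlines first and filters afterwards, assuming `data` is the question's text plus a trailing newline:

```python
# Question's text with a trailing newline, as a file read would often produce.
data = "cc.Action = {\n};\n\ncc.FiniteTimeAction = {\n\n};\n\ncc.Speed = {\n\n};\n"

# Remove the newlines from every piece first ...
pieces = [piece.replace("\n", "") for piece in data.split(";")]

# ... then drop the pieces that ended up empty.
result = [piece for piece in pieces if piece]
print(result)
```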

Answer 3

As your title suggests, the issue is greediness: cc\..*= matches from the beginning of the string to the last =.

You can avoid this behavior by using a lazy quantifier, which tries to stop at the earliest occurrence of the following character:

cc\..*?= {.*?};

Demo here: http://regex101.com/r/oL4yG7
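
The lazy pattern still needs the re.S (re.DOTALL) flag from the question, because without it . does not match a newline and the braces sit on different lines. A minimal Python 3 sketch, assuming `content` holds the question's text; the final cleanup step (removing newlines and the trailing ;) is an addition for illustration, not part of the pattern above:

```python
import re

# Sample text copied from the question (inlined here for illustration).
content = "cc.Action = {\n};\n\ncc.FiniteTimeAction = {\n\n};\n\ncc.Speed = {\n\n};"

pattern = r"cc\..*?= {.*?};"

# Without re.S, "." cannot cross the newline between "{" and "}".
without_flag = re.findall(pattern, content)

# With re.S, "." matches newlines and each block is found separately.
with_flag = re.findall(pattern, content, re.S)

# Illustrative cleanup: drop the newlines and the trailing semicolon.
cleaned = [m.replace("\n", "").rstrip(";") for m in with_flag]

print(without_flag)
print(cleaned)
```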

Answer 4

If you remove the newlines and split on ;:

codes.replace('\n', '').split(';')

Output:

['cc.Action = {}', 'cc.FiniteTimeAction = {}', 'cc.Speed = {}', '']

Answer 5

>>> 'cc.Action = {\n};\n\ncc.FiniteTimeAction = {\n\n};\n\ncc.Speed = {\n\n};'.replace('\n','').split(";")
['cc.Action = {}', 'cc.FiniteTimeAction = {}', 'cc.Speed = {}', '']

This will work for you. Note the trailing empty string in the result, which comes from the final ; in the text.

5 answers

Answer 1
1 2014-04-24 20:40:50

Answer 2
0 2014-04-03 09:41:18

Answer 3
0 2014-04-03 09:41:21

Answer 4
0 2014-04-03 09:46:07

Answer 5
0 2014-04-03 09:50:15
