python match over multiple lines
I am trying to extract a portion of a multi-line string. Specifically, I'd like to pull out the list of terms between the center pair of braces:
'my datagroup 2.5 {\n nose-capabilities {\n none\n slow\n 800\n 1200\n }\n}\n'
I've tried this:
re.findall('.*{.*{(?:\s*(\S+)\s*)*}\s*}', d, re.S)
# ['1200']
So I'm only getting the last capture as far as I can tell. What am I missing?
I think you're missing the point of findall. It returns one value for every match of the entire pattern. If you want multiple groups in one pattern, that's fine, but you don't need findall for that.
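To make that concrete, here is a small sketch (the sample text and patterns here are only for illustration, not from the question). A repeated capture group keeps only its last repetition, so the whole pattern matches once and findall returns a single item; with several groups in one pattern, findall returns a tuple per match:
>>> re.findall(r'(?:(\w+)\s*)+', 'none slow 800 1200')
['1200']
>>> re.findall(r'(\w+) (\w+)', 'none slow 800 1200')
[('none', 'slow'), ('800', '1200')]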
In fact, you really don't need that. You can just replace the whole middle of your pattern with (.*?) to trivially capture everything between the second open brace and the first close brace.
Note the non-greedy match; otherwise, it would suck up everything up to the last close brace, instead of up to the first. (You could use a lookahead assertion for this, but non-greedy matches are simpler.)
>>> re.findall('.*{.*{(.*?)}', d, re.S)
['\n none\n slow\n 800\n 1200\n ']
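For contrast, a quick sketch (not from the original answer) of what the greedy version captures, and one way the lookahead-based alternative mentioned above could be written:
>>> re.search('.*{.*{(.*)}', d, re.S).group(1)
'\n none\n slow\n 800\n 1200\n }\n'
>>> re.search('.*{.*{((?:(?!}).)*)}', d, re.S).group(1)
'\n none\n slow\n 800\n 1200\n '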
Although of course findall still isn't doing you any good:
>>> re.search('.*{.*{(.*?)}', d, re.S).group(1)
'\n none\n slow\n 800\n 1200\n '
Anyway, once you have that, you can just split it:
>>> re.search('.*{.*{(.*?)}', d, re.S).group(1).split()
['none', 'slow', '800', '1200']
Answer for any number of nested braces:
import re

s = 'my datagroup 2.5 {\n nose-capabilities {\n none\n slow\n 800\n 1200\n }\n}\n'
# slice between the last opening brace and the first closing brace,
# i.e. the innermost block
start = s.rfind('{') + 1
end = s.find('}')
# pull the word characters out of that substring
re.findall(r'\w+', s[start:end])
# ['none', 'slow', '800', '1200']
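A usage sketch on a made-up, more deeply nested string (hypothetical sample, not from the answer): rfind('{') still lands on the innermost opening brace and find('}') on the first closing brace, so the same slice isolates the innermost block at any depth, assuming a single chain of nesting.
deep = 'outer {\n mid {\n inner {\n a\n b\n }\n }\n}\n'   # hypothetical sample
re.findall(r'\w+', deep[deep.rfind('{') + 1:deep.find('}')])
# ['a', 'b']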