Python match over multiple lines

I am trying to extract a portion of a multi-line string. Specifically, I'd like to pull out the list of terms between the innermost pair of braces:

d = 'my datagroup 2.5 {\n    nose-capabilities {\n        none\n        slow\n        800\n        1200\n    }\n}\n'

I've tried this:

re.findall(r'.*{.*{(?:\s*(\S+)\s*)*}\s*}', d, re.S)

# ['1200']

So, as far as I can tell, I'm only getting the last capture.

If you want to do this with a regex, you are better off using re.search with a negative lookahead assertion.

>>> re.search(r'(?s){(?!.*{)([^}]*)', d).group(1).split()
['none', 'slow', '800', '1200']
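
The pattern above is compact, so here is the same expression spelled out with re.VERBOSE, as a minimal sketch. The name INNERMOST is only an illustrative label and is not part of the original answer; the sample string d is copied from the question.

```python
import re

d = ('my datagroup 2.5 {\n    nose-capabilities {\n        none\n'
     '        slow\n        800\n        1200\n    }\n}\n')

# INNERMOST is a hypothetical name; the pattern itself is the one shown above.
INNERMOST = re.compile(r"""
    \{            # an opening brace
    (?! .* \{ )   # that has no other opening brace anywhere after it
    ( [^}]* )     # capture everything up to the next closing brace
""", re.S | re.X)

terms = INNERMOST.search(d).group(1).split()
print(terms)
```

The negative lookahead is what selects the innermost block: only the last opening brace in the string has no further opening brace after it.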

I think you're missing the point of findall. It returns one value for every match of the entire pattern, and a capture group that repeats inside a single match keeps only its last repetition, which is why you get just '1200'. If you want multiple groups in one pattern, that's fine, but you don't need findall for that.
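
To make that behaviour concrete, here is a small sketch on a made-up string (the sample text is an assumption chosen for illustration, not data from the question). It shows a repeated group keeping only its last repetition, next to a pattern that matches one item per match.

```python
import re

# Hypothetical sample text, chosen only to illustrate the behaviour.
text = '1 2 3 | 4 5'

# The group (\d) sits inside a repeated non-capturing group, so each
# repetition overwrites the previous capture; only the last one survives.
last_only = re.findall(r'(?:(\d)\s*)+', text)
print(last_only)   # ['3', '5']: one value per match of the whole pattern

# Matching a single item per match lets findall return every item.
every_item = re.findall(r'\d', text)
print(every_item)  # ['1', '2', '3', '4', '5']
```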

In fact, you don't need multiple groups at all. You can replace the whole middle of your pattern with (.*?) to capture everything between the second open brace and the first close brace.

Note the non-greedy match; otherwise, it would consume everything up to the last close brace instead of up to the first. (You could use a lookahead assertion for this, but a non-greedy match is simpler.)

>>> re.findall('.*{.*{(.*?)}', d, re.S)
['\n        none\n        slow\n        800\n        1200\n    ']
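
For comparison, this sketch runs the same pattern with a lazy and a greedy capture on the string from the question. The variable names lazy and greedy are illustrative only.

```python
import re

d = ('my datagroup 2.5 {\n    nose-capabilities {\n        none\n'
     '        slow\n        800\n        1200\n    }\n}\n')

# Lazy capture: stops just before the first closing brace.
lazy = re.search(r'.*{.*{(.*?)}', d, re.S).group(1)

# Greedy capture: runs on to the last closing brace, so the inner '}'
# ends up inside the captured text.
greedy = re.search(r'.*{.*{(.*)}', d, re.S).group(1)

print(repr(lazy))
print(repr(greedy))
```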

Of course, findall still isn't doing you any good here; re.search gives you the captured group directly:

>>> re.search('.*{.*{(.*?)}', d, re.S).group(1)
'\n        none\n        slow\n        800\n        1200\n    '

Anyway, once you have that, you can just split it:

>>> re.search('.*{.*{(.*?)}', d, re.S).group(1).split()
['none', 'slow', '800', '1200']

Answer for any number of nested braces:

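import re

s = 'my datagroup 2.5 {\n    nose-capabilities {\n        none\n        slow\n        800\n        1200\n    }\n}\n'
# position just after the innermost (last) opening brace
start = s.rfind('{') + 1
# position of the first closing brace, which closes that innermost block
end = s.find('}')
# find runs of word characters in the substring
re.findall(r'\w+', s[start:end])
# ['none', 'slow', '800', '1200']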
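
The same idea can be wrapped in a small helper, sketched here under a few assumptions that go beyond the original answer. The function name innermost_terms is hypothetical; it uses str.split instead of \w+ so that hyphenated terms stay whole, and it searches for the closing brace only after the last opening brace.

```python
def innermost_terms(text):
    """Return the whitespace-separated terms in the last-opened brace block."""
    start = text.rfind('{')
    if start == -1:
        return []                # no opening brace at all
    end = text.find('}', start)  # first closing brace after that opening brace
    if end == -1:
        return []                # unbalanced input: treat as no block
    return text[start + 1:end].split()

# Hypothetical input with three levels of nesting and a hyphenated term.
print(innermost_terms('a { b { c { foo-bar baz } } }'))
```

This assumes the braces are balanced and that the block of interest is the one opened last; it makes no attempt to handle braces inside quoted strings.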

