Python regex: match until certain words after an identified key

Question

Given the following string or similar:

baz: bar
key: >
   lorem ipsum 1213 __ ^123   
   lorem ipsum

foo:bar
anotherkey: >
   lorem ipsum 1213 __ ^123   
   lorem ipsum

I am trying to build a regex that captures all the values after a key followed by a > sign.

So for the example above, I want to match from key up to (but not including) foo, and then from anotherkey to the end. I managed to come up with a regex that does the job, but only if I know the name foo:

\w+:\s>\n\s+[\S+\s+]+(?=foo)

But this is not really a good solution. If I remove the (?=foo) lookahead, the match includes everything up to the end of the string. How can I fix this regex so that it matches the values after > as described?
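A minimal Python sketch that reproduces the problem on the sample input. TEXT is an illustrative transcription of the question's string, assumed to have no trailing newline. Inside square brackets, + is a literal character, so [\S+\s+] is simply a class that matches any character. The greedy + after it therefore runs to the end of the input, and only the (?=foo) lookahead makes it back off to the last position followed by foo.

```python
import re

# Illustrative transcription of the question's sample (assumed: no trailing newline).
TEXT = (
    "baz: bar\n"
    "key: >\n"
    "   lorem ipsum 1213 __ ^123   \n"
    "   lorem ipsum\n"
    "\n"
    "foo:bar\n"
    "anotherkey: >\n"
    "   lorem ipsum 1213 __ ^123   \n"
    "   lorem ipsum"
)

# The question's regex: it only stops because the next key is known to be "foo".
with_lookahead = re.findall(r"\w+:\s>\n\s+[\S+\s+]+(?=foo)", TEXT)

# Without the lookahead, the greedy [\S+\s+]+ (any character) runs to the end.
without_lookahead = re.search(r"\w+:\s>\n\s+[\S+\s+]+", TEXT).group()

print(with_lookahead)
print(repr(without_lookahead))
```

Note that the first call finds only the key block. Nothing after anotherkey is followed by foo, so the question's regex never matches that second block.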

Answer 1

(As per request ;)

You could use something like

^\w+:\s*>\n(?:[ \t].*\n?)+

(This is without the capturing groups. If you decide you want them, see the comments on the question.)

It matches the start of a line (^) followed by at least one word character (\w: A-Z, a-z, 0-9 or '_'; in Python 3, \w also matches other Unicode letters and digits unless re.ASCII is set). This could be changed to [a-z] if only lowercase letters should be allowed.

Then it matches optional whitespace (\s*) followed by the > key terminator and a line feed (\n).

Then comes a non-capturing group ((?:...)) that matches:

a space or a tab ([ \t])
followed by any characters up to the line feed (.*)
and an optional line feed (\n?)

This group (which matches one indented line) can be repeated any number of times, but must match at least once (the + after the closing parenthesis).
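A minimal Python sketch of this pattern on the question's sample input. TEXT and PATTERN are illustrative names, and the input is assumed to have no trailing newline. In Python, ^ only matches at the start of the string unless re.MULTILINE is passed. Here the string starts with the baz line, so without the flag nothing is found.

```python
import re

# Illustrative transcription of the question's sample (assumed: no trailing newline).
TEXT = (
    "baz: bar\n"
    "key: >\n"
    "   lorem ipsum 1213 __ ^123   \n"
    "   lorem ipsum\n"
    "\n"
    "foo:bar\n"
    "anotherkey: >\n"
    "   lorem ipsum 1213 __ ^123   \n"
    "   lorem ipsum"
)

# Answer 1: a "key: >" line followed by one or more lines indented with a space or tab.
PATTERN = r"^\w+:\s*>\n(?:[ \t].*\n?)+"

# re.MULTILINE lets ^ match at the start of every line, not just the string.
blocks = re.findall(PATTERN, TEXT, flags=re.MULTILINE)
for block in blocks:
    print(repr(block))
```

Each match includes the key line. The optional \n? keeps the line feed after the last indented line when there is one, so the first block ends with \n and the last one does not.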

See it here at regex101.

Answer 2

You can tweak your regex to this:

(\w+:\s+>\n\s+[\S\s]+?)(?=\n\w+:\w+\n|\Z)

RegEx Demo

The lookahead (?=\n\w+:\w+\n|\Z) asserts that a key:value line or the end of the input (\Z) follows your non-greedy match.
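A minimal Python check of this first pattern on the sample input. The names are illustrative, and the input is assumed to have no trailing newline. re.findall returns the captured group, which here spans the whole block. In Python, \Z matches only at the very end of the string. Note that the lookahead only recognizes a following key:value line with no space after the colon, such as foo:bar. A line like baz: bar would not stop the match.

```python
import re

# Illustrative transcription of the question's sample (assumed: no trailing newline).
TEXT = (
    "baz: bar\n"
    "key: >\n"
    "   lorem ipsum 1213 __ ^123   \n"
    "   lorem ipsum\n"
    "\n"
    "foo:bar\n"
    "anotherkey: >\n"
    "   lorem ipsum 1213 __ ^123   \n"
    "   lorem ipsum"
)

# Answer 2, first version: the lazy [\S\s]+? stops before "\nkey:value\n" or at the end.
PATTERN = r"(\w+:\s+>\n\s+[\S\s]+?)(?=\n\w+:\w+\n|\Z)"

# findall returns group 1, which here is the key line plus its values.
blocks = re.findall(PATTERN, TEXT)
for block in blocks:
    print(repr(block))
```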

Alternatively, you can use this better-performing regex (thanks to Wiktor for the helpful comments below):

\w+:\s+>\n(.*(?:\n(?!\n\w+:\w+\n).*)+)

RegEx Demo 2
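The same kind of check for the second pattern. The names are illustrative, and the input is assumed to have no trailing newline. Because the group starts after the key line, re.findall returns only the value lines. Each extra line is taken only if it is not an empty line directly followed by a key:value line. As written, the pattern therefore relies on a blank line separating the blocks.

```python
import re

# Illustrative transcription of the question's sample (assumed: no trailing newline).
TEXT = (
    "baz: bar\n"
    "key: >\n"
    "   lorem ipsum 1213 __ ^123   \n"
    "   lorem ipsum\n"
    "\n"
    "foo:bar\n"
    "anotherkey: >\n"
    "   lorem ipsum 1213 __ ^123   \n"
    "   lorem ipsum"
)

# Answer 2, second version: group 1 holds only the value lines after "key: >".
# The negative lookahead refuses a line that is empty and followed by key:value.
PATTERN = r"\w+:\s+>\n(.*(?:\n(?!\n\w+:\w+\n).*)+)"

values = re.findall(PATTERN, TEXT)
for value in values:
    print(repr(value))
```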

Answer 3

One

If you are not sure whether the indentation exists, this is the simplest way to achieve the desired result:

^\w+:\s+>(?:\s?[^:]*$)*

Live demo

Explanation:

^               # Start of a line (needs the m flag)
\w+:\s+>        # The key, a colon, whitespace and the > marker
(?:             # Start of non-capturing group (a)
    \s?             # Optionally one whitespace character, e.g. the line feed
    [^:]*$          # Non-colon characters (line feeds included) up to an end of line
)*              # End of non-capturing group (a) (zero or more times - greedy)

You need the m (multiline) flag turned on, as shown in the live demo.
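A minimal Python sketch of this first pattern. re.MULTILINE is Python's equivalent of the m flag. The names are illustrative, and the input is assumed to have no trailing newline. Because [^:] also matches line feeds, a single [^:]*$ step can span several lines. The flip side is that a value line containing a colon cuts the block short, as the second call shows on a made-up input.

```python
import re

# Illustrative transcription of the question's sample (assumed: no trailing newline).
TEXT = (
    "baz: bar\n"
    "key: >\n"
    "   lorem ipsum 1213 __ ^123   \n"
    "   lorem ipsum\n"
    "\n"
    "foo:bar\n"
    "anotherkey: >\n"
    "   lorem ipsum 1213 __ ^123   \n"
    "   lorem ipsum"
)

# Answer 3, "One": the key line, then runs of colon-free text that end at a line end.
PATTERN = r"^\w+:\s+>(?:\s?[^:]*$)*"

blocks = re.findall(PATTERN, TEXT, flags=re.MULTILINE)
for block in blocks:
    print(repr(block))

# Made-up input whose value line contains a colon: only the key line is matched.
print(re.findall(PATTERN, "k: >\n   a: b", flags=re.MULTILINE))
```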

Two - the simplest

If the leading whitespace is always there, you can use this safer regex:

^\w+:\s+>(?:\s?[\t ]+.*)*

Live demo

The m modifier should be set here as well.
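The same kind of sketch for the second pattern. The names are illustrative, and the input is assumed to have no trailing newline. Each repetition needs at least one space or tab after the optional line feed. The blank line before foo:bar therefore ends the first block, while an indented value line that contains a colon is kept, as the made-up second input shows.

```python
import re

# Illustrative transcription of the question's sample (assumed: no trailing newline).
TEXT = (
    "baz: bar\n"
    "key: >\n"
    "   lorem ipsum 1213 __ ^123   \n"
    "   lorem ipsum\n"
    "\n"
    "foo:bar\n"
    "anotherkey: >\n"
    "   lorem ipsum 1213 __ ^123   \n"
    "   lorem ipsum"
)

# Answer 3, "Two": the key line, then lines that start with at least one space or tab.
PATTERN = r"^\w+:\s+>(?:\s?[\t ]+.*)*"

blocks = re.findall(PATTERN, TEXT, flags=re.MULTILINE)
for block in blocks:
    print(repr(block))

# Made-up input with a colon inside an indented value line: the line is kept.
print(re.findall(PATTERN, "k: >\n   a: b", flags=re.MULTILINE))
```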

3 answers

Answer 1 (accepted): score 2, 2016-08-24 11:41:54
Answer 2: score 1, 2016-08-24 10:13:26
Answer 3: score 0, 2016-08-24 11:12:14
