Match all lines with a pattern after a text until the pattern fails to match (regex)

Question

I have a text:


{{Verkleinerungsformen}}
:[1] [[Äpfelchen]], [[Äpfelein]], [[Äpflein]]

{{Oberbegriffe}}
:[1] [[Kernobst]], [[Obst]]; [[Frucht]]
:[4] [[Kot]]
:[7] [[Gut]]

{{Unterbegriffe}}
:[1] [[Augustapfel]], [[Bohnapfel]], [[Bratapfel]], [[Essapfel]], [[Fallapfel]],

I'm interested in extracting all items under {{Oberbegriffe}} that have the pattern [[Text]], including all following lines until it reaches a line that does not start with :[NUMBER-HERE].

So in the above example, it should return an array of these strings:

Kernobst, Obst, Frucht, Kot, Gut

What I have tried is:

re.search(r'{{Oberbegriffe}}\n(?::?\n)?([^\n]+)', text)

But it matches only the first line in full. It's also fine if there is a way to extract all lines with the pattern and return this string:

:[1] [[Kernobst]], [[Obst]]; [[Frucht]]
:[4] [[Kot]]
:[7] [[Gut]]
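The attempt above stops after one line for two reasons: [^\n]+ cannot match past a newline, and re.search returns only the first match. A quick check of that behaviour, assuming the sample text is pasted into a variable named text (the Unterbegriffe line is kept truncated, as in the question):

```python
import re

# Sample wiki text from the question (the Unterbegriffe line is truncated as in the original).
text = """{{Verkleinerungsformen}}
:[1] [[Äpfelchen]], [[Äpfelein]], [[Äpflein]]

{{Oberbegriffe}}
:[1] [[Kernobst]], [[Obst]]; [[Frucht]]
:[4] [[Kot]]
:[7] [[Gut]]

{{Unterbegriffe}}
:[1] [[Augustapfel]], [[Bohnapfel]], [[Bratapfel]], [[Essapfel]], [[Fallapfel]],
"""

# The pattern from the question: [^\n]+ stops at the first newline,
# so group 1 can only ever hold a single line.
m = re.search(r'{{Oberbegriffe}}\n(?::?\n)?([^\n]+)', text)
print(m.group(1))  # :[1] [[Kernobst]], [[Obst]]; [[Frucht]]
```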

Answer 1

You may extract the blocks using

(?m)^{{Oberbegriffe}}(?:\n:\[\d+].*)*

See the regex demo.

Then, use the \[\[([^][]+)]] pattern to extract the values you need. See this regex demo.

Regex details

(?m) - an inline modifier, same as re.M / re.MULTILINE
^ - start of a line
{{Oberbegriffe}} - literal text
(?:\n:\[\d+].*)* - 0 or more repetitions of a newline followed by :[, then 1+ digits, then ], and then any 0 or more characters other than line break chars, as many as possible.
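The same block pattern can be written with re.VERBOSE so that each part listed above carries its own comment. This is only a restatement for illustration, not a different technique; run on a shortened copy of the sample, it also shows that the repetition ends at the blank line, since that line does not start with :[digits].

```python
import re

text = """{{Oberbegriffe}}
:[1] [[Kernobst]], [[Obst]]; [[Frucht]]
:[4] [[Kot]]
:[7] [[Gut]]

{{Unterbegriffe}}
:[1] [[Augustapfel]]
"""

# Same pattern as (?m)^{{Oberbegriffe}}(?:\n:\[\d+].*)*, one comment per part.
# In verbose mode literal spaces and '#' would need escaping; this pattern has neither.
block_re = re.compile(r"""
    ^                    # start of a line (needs re.M)
    {{Oberbegriffe}}     # literal section header
    (?:                  # zero or more times:
        \n :\[ \d+ ]     #   newline, ':[', one or more digits, ']'
        .*               #   rest of that line (. does not match a newline)
    )*
""", re.M | re.VERBOSE)

block = block_re.search(text).group()
print(block)
```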

The second regex, \[\[([^][]+)]], matches [[, then capturing group #1 matches any 1 or more chars other than [ and ], and then ]].
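The negated class [^][]+ is what keeps each [[...]] pair separate. A small comparison on the first Oberbegriffe line (used here only as test input) against a greedy .+:

```python
import re

line = ':[1] [[Kernobst]], [[Obst]]; [[Frucht]]'

# [^][]+ cannot run past a bracket, so each [[...]] pair is captured separately.
print(re.findall(r'\[\[([^][]+)]]', line))
# ['Kernobst', 'Obst', 'Frucht']

# For comparison: a greedy .+ runs to the LAST ']]' on the line,
# swallowing several items into a single capture.
print(re.findall(r'\[\[(.+)]]', line))
# ['Kernobst]], [[Obst]]; [[Frucht']
```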

In Python:

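import re

# filepath: path to the wiki text file (assumed UTF-8, since it contains umlauts)
with open(filepath, 'r', encoding='utf-8') as fr:
    # one string per {{Oberbegriffe}} header plus the :[N] lines that follow it
    blocks = re.findall(r'^{{Oberbegriffe}}(?:\n:\[\d+].*)*', fr.read(), flags=re.M)
    print([re.findall(r'\[\[([^][]+)]]', block) for block in blocks])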
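The snippet above prints one list per {{Oberbegriffe}} block (for the sample text, [['Kernobst', 'Obst', 'Frucht', 'Kot', 'Gut']]), while the question asked for a single flat array. A sketch that flattens the result and takes the section name as a parameter; the function name extract_terms is hypothetical, and re.escape is used so the braces in the header are always read literally.

```python
import re


def extract_terms(text, section):
    """Return every [[...]] value listed under {{section}}, as one flat list.

    A block is the header line plus the consecutive lines that start with
    :[digits]; the first line that does not start that way ends the block.
    """
    header = re.escape('{{' + section + '}}')
    block_re = re.compile(r'^' + header + r'(?:\n:\[\d+].*)*', re.M)
    terms = []
    for block in block_re.findall(text):
        terms.extend(re.findall(r'\[\[([^][]+)]]', block))
    return terms


text = """{{Verkleinerungsformen}}
:[1] [[Äpfelchen]], [[Äpfelein]], [[Äpflein]]

{{Oberbegriffe}}
:[1] [[Kernobst]], [[Obst]]; [[Frucht]]
:[4] [[Kot]]
:[7] [[Gut]]

{{Unterbegriffe}}
:[1] [[Augustapfel]], [[Bohnapfel]], [[Bratapfel]], [[Essapfel]], [[Fallapfel]],
"""

print(extract_terms(text, 'Oberbegriffe'))
# ['Kernobst', 'Obst', 'Frucht', 'Kot', 'Gut']
print(', '.join(extract_terms(text, 'Oberbegriffe')))
# Kernobst, Obst, Frucht, Kot, Gut
```

If the header does not occur at all, the function simply returns an empty list.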

Answer 1: accepted, 2020-05-31 14:31:30

在文本之后匹配所有带有模式的行，直到模式匹配失败正则表达式

问题描述

1 个解决方案

解决方案1 1 已采纳 2020-05-31 14:31:30

解决方案1
1 已采纳 2020-05-31 14:31:30