Regex to match all lines with a pattern after a text, until the pattern stops matching
I have a text:
{{Verkleinerungsformen}}
:[1] [[Äpfelchen]], [[Äpfelein]], [[Äpflein]]
{{Oberbegriffe}}
:[1] [[Kernobst]], [[Obst]]; [[Frucht]]
:[4] [[Kot]]
:[7] [[Gut]]
{{Unterbegriffe}}
:[1] [[Augustapfel]], [[Bohnapfel]], [[Bratapfel]], [[Essapfel]], [[Fallapfel]],
I'm interested in extracting all items under {{Oberbegriffe}} that have the pattern [[Text]], including all lines until it reaches a line that does not start with :[NUMBER-HERE].
So in the above example it should return an array of these strings:
Kernobst, Obst, Frucht, Kot, Gut
What I have tried is:
re.search(r'{{Oberbegriffe}}\n(?::?\n)?([^\n]+)', text)
But it matches only the full first line. It would also be fine if there is a way to extract all lines with the pattern, returning this string:
:[1] [[Kernobst]], [[Obst]]; [[Frucht]]
:[4] [[Kot]]
:[7] [[Gut]]
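A quick check of the attempt on the sample text (inlined as a string for illustration) shows why it stops early: the optional (?::?\n)? group matches nothing here because the next line starts with :[, and [^\n]+ cannot cross a line break, while nothing in the pattern repeats over the following lines.

```python
import re

# Sample input from the question, inlined as a string for this check
text = """\
{{Verkleinerungsformen}}
:[1] [[Äpfelchen]], [[Äpfelein]], [[Äpflein]]
{{Oberbegriffe}}
:[1] [[Kernobst]], [[Obst]]; [[Frucht]]
:[4] [[Kot]]
:[7] [[Gut]]
{{Unterbegriffe}}
:[1] [[Augustapfel]], [[Bohnapfel]], [[Bratapfel]], [[Essapfel]], [[Fallapfel]],
"""

# The attempted pattern: [^\n]+ stops at the first line break, so group 1 holds one line only
m = re.search(r'{{Oberbegriffe}}\n(?::?\n)?([^\n]+)', text)
print(m.group(1))  # :[1] [[Kernobst]], [[Obst]]; [[Frucht]]
```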
You may extract the blocks using
(?m)^{{Oberbegriffe}}(?:\n:\[\d+].*)*
See the regex demo.
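A minimal sketch of this first step on the sample text from the question (inlined as a string). Each match is the header line plus the :[N] lines directly under it; dropping the header with str.partition gives exactly the multi-line string asked for in the question.

```python
import re

# Sample input from the question, inlined as a string
text = """\
{{Verkleinerungsformen}}
:[1] [[Äpfelchen]], [[Äpfelein]], [[Äpflein]]
{{Oberbegriffe}}
:[1] [[Kernobst]], [[Obst]]; [[Frucht]]
:[4] [[Kot]]
:[7] [[Gut]]
{{Unterbegriffe}}
:[1] [[Augustapfel]], [[Bohnapfel]], [[Bratapfel]], [[Essapfel]], [[Fallapfel]],
"""

# (?m) lets ^ match at the start of every line; each match is one whole block
blocks = re.findall(r'(?m)^{{Oberbegriffe}}(?:\n:\[\d+].*)*', text)

# Drop the header line; partition returns '' if a block has no :[N] lines
lines = blocks[0].partition('\n')[2]
print(lines)
# :[1] [[Kernobst]], [[Obst]]; [[Frucht]]
# :[4] [[Kot]]
# :[7] [[Gut]]
```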
Then, use the \[\[([^][]+)]] pattern to extract the values you need. See this regex demo.
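Applied to one extracted block (hard-coded below so the snippet runs on its own), the second pattern returns only the text inside each [[...]] pair, because re.findall returns the contents of capturing group #1 when the pattern has exactly one group.

```python
import re

# One block, as returned by the first regex
block = ("{{Oberbegriffe}}\n"
         ":[1] [[Kernobst]], [[Obst]]; [[Frucht]]\n"
         ":[4] [[Kot]]\n"
         ":[7] [[Gut]]")

# With one capturing group, re.findall returns that group's text for each match
values = re.findall(r'\[\[([^][]+)]]', block)
print(values)  # ['Kernobst', 'Obst', 'Frucht', 'Kot', 'Gut']
```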
Regex details
(?m) - an inline modifier, same as re.M / re.MULTILINE
^ - start of a line
{{Oberbegriffe}} - literal text {{Oberbegriffe}}
(?:\n:\[\d+].*)* - 0 or more repetitions of a newline followed by :[, then 1+ digits, ], and then any 0 or more characters other than line break chars, as many as possible

The second regex, \[\[([^][]+)]], matches [[, then capturing group #1 matching any 1 or more chars other than [ and ], and then ]].
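The same block pattern can be written with re.VERBOSE so each piece carries the explanation above as an inline comment. This is a sketch of an equivalent spelling, not a different technique: the braces and the closing bracket are escaped explicitly here for readability, which does not change what the pattern matches.

```python
import re

# Equivalent to (?m)^{{Oberbegriffe}}(?:\n:\[\d+].*)*
# In verbose mode, whitespace in the pattern is ignored and # starts a comment.
BLOCK_RE = re.compile(r"""
    ^                       # start of a line (needs re.MULTILINE)
    \{\{Oberbegriffe\}\}    # literal text {{Oberbegriffe}}
    (?:                     # 0 or more repetitions of:
        \n                  #   a newline,
        :\[\d+\]            #   then :[ + 1 or more digits + ],
        .*                  #   then any chars except line breaks, as many as possible
    )*
""", re.MULTILINE | re.VERBOSE)

text = "{{Oberbegriffe}}\n:[1] [[Kernobst]]\n:[4] [[Kot]]\n{{Unterbegriffe}}\n:[1] [[Bratapfel]]"
print(BLOCK_RE.findall(text))  # ['{{Oberbegriffe}}\n:[1] [[Kernobst]]\n:[4] [[Kot]]']
```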
In Python:
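import re
# filepath is the path to your input file
with open(filepath, 'r') as fr:
    blocks = re.findall(r'^{{Oberbegriffe}}(?:\n:\[\d+].*)*', fr.read(), flags=re.M)
# One list of [[...]] values per {{Oberbegriffe}} block
print([re.findall(r'\[\[([^][]+)]]', block) for block in blocks])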
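The code above prints one list per {{Oberbegriffe}} block, i.e. a list of lists. If you want the flat list from the question, the two steps can be wrapped in a small helper. A sketch under these assumptions: extract_terms is a hypothetical name, the sample text is inlined instead of read from a file, and re.escape is used so any {{...}} header can be passed in.

```python
import re

# Sample input from the question (read it from your file instead in real use)
text = """\
{{Verkleinerungsformen}}
:[1] [[Äpfelchen]], [[Äpfelein]], [[Äpflein]]
{{Oberbegriffe}}
:[1] [[Kernobst]], [[Obst]]; [[Frucht]]
:[4] [[Kot]]
:[7] [[Gut]]
{{Unterbegriffe}}
:[1] [[Augustapfel]], [[Bohnapfel]], [[Bratapfel]], [[Essapfel]], [[Fallapfel]],
"""


def extract_terms(text, header):
    """Hypothetical helper: return every [[...]] value listed under `header` as one flat list."""
    # re.escape turns the literal header into a safe pattern, e.g. \{\{Oberbegriffe\}\}
    block_re = re.compile(r'^' + re.escape(header) + r'(?:\n:\[\d+].*)*', re.M)
    terms = []
    for block in block_re.findall(text):
        # Collect the [[...]] values of each block into the same list
        terms.extend(re.findall(r'\[\[([^][]+)]]', block))
    return terms


print(extract_terms(text, '{{Oberbegriffe}}'))  # ['Kernobst', 'Obst', 'Frucht', 'Kot', 'Gut']
```

Passing another header, such as '{{Unterbegriffe}}', reuses the same two patterns for that section.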