Regex: split new lines between constant words

Question

Given

Word1   content1 content1 content1
       content2 content2 content2
         
          content3 content3 content3
Word2

I want to extract content1, content2 and content3 as groups. Could you help me write a regex for that? I tried:

Word1[\s:]*((?P<value>[^\n]+)\n)+Word2 with gms flags, but it didn't help. I need a regex for the Python re module.
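One likely reason the attempt falls short, sketched below: in Python's re module a capturing group that sits inside a repeated group keeps only its last iteration, so `value` ends up holding just the final line. The sample string is the one used in the answers; the `re.M | re.S` flags stand in for "ms" (Python has no "g" flag, `re.search`/`re.findall` play that role), and the names `attempt` and `last_value` are illustrative only.

```python
import re

text = (
    "Word1   content1 content1 content1\n"
    "       content2 content2 content2\n"
    "          content3 content3 content3\n"
    "Word2"
)

# The pattern from the question: the named group is repeated by the outer (...)+
attempt = re.search(r'Word1[\s:]*((?P<value>[^\n]+)\n)+Word2', text, re.M | re.S)

# Only the last repetition of a repeated group is retained by re
last_value = attempt.group("value") if attempt else None
print(repr(last_value))
```

The pattern does match the whole block, but the earlier lines are not available through the group, which is why the answers capture the block once and split it afterwards.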

Answer 1

You can use

import re
text = "Word1   content1 content1 content1\n       content2 content2 content2\n          content3 content3 content3\nWord2"
match = re.search(r'Word1[\s:]*((?:.+\n)*)Word2', text)
if match:
    print([s.strip() for s in match.group(1).splitlines()])

See the Python and the regex demo.

Output:

['content1 content1 content1', 'content2 content2 content2', 'content3 content3 content3']

Details:

Word1 - a Word1 string
[\s:]* - zero or more whitespaces and colons
((?:.+\n)*) - Group 1: zero or more repetitions of one or more chars other than line break chars, as many as possible, followed by a newline char
Word2 - a Word2 string.

Then, if there is a match, [s.strip() for s in match.group(1).splitlines()] splits the Group 1 value into separate lines and strips the surrounding whitespace from each.
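The same approach can be wrapped in a small helper, shown here as a minimal sketch. The function name `extract_lines` and its parameters are hypothetical, not part of the original answer. The sample includes the whitespace-only line from the question, which `.+\n` still matches, so the sketch filters out lines that are empty after stripping.

```python
import re

def extract_lines(text, start="Word1", end="Word2"):
    """Return the stripped, non-empty lines found between start and end."""
    # re.escape keeps the delimiter words literal (an addition, not in the answer)
    pattern = re.escape(start) + r'[\s:]*((?:.+\n)*)' + re.escape(end)
    match = re.search(pattern, text)
    if not match:
        return []
    # Split Group 1 into lines, strip each, and drop whitespace-only lines
    return [s.strip() for s in match.group(1).splitlines() if s.strip()]

sample = (
    "Word1   content1 content1 content1\n"
    "       content2 content2 content2\n"
    "         \n"
    "          content3 content3 content3\n"
    "Word2"
)
result = extract_lines(sample)
print(result)
```

Note that a completely empty line (a bare newline with no spaces) between the two words would not be matched by `.+\n`, so the helper would return an empty list for such input; swapping in `.*\n` is one way to tolerate that case.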

An alternative solution using the PyPI regex library can be

import regex
text = "Word1   content1 content1 content1\n       content2 content2 content2\n          content3 content3 content3\nWord2"
print( regex.findall(r'(?<=Word1[\s:]*(?s:.*?))\S(?:.*\S)?(?=(?s:.*?)\nWord2)', text) )

See the Python demo. Details:

(?<=Word1[\s:]*(?s:.*?)) - a positive lookbehind that requires a Word1 string, zero or more whitespaces or colons, and then any zero or more chars, as few as possible, immediately to the left of the current location
\S(?:.*\S)? - a non-whitespace char and then, optionally, any zero or more chars other than line break chars, as many as possible, up to the last non-whitespace char on the line
(?=(?s:.*?)\nWord2) - a positive lookahead that requires any zero or more chars, as few as possible, and then a newline char and the Word2 word to the right of the current location.
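If installing the third-party regex package is not an option, the `\S(?:.*\S)?` piece can still be reused with the standard re module in two steps. This is a sketch of a different technique from the lookbehind version, not a drop-in equivalent: it first isolates the block between the two words, then pulls the trimmed lines out of it. The names `block` and `items` are illustrative.

```python
import re

text = (
    "Word1   content1 content1 content1\n"
    "       content2 content2 content2\n"
    "          content3 content3 content3\n"
    "Word2"
)

# Step 1: capture everything between Word1 and the newline before Word2
block = re.search(r'Word1[\s:]*(.*?)\nWord2', text, re.S)

# Step 2: on each line, match from the first to the last non-whitespace char
items = re.findall(r'\S(?:.*\S)?', block.group(1)) if block else []
print(items)
```

Because `.` does not cross line breaks without `re.S`, the second pattern yields one trimmed item per non-blank line.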

Answer 2

It is better to capture everything between the two words as a single group and then split it on the newline character.

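import re

# Sample input (same text as in the question)
text_piece = "Word1   content1 content1 content1\n       content2 content2 content2\n          content3 content3 content3\nWord2"

first_key = "Word1"
second_key = "Word2"
# Capture everything between the two keys; re.DOTALL lets "." span newlines
common_regex = r"{first_key}[\s:]*(?P<value>.+){second_key}"
regex = common_regex.format(first_key=first_key, second_key=second_key)
lines = [x.group("value").strip() for x in re.finditer(regex, text_piece, re.DOTALL)]
if lines:
    # Split the captured block into lines and trim the indentation of each
    lines = [line.strip() for line in lines[0].split("\n")]
else:
    lines = []
print(lines)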
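When the two keys are built into the pattern with format, any regex metacharacters in them would be interpreted as pattern syntax. A minimal sketch of one way to guard against that with `re.escape` follows; the helper name `build_pattern`, the keys and the sample text are all made up for illustration, and the lazy `.+?` is an assumption chosen so that a match stops at the first closing key.

```python
import re

def build_pattern(first_key, second_key):
    """Compile a pattern capturing the text between two literal keys."""
    template = r"{}[\s:]*(?P<value>.+?){}"
    # re.escape makes characters such as "(" and ")" match literally
    return re.compile(
        template.format(re.escape(first_key), re.escape(second_key)),
        re.DOTALL,
    )

# Hypothetical input whose keys contain parentheses
text_piece = "Total (net):  10 apples\n   20 pears\nTotal (gross)"
pattern = build_pattern("Total (net)", "Total (gross)")
match = pattern.search(text_piece)
lines = [line.strip() for line in match.group("value").strip().split("\n")] if match else []
print(lines)
```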

Answer 1
1 accepted 2022-07-07 15:11:51

Answer 2
0 2022-07-07 15:14:08
