Python - Split a list containing a long string based on 2 keywords

Question

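I have a list containing a long string. How can I split the string to extract the parts from "MyKeyword" to "MyData"? These words occur multiple times in my list, so I want to split on them, if possible keeping MyKeyword and MyData in the results.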

Sample of the current data:

['MyKeyword This is my data MyData. MyKeyword and chunk of text here. Random text. MyData is this etc etc ']

Desired output:

['MyKeyword This is my data', 'MyData.', 'MyKeyword and chunk of text here. Random text.','MyData is this etc etc ']

Current code:


from itertools import groupby
#linelist = ["a", "b", "", "c", "d", "e", "", "a"]
split_at = "MyKeyword"
[list(g) for k, g in groupby(output2, lambda x: x != split_at) if k]

Answer 1

You can use a regular expression that matches, in lazy mode, all the text from MyKeyword to MyData:

>>> import re
>>> re.findall(r"MyKeyword.*?MyData\.?", "MyKeyword This is my data, MyData. MyKeyword and chunk of text here. Random text. MyData is this etc etc ")
['MyKeyword This is my data, MyData.', 'MyKeyword and chunk of text here. Random text. MyData']

.*? means zero or more characters, matched in lazy mode (*?), i.e. as few as possible;
\.? means an optional period.
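
To make the effect of lazy mode concrete, here is a minimal sketch that runs the same pattern with and without the ? after .* on the answer's sample string. The variable names text, lazy and greedy are illustrative only and not part of the original answer.

```python
import re

text = ("MyKeyword This is my data, MyData. MyKeyword and chunk of text here. "
        "Random text. MyData is this etc etc ")

# Lazy (.*?): each match stops at the first MyData after its MyKeyword.
lazy = re.findall(r"MyKeyword.*?MyData\.?", text)

# Greedy (.*): one match runs from the first MyKeyword to the last MyData.
greedy = re.findall(r"MyKeyword.*MyData\.?", text)

print(len(lazy), len(greedy))
```

With the greedy quantifier the two chunks collapse into a single match, which is why the lazy form is the one used here.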

Edit (following the new requirements):

The regular expression you need is something like

MyKeyword.*?(?= ?MyData|$)|MyData.*?(?= ?MyKeyword|$)

It starts at the point where MyKeyword (resp. MyData) matches, then captures as few characters as possible, as above, until it reaches MyData (resp. MyKeyword) or the end of the string.

In detail:

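| is a special character meaning "or";
$ matches the end of the string;
" ?" (a space followed by ?) matches an optional space;
(?=<expr>) is called a positive lookahead; it means "followed by <expr>", without consuming <expr>.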
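
As a sketch of how this pattern could be applied to the list from the question, the snippet below runs re.findall over every string and collects the chunks. The helper name split_on_keywords and the variable names are hypothetical, and the sketch assumes the strings contain no newlines (otherwise . would need the re.DOTALL flag).

```python
import re

# Pattern from the edit above: a chunk starts at either keyword and lazily
# runs up to (not including) the other keyword, or to the end of the string.
PATTERN = re.compile(r"MyKeyword.*?(?= ?MyData|$)|MyData.*?(?= ?MyKeyword|$)")

def split_on_keywords(strings):
    """Collect the keyword-delimited chunks from every string in the list."""
    chunks = []
    for s in strings:
        chunks.extend(PATTERN.findall(s))
    return chunks

data = ['MyKeyword This is my data MyData. MyKeyword and chunk of text here. '
        'Random text. MyData is this etc etc ']
result = split_on_keywords(data)
print(result)
```

Because the lookahead does not consume the separating space, each chunk ends right before it; the final chunk keeps its trailing space since it runs to the end of the string.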

Answer 1: score 3, accepted, 2021-01-27 11:42:54
