Regex from a starting phrase to the end of the document, with a condition

Question

I have a starting phrase, say fruits. I also have some ending phrases, such as apple, banana and pineapple.

I have some documents stored in a variable text:

Fruits

They are good for health....

should eat Apple
Fruits

eat regularly banana

Fruits you need

to eat Apple
Fruits are good

Daily we should have pineapple

In general, fruits have various minerals.

Most of them are very tasty

My regex and code:

p = r'(\bFruits\b\s*\w*\s*\n*.*?(\bApples?\b|\bbananas?\b|\bpineapples?\b))'
sep = ";;"
lst = re.findall(p, text, re.I|re.M|re.DOTALL)
val = sep.join(str(v) for v in lst )

The regex above works well on texts 1 and 2, but only partially on text 3.

Problem:

What I need: when we encounter fruits and none of the ending phrases follows it, then and only then the match should run to the end of the document.

Expected output from text 3:

Fruits are good Daily we should have pineapple ;; fruits have various minerals.
Most of them are very tasty

PS: I tried $ as well, but that did not work either.

Answer 1

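Include \Z as an extra alternative among the ending phrases, as follows. \Z matches only at the very end of the string, so the lazy .*? runs to the end of the document when no ending phrase is found.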

import re

text = '''Fruits are good

Daily we should have pineapple

In general, Fruits have various minerals.

Most of them are very tasty
'''

p = r'(\bFruits\b\s*\w*\s*\n*.*?(\bApples?\b|\bbananas?\b|\bpineapples?\b|\Z))'
sep = ";;"
lst = re.findall(p, text, re.I|re.M|re.DOTALL)
val = sep.join(str(v) for v in lst )
print(val)

The output is as follows:

('Fruits are good\n\nDaily we should have pineapple', 'pineapple');;('Fruits have various minerals.\n\nMost of them are very tasty\n', '')
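The output above contains tuples because the pattern has two capturing groups, and re.findall returns one tuple per match in that case. The sketch below is one possible way to get closer to the expected output from the question. It is an illustration, not part of the accepted answer. It assumes the same flags as the original code, swaps the inner group for a non-capturing (?:...) group, and collapses whitespace before joining. The helper find_spans and the names ENDINGS and FLAGS are hypothetical. It also shows a likely reason why $ did not help: with re.M, $ matches at the end of every line, so the lazy .*? stops at the first line break.

```python
import re

text = """Fruits are good

Daily we should have pineapple

In general, fruits have various minerals.

Most of them are very tasty
"""

FLAGS = re.I | re.M | re.DOTALL
ENDINGS = r"\bapples?\b|\bbananas?\b|\bpineapples?\b"


def find_spans(end_anchor):
    """Hypothetical helper: run the question's pattern with a given end anchor."""
    # (?:...) is non-capturing, so findall returns each whole match as a string.
    pattern = r"\bFruits\b\s*\w*\s*\n*.*?(?:" + ENDINGS + "|" + end_anchor + ")"
    return re.findall(pattern, text, FLAGS)


# Under re.M, $ matches at every line end, so each match stops after one line.
with_dollar = find_spans(r"$")

# \Z matches only at the very end of the string, regardless of re.M.
with_z = find_spans(r"\Z")

# Collapse newlines and repeated spaces inside each match, then join.
val = ";;".join(" ".join(m.split()) for m in with_z)

print(with_dollar)
print(val)
```

With the sample text, with_dollar holds only the two first lines ('Fruits are good' and 'fruits have various minerals.'), while val is the single string "Fruits are good Daily we should have pineapple;;fruits have various minerals. Most of them are very tasty". Whether whitespace should be collapsed is a choice made here to match the expected output in the question; drop that step to keep the original line breaks.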

Solution 1
1 accepted 2019-05-03 10:07:51
