Regex starting from phrase to end of doc with condition
I have a starting phrase, say fruits. I have some ending phrases, like apple, banana and pineapple.
I have some documents, each stored in a variable named text:
Fruits
They are good for health....
should eat Apple
Fruits
eat regularly banana
Fruits you need
to eat Apple
Fruits are good
Daily we should have pineapple
In general, fruits have various minerals.
Most of them are very tasty
My regex and code:
import re

# `text` holds one of the documents shown above
p = r'(\bFruits\b\s*\w*\s*\n*.*?(\bApples?\b|\bbananas?\b|\bpineapples?\b))'
sep = ";;"
lst = re.findall(p, text, re.I|re.M|re.DOTALL)
val = sep.join(str(v) for v in lst)
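Because the pattern above contains two capturing groups (the outer parentheses and the alternation), re.findall returns a list of 2-tuples rather than plain strings, which is why str(v) produces tuple syntax in the joined result. A minimal sketch of the difference, using a made-up sample string (not one of the documents above):

```python
import re

sample = "Fruits are good, eat Apple"  # hypothetical sample text

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r"(\bFruits\b.*?(\bapples?\b))", sample, re.I)

# No capturing groups (the alternation uses (?:...)): findall returns the whole match as a string.
no_groups = re.findall(r"\bFruits\b.*?(?:\bapples?\b)", sample, re.I)

print(two_groups)  # [('Fruits are good, eat Apple', 'Apple')]
print(no_groups)   # ['Fruits are good, eat Apple']
```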
The above regex works well on text 1 and text 2, but only partially on text 3.
Problem:
What I need is this: when we encounter fruits and don't find any of the ending phrases after it, then, and only then, the match should run to the end of the document.
Expected output from text 3:
Fruits are good Daily we should have pineapple ;; fruits have various minerals.
Most of them are very tasty
PS: I tried $ as well, but that didn't work either.
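A likely reason $ did not help, assuming it was added as another alternative after the ending phrases: with re.M (MULTILINE), $ matches at the end of every line, so the lazy .*? stops at the first line break instead of at the end of the document. \Z matches only at the very end of the string, whatever the flags. A small comparison on the last two lines of text 3:

```python
import re

doc = "In general, fruits have various minerals.\nMost of them are very tasty"
flags = re.I | re.M | re.DOTALL

# With MULTILINE, $ is already satisfied at the first line end, so the lazy .*? stops there.
with_dollar = re.findall(r"\bfruits\b.*?(?:\bpineapples?\b|$)", doc, flags)

# \Z only matches at the absolute end of the string, so the match covers both lines.
with_z = re.findall(r"\bfruits\b.*?(?:\bpineapples?\b|\Z)", doc, flags)

print(with_dollar)  # ['fruits have various minerals.']
print(with_z)       # ['fruits have various minerals.\nMost of them are very tasty']
```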
Answer: include \Z as a final alternative in the expression, as follows:
import re

text = '''Fruits are good

Daily we should have pineapple

In general, Fruits have various minerals.

Most of them are very tasty
'''
# \Z (absolute end of string) is only reached when no ending phrase follows
p = r'(\bFruits\b\s*\w*\s*\n*.*?(\bApples?\b|\bbananas?\b|\bpineapples?\b|\Z))'
sep = ";;"
lst = re.findall(p, text, re.I|re.M|re.DOTALL)
val = sep.join(str(v) for v in lst)
print(val)
The output is as follows:
('Fruits are good\n\nDaily we should have pineapple', 'pineapple');;('Fruits have various minerals.\n\nMost of them are very tasty\n', '')
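A cleaned-up sketch of the same idea, assuming the goal is the plain ;;-joined strings from the expected output rather than tuples. The groups are non-capturing, so findall returns strings; re.M is dropped because it only changes how ^ and $ behave and the pattern uses neither; and the leading \s*\w*\s*\n* is omitted because .*? with DOTALL already covers that text (a simplification that gives the same matches on the sample documents). The function name extract and the whitespace collapsing are hypothetical choices, not part of the answer.

```python
import re

# Ending phrases from the question; \Z is the fallback when none of them follows.
PATTERN = re.compile(
    r"\bfruits\b.*?(?:\bapples?\b|\bbananas?\b|\bpineapples?\b|\Z)",
    re.IGNORECASE | re.DOTALL,
)

def extract(text, sep=" ;; "):
    """Hypothetical helper: join every fruits-to-ending-phrase match with sep."""
    # No capturing groups, so findall returns the full matches as strings.
    matches = PATTERN.findall(text)
    # Assumption: newlines and runs of whitespace inside a match become single spaces.
    cleaned = [" ".join(m.split()) for m in matches]
    return sep.join(cleaned)

text3 = """Fruits are good
Daily we should have pineapple
In general, fruits have various minerals.
Most of them are very tasty
"""

print(extract(text3))
# Fruits are good Daily we should have pineapple ;; fruits have various minerals. Most of them are very tasty
```

The lazy .*? tries the ending phrases at every position before it can reach \Z, so the end-of-document fallback only applies when no apple, banana or pineapple follows; for example, extract("Fruits\neat regularly banana") still stops at banana.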
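Disclaimer: technical posts on this site are published under the CC BY-SA 4.0 license. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.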