
Regex starting from phrase to end of doc with condition

I have a starting phrase, say fruits. I have some ending phrases, like apple, banana and pineapple.

I have some documents, each stored in a variable named text:

  1. Fruits

    They are good for health....

    should eat Apple

  2. Fruits

    eat regularly banana

    Fruits you need

    to eat Apple

  3. Fruits are good

    Daily we should have pineapple

    In general, fruits have various minerals.

    Most of them are very tasty

My regex and code:

import re

# Lazily match from "Fruits" up to the first ending phrase.
p = r'(\bFruits\b\s*\w*\s*\n*.*?(\bApples?\b|\bbananas?\b|\bpineapples?\b))'
sep = ";;"
lst = re.findall(p, text, re.I|re.M|re.DOTALL)  # text is one of the documents above
val = sep.join(str(v) for v in lst)

The regex above works well for texts 1 and 2, but only partially for text 3.

Problem:

When we encounter fruits and don't find any of the ending phrases, then and only then, the match should run to the end of the document.

Expected output from text 3:

Fruits are good Daily we should have pineapple ;; fruits have various minerals.
Most of them are very tasty

PS: I tried $ as well, but that did not work either.
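A likely reason $ did not help, assuming it was added as one more alternative while re.M was still set: in multiline mode $ matches at the end of every line, so the lazy .*? stops at the first line break instead of continuing to an ending phrase. The sketch below uses a shortened, hypothetical version of the pattern and a two-line sample text to compare $ with \Z, which matches only at the very end of the string.

```python
import re

# Shortened sample text and pattern, for illustration only.
text = "Fruits are good\nDaily we should have pineapple\n"
ending = r"\b(?:apples?|bananas?|pineapples?)\b"
flags = re.I | re.M | re.DOTALL

# With re.M, `$` also matches just before each newline,
# so the lazy `.*?` is satisfied at the end of the first line.
with_dollar = re.search(r"\bFruits\b.*?(?:" + ending + r"|$)", text, flags)

# `\Z` matches only at the end of the whole string, even with re.M,
# so the ending phrase on the second line is reached first.
with_z = re.search(r"\bFruits\b.*?(?:" + ending + r"|\Z)", text, flags)

print(repr(with_dollar.group(0)))  # 'Fruits are good'
print(repr(with_z.group(0)))       # 'Fruits are good\nDaily we should have pineapple'
```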

Include \Z in the expression, as follows:

import re

text = '''Fruits are good

Daily we should have pineapple

In general, Fruits have various minerals.

Most of them are very tasty
'''

# \Z matches only at the very end of the string, so it acts as the fallback ending.
p = r'(\bFruits\b\s*\w*\s*\n*.*?(\bApples?\b|\bbananas?\b|\bpineapples?\b|\Z))'
sep = ";;"
lst = re.findall(p, text, re.I|re.M|re.DOTALL)
val = sep.join(str(v) for v in lst )
print(val)

The output is as follows:

('Fruits are good\n\nDaily we should have pineapple', 'pineapple');;('Fruits have various minerals.\n\nMost of them are very tasty\n', '')
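The output above contains tuples because the pattern has two capturing groups, and re.findall returns one tuple per match in that case. If only the matched text is wanted, one option is to drop the outer group and make the inner one non-capturing. The following is a minimal sketch under that assumption; the helper name extract_spans and the whitespace collapsing are illustrative choices, not part of the original answer.

```python
import re

# Same pattern, but without the outer group and with a non-capturing
# inner group (?:...), so re.findall returns plain strings.
PATTERN = re.compile(
    r"\bFruits\b\s*\w*\s*\n*.*?"
    r"(?:\bApples?\b|\bbananas?\b|\bpineapples?\b|\Z)",
    re.I | re.M | re.DOTALL,
)


def extract_spans(text, sep=";;"):
    """Join all matched spans with sep (hypothetical helper)."""
    # Collapse runs of whitespace, including newlines, to single spaces.
    spans = [" ".join(match.split()) for match in PATTERN.findall(text)]
    return sep.join(spans)


text = """Fruits are good

Daily we should have pineapple

In general, Fruits have various minerals.

Most of them are very tasty
"""

print(extract_spans(text))
# Fruits are good Daily we should have pineapple;;Fruits have various minerals. Most of them are very tasty
```

Collapsing the whitespace is optional; without it the spans keep their original line breaks, which is closer to the expected output shown in the question.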


