
Find string between two substrings AND between string and the end of file

I have the following problem. I want to extract specific strings from multiple text files, which all follow a certain pattern. For example:

example_file = "this is a test Pear this should be included1 Apple this should not be included Pear this should be included2 Apple again this should not be included Pear this should be included3"

Each file is very different, but in every file I want, first, the text between the words 'Pear' and 'Apple'. I have solved this part with the following code:

x = re.findall(r'Pear+\s(.*?)Apple', example_file, re.DOTALL)

which returns:

['this should be included1 ', 'this should be included2 ']

What I cannot figure out is how to also get the string at the end, the 'this should be included3' part. So I was wondering whether there is a way to write a regex like this:

x = re.findall(r'Pear+\s(.*?)Apple OR EOF', example_file, re.DOTALL)

So how can I match something between the word 'Pear' and EOF (end of file)? Note that these are whole text files, not single sentences.

Select either Apple or $ (an anchor matching the end of the string):

x = re.findall(r'Pear\s+(.*?)(?:Apple|$)', example_file, re.DOTALL)

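| separates two alternatives, and (?:...) is a non-capturing group. The group limits the alternation to either Apple or $ (without it, | would split the whole pattern in two), and because it captures nothing, re.findall() still returns only the text matched by (.*?).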
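To see why the group matters, here is a small comparison on a made-up sample string: the non-capturing version from above, a capturing version, and a version with no group at all.

```python
import re

text = "Pear one Apple Pear two"

# Non-capturing group: findall returns only what (.*?) captured.
non_capturing = re.findall(r'Pear\s+(.*?)(?:Apple|$)', text, re.DOTALL)

# Capturing group: findall returns a tuple (chunk, terminator) per match.
capturing = re.findall(r'Pear\s+(.*?)(Apple|$)', text, re.DOTALL)

# No group: | splits the whole pattern into "Pear\s+(.*?)Apple" OR "$".
# The trailing "Pear two" is lost, and the empty "$" match at the very end
# contributes '' because group 1 did not take part in it.
ungrouped = re.findall(r'Pear\s+(.*?)Apple|$', text, re.DOTALL)

print(non_capturing)  # ['one ', 'two']
print(capturing)      # [('one ', 'Apple'), ('two', '')]
print(ungrouped)      # ['one ', '']
```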

Note that I replaced Pear+\s with Pear\s+, as I suspect you want to match any amount of whitespace, not one or more r characters (Pear+ means "Pea" followed by one or more r).
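A quick check of that difference, using a made-up string that contains a double space after one Pear and a misspelled "Pearrr":

```python
import re

text = "Pear  two spaces Apple Pearrr typo Apple"

# Pear+\s: the + applies only to "r", so "Pearrr" also matches, and \s is
# exactly one whitespace character, so the second space ends up in the capture.
original = re.findall(r'Pear+\s(.*?)Apple', text, re.DOTALL)

# Pear\s+: the literal word "Pear" followed by one or more whitespace characters.
fixed = re.findall(r'Pear\s+(.*?)Apple', text, re.DOTALL)

print(original)  # [' two spaces ', 'typo ']
print(fixed)     # ['two spaces ']
```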

Demo:

>>> import re
>>> example_file = "this is a test Pear this should be included1 Apple this should not be included Pear this should be included2 Apple again this should not be included Pear this should be included3"
>>> re.findall(r'Pear\s+(.*?)(?:Apple|$)', example_file, re.DOTALL)
['this should be included1 ', 'this should be included2 ', 'this should be included3']
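Since the question is about several text files, here is one possible way to apply the same pattern to each file. This is a sketch under assumptions that are not in the question: the files are UTF-8, and the helper names extract_sections and extract_from_files are hypothetical. It also strips surrounding whitespace from each match, which the demo above does not do.

```python
import re
from pathlib import Path

# Same pattern as in the answer, compiled once and reused for every file.
PATTERN = re.compile(r'Pear\s+(.*?)(?:Apple|$)', re.DOTALL)


def extract_sections(text):
    """Return each chunk after 'Pear' up to 'Apple' or the end, whitespace-stripped."""
    return [chunk.strip() for chunk in PATTERN.findall(text)]


def extract_from_files(paths):
    """Map each file path (as a string) to the sections found in that file.

    Assumes every file is UTF-8 encoded text.
    """
    return {
        str(path): extract_sections(Path(path).read_text(encoding="utf-8"))
        for path in paths
    }
```

For example, extract_from_files(Path("data").glob("*.txt")), where data is a hypothetical directory, returns one list of sections per file. Without re.MULTILINE, $ matches only at the end of the string or just before a final newline, so a file ending in a single newline yields its last section without that newline. Adding re.MULTILINE would make $ match at every line end and cut each section off at its first newline.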
