I am working on a Python automation script where I want to extract a specific paragraph based on a regex match, but I am stuck on how to extract the paragraph. The following example shows my case:
Solution : (Consistent Pattern)
The paragraph I want to extract (Inconsistent Pattern)
Remote value: x (Consistent Pattern)
Below is the program I am currently working on. It would be great if anyone could enlighten me!
import re
test = r'Solution\s:'
test1 = 'Remote'
with open('<filepath>', 'r') as extract:
    lines = extract.readlines()
    for line in lines:
        x = re.search(test, line)
        y = re.search(test1, line)
        if x is not y:
            f4.write(line)
            print('good')
        else:
            print('stop')
This can easily be done with a regular expression that captures everything between the two sentinels, for example:
import re
text = r"""
Solution\s:
The paragraph I
want to extract
Remote
Some useless text here
Solution\s:
Another paragraph
I want to
extract
Remote
"""
m = re.findall(r"Solution\\s:(.*?)Remote", text, re.DOTALL | re.IGNORECASE)
print(m)
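Here text represents some text of interest (read in from a file, for example) from which we wish to extract all portions between the sentinels Solution\s: and Remote. The non-greedy .*? stops each match at the first Remote that follows, and re.DOTALL lets . match newlines so that a paragraph can span several lines. We also use IGNORECASE so that the sentinels are recognised even if spelt with different capitalization. Note that in this sample the sentinel is the literal text Solution\s:, which is why the pattern escapes the backslash; to match a line such as "Solution :" from the question, use Solution\s*: instead.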
The above code outputs:
['\nThe paragraph I\nwant to extract\n', '\nAnother paragraph\nI want to\nextract\n']
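A minimal sketch of the same idea adapted to the layout shown in the question ("Solution :" line, the paragraph, then a "Remote value: x" line). The sample text below is hypothetical. It additionally assumes that both sentinels always start their own line, so it anchors them with ^ under re.MULTILINE; that way a word "Remote" appearing in the middle of a paragraph line does not end the match early.

```python
import re

# Hypothetical sample following the question's layout.
sample = """Solution :
The paragraph I want to extract
Remote value: x
Unrelated text
Solution:
Second paragraph that mentions Remote mid-line
and continues here
Remote value: y
"""

# ^ (with re.MULTILINE) only matches at the start of a line.
# \s* allows zero or more spaces before the colon.
# .*? is non-greedy and re.DOTALL lets . cross newlines,
# so each match stops at the next line that starts with "Remote".
pattern = re.compile(r"^Solution\s*:(.*?)^Remote", re.MULTILINE | re.DOTALL)

# strip() drops the newlines left around each captured paragraph.
paragraphs = [p.strip() for p in pattern.findall(sample)]
print(paragraphs)

# To read from a file instead ('<filepath>' is the question's placeholder):
# with open('<filepath>') as f:
#     paragraphs = [p.strip() for p in pattern.findall(f.read())]
```

Reading the whole file with read() rather than readlines() is what lets a single pattern span several lines.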
Read the Python re library documentation at https://docs.python.org/3/library/re.html for more details.
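If you would rather keep the line-by-line loop from the question, note that x is not y only tells you whether one of the two sentinels matched on the current line, so the loop ends up writing the sentinel lines rather than the lines between them. A common alternative is to track whether you are currently inside a block. The sketch below assumes, as in the question, that each sentinel starts its own line; the function name extract_blocks is hypothetical.

```python
import re

START = re.compile(r"Solution\s*:")  # line that opens a block
END = re.compile(r"Remote")          # line that closes a block


def extract_blocks(lines):
    """Collect the lines between each START line and the next END line."""
    blocks = []
    current = None  # None means we are outside a block
    for line in lines:
        if START.match(line):
            current = []  # begin collecting a new block
        elif END.match(line) and current is not None:
            blocks.append("".join(current).strip())
            current = None  # stop collecting until the next START
        elif current is not None:
            current.append(line)
    return blocks


sample = [
    "Solution :\n",
    "The paragraph I want to extract\n",
    "Remote value: x\n",
    "Unrelated text\n",
]
print(extract_blocks(sample))

# A file object yields its lines, so it can be passed directly:
# with open('<filepath>') as f:
#     print(extract_blocks(f))
```

A block that is opened but never closed before the end of the input is silently dropped; append current after the loop if you need it.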