My text file has the following contents:
fdsjhgjhg
fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
dsfjkhk
How can I extract the text between the words 'Start' and 'End' and store it in a list?
Here's one way to do it:
import re

text = '''fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
'''
# re.S lets "." match newlines, so the match can span several lines.
m = re.search(r"Start.*End", text, re.S)
if m is not None:
    print(m[0].split("\n")[1:-1])
The slice [1:-1] peels off the Start and End lines.
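The same slicing idea also works without a regular expression. The sketch below is only an illustration: the helper name extract_between is hypothetical, and it assumes Start and End each sit alone on their own line. If either marker is missing, list.index raises ValueError.

```python
def extract_between(text, start="Start", end="End"):
    """Return the lines strictly between the first start line and the next end line."""
    lines = text.splitlines()
    first = lines.index(start)            # position of the Start line
    last = lines.index(end, first + 1)    # first End line after Start
    return lines[first + 1:last]


sample = "fdshkjhk\nStart\nGood Morning\nHello World\nEnd\ndashjkhjk\n"
between = extract_between(sample)
print(between)
```

Because the markers are compared as whole lines, a line such as "Started" would not be mistaken for the Start marker.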
You could also modify the regular expression to capture just the portion in between, assuming Start is immediately followed by a newline and End immediately follows a newline:
m = re.search(r"Start\n(.*)\nEnd", text, re.S)
if m is not None:
    print(m[1].split("\n"))
Here we use m[1] to get the captured text.
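One caveat: .* is greedy, so if the file contains several Start/End blocks, the match runs from the first Start to the last End. The sketch below assumes a hypothetical sample with two blocks (not part of the original question) and uses the non-greedy .*? with re.findall to collect each block separately.

```python
import re

# Hypothetical sample with two Start/End blocks, for illustration only.
text = """junk
Start
Good Morning
End
noise
Start
Hello World
End
junk
"""

# Non-greedy (.*?) stops at the nearest End, so each block is captured on its own.
bodies = re.findall(r"Start\n(.*?)\nEnd", text, re.S)
blocks = [body.split("\n") for body in bodies]
print(blocks)
```

With a single capture group, re.findall returns the captured strings directly, so each entry in blocks is the list of lines from one Start/End pair.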
Here is an answer using the ttp library that should resolve your problem. The file a_text.txt contains your text data; I added two more lines for testing purposes.
ttp_template = """
Start {{ _start_ }}
{{line1 | _line_ }}
End {{ _end_ }}
"""

from ttp import ttp
import json

def text_parser(data_to_parse):
    parser = ttp(data=data_to_parse, template=ttp_template)
    parser.parse()
    # Get the result in JSON format (a string)
    results = parser.result(format='json')[0]
    # Convert the JSON string into Python objects
    result = json.loads(results)
    return result

with open("a_text.txt") as f:
    data_to_parse = f.read()

print(text_parser(data_to_parse))
Please see the output below: