
Extract the text between two keywords and store it in a list in Python

My text file has the following contents:

fdsjhgjhg
fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
dsfjkhk

How can I extract the text between the words 'Start' and 'End' and store it in a list?

Here's one way to do it:

import re

text = '''fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
'''
# re.S (DOTALL) lets . also match newlines, so the match spans several lines
m = re.search(r"Start.*End", text, re.S)
if m is not None:
    print(m[0].split("\n")[1:-1])

The slice [1:-1] drops the Start and End lines.
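One caveat with the pattern above: .* is greedy, so if the file contains several Start/End blocks, Start.*End matches from the first Start all the way to the last End. A minimal sketch for that case, assuming Start and End each sit on a line of their own, uses the non-greedy .*? with re.findall and returns one list per block. The helper name extract_blocks is hypothetical and not part of the original answer.

```python
import re

def extract_blocks(text):
    """Return one list of lines per Start...End block (hypothetical helper)."""
    # Non-greedy .*? stops at the nearest End line instead of the last one.
    # re.S lets . match newlines; re.M makes ^ and $ match at line boundaries,
    # so only lines consisting exactly of "Start" / "End" count as keywords.
    pattern = re.compile(r"^Start\n(.*?)\nEnd$", re.S | re.M)
    return [block.split("\n") for block in pattern.findall(text)]

text = """fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
Start
Second block
End
"""
print(extract_blocks(text))
# [['Good Morning', 'Hello World'], ['Second block']]
```

With a single block, the result is a one-element list, so extract_blocks(text)[0] gives the same list as the original snippet.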

You could also modify the regular expression to capture only the portion in between, assuming Start is immediately followed by a newline and End immediately follows one:

m = re.search(r"Start\n(.*)\nEnd", text, re.S)
if m is not None:
    print(m[1].split("\n"))

Here m[1] returns the text captured by the group (.*).
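If Start and End always appear on lines of their own, a regular expression is not strictly needed. The sketch below is one possible alternative, not part of the original answer: the hypothetical helper lines_between walks the lines once, starts collecting after the Start line, stops at the first End, and skips blank lines. The keyword comparison is case-sensitive and ignores surrounding whitespace.

```python
def lines_between(lines, start="Start", end="End"):
    """Collect the non-blank lines between a start and an end keyword line (hypothetical helper)."""
    collected = []
    inside = False
    for raw in lines:
        line = raw.strip()  # drop the trailing newline and surrounding spaces
        if not inside:
            if line == start:
                inside = True  # begin collecting on the next line
        elif line == end:
            return collected  # stop at the first End after Start
        elif line:
            collected.append(line)  # keep only non-blank lines
    return []  # no complete Start...End block was found

sample = """fdsjhgjhg

fdshkjhk

Start

Good Morning

Hello World

End

dashjkhjk
"""
print(lines_between(sample.splitlines()))
# ['Good Morning', 'Hello World']
```

Because iterating over an open file yields its lines, the same function should also work directly on the file, for example with open("a_text.txt") as f: print(lines_between(f)), where a_text.txt stands in for your file name.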

Here is an answer that should solve your problem, using the TTP (Template Text Parser) library. The file a_text.txt contains your text data; I have added two more lines for testing.

ttp_template = """
Start {{ _start_ }}
{{line1 | _line_ }}
End {{ _end_ }}
"""

from ttp import ttp
import json

def text_parser(data_to_parse):
    parser = ttp(data=data_to_parse, template=ttp_template)
    parser.parse()

    # get the parsing results formatted as a JSON string
    results = parser.result(format='json')[0]

    # convert the JSON string into Python objects
    result = json.loads(results)

    return result

with open("a_text.txt") as f:
    data_to_parse = f.read()

print(text_parser(data_to_parse))

Please see the output below:

[Screenshot of the output]
