
Python to combine multiple lines in a txt file if certain criteria match

Could someone help me combine multiple lines of a text file into a single line whenever the text between a pair of tags is spread over more than one line?

my.txt

<start>Hello World.</start>
<start>Hello World, this is my message.


Regards,

Jane

www.url.com

</start>

desired output.txt:

<start>Hello World.</start>
<start>Hello World, this is my message. Regards, Jane www.url.com</start>

My code so far:

f = open('/path/to/my.txt', 'r')
currentline = ""
for line in f:
    if line.startswith('<start>'):
        line = line.rstrip('\n')
        print(line)
    else:
        line = line.rstrip('\n')
        currentline = currentline + line
        print (currentline)

f.close()

Actual output:

<start>Hello World.</start>
<start>Hello World, this is my message.


Regards,
Regards,
Regards,Jane
Regards,Jane
Regards,Janewww.url.com
Regards,Janewww.url.com
Regards,Janewww.url.com</start>

Thank you in advance!

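You can read the whole file into one string, match each <start>...</start> block with a regular expression, and collapse the whitespace inside it:

import re

with open('/path/to/my.txt', 'r') as fin:
    text = fin.read()

# re.DOTALL lets "." match newlines, so one block can span several lines.
pattern = r"<start>(.*?)</start>"
output = []
for body in re.findall(pattern, text, re.DOTALL):
    # Collapse every run of whitespace (blank lines included) into one space.
    output.append("<start>" + " ".join(body.split()) + "</start>")
print(output)
#['<start>Hello World.</start>',
# '<start>Hello World, this is my message. Regards, Jane www.url.com</start>']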
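If the file is too large to read in one go, a line-by-line variant closer to the code in the question may also work. This is only a sketch: the function name join_tagged_lines and its open_tag/close_tag parameters are made up for illustration, and it assumes every block begins with the opening tag at the start of a line and that blocks are not nested.

```python
def join_tagged_lines(lines, open_tag="<start>", close_tag="</start>"):
    """Yield one line per open_tag...close_tag block, joining wrapped text with spaces.

    Hypothetical helper; assumes each block starts at the beginning of a line
    and that blocks are never nested.
    """
    parts = []       # non-empty pieces of the block currently being read
    inside = False   # True between an opening tag and its closing tag
    for raw in lines:
        line = raw.strip()
        if not inside and line.startswith(open_tag):
            inside = True
            parts = []
        if not inside:
            continue  # ignore anything outside a block, e.g. blank separators
        if line:
            parts.append(line)
        if line.endswith(close_tag):
            body = " ".join(parts)
            # Drop the space the join leaves before a closing tag on its own line.
            yield body[:-len(close_tag)].rstrip() + close_tag
            inside = False


# The sample from the question, as the lines a file object would produce.
sample = [
    "<start>Hello World.</start>\n",
    "<start>Hello World, this is my message.\n",
    "\n",
    "\n",
    "Regards,\n",
    "\n",
    "Jane\n",
    "\n",
    "www.url.com\n",
    "\n",
    "</start>\n",
]
result = list(join_tagged_lines(sample))
for joined in result:
    print(joined)
```

Because an open file is itself an iterable of lines, the same function can be given the file object directly, and each yielded line can be written to output.txt followed by a newline.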
