Remove multiple lines from a text file after a specific string, then replace with new text

Question

I'm trying to write a script that reads several .xml files within a directory. When a specific string is found (every file contains this string), I need it to delete all content after that string and replace it with new content (this can be pulled in from another file if that's easier).

There are numerous lines being deleted/written here.

At the moment I am manually going through the files, deleting all text after the string, saving the files, and then running this Python script:

import os

# A raw string keeps the backslashes in the Windows path literal.
rootdir = r'F:\Desktop\PyTest'

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        # Join with subdir so files in subdirectories are opened correctly.
        f = open(os.path.join(subdir, file), 'a')
        f.write("\n      <Text>Lorem Ipsum</Text>")
        f.write("\n      <Text>Lorem Ipsum</Text>")
        f.write("\n      <Text>Lorem Ipsum</Text>")
        f.write("\n      <Text>Lorem Ipsum</Text>")
        f.close()

It took me a while to piece this together from tutorials, and although I've managed to find tutorials to search for a specific string and to replace it, I haven't been able to erase all content after a string and replace with new.

Any advice would be greatly appreciated :)

Doesn't have to be in Python, but I am running a Windows environment.

Answer 1

This is not the fastest implementation for large files, but it should work.

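import os

rootdir = r'F:\Desktop\PyTest'  # directory from the question

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        path = os.path.join(subdir, file)
        output=[]
        # Collect every line up to and including the one that contains the string.
        with open(path, 'r') as inF:
            for line in inF:
                output.append(line)
                if 'criteria' in line: break
        # Open with 'w' (not 'a') so the old content is discarded before rewriting.
        f=open(path, 'w')
        Lorem_list=['Lorem Ipsum','Lorem Ipsum','Lorem Ipsum']
        #The '\n' may look strange, but I am using your previous syntax.
        #This also will result in a blank line. I would suggest revising the
        #way you place text to follow the (x+'\n') format.
        [f.write(x) for x in output]
        [f.write('\n      '+x) for x in Lorem_list]
        f.close()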

Replace 'criteria' with the string you are looking for.

To be clear:

Lorem_list=['Lorem Ipsum','Lorem Ipsum','Lorem Ipsum']
[f.write('\n      '+x) for x in Lorem_list]

Means:

    f.write("\n      Lorem Ipsum")
    f.write("\n      Lorem Ipsum")
    f.write("\n      Lorem Ipsum")
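The same idea can be sketched as a pair of reusable functions. This is only a sketch under a few assumptions: the marker sits on a single line, only files ending in .xml should be touched, and files without the marker should be left alone. The names replace_after_marker, replace_in_tree, and the extension parameter are made up for illustration and do not come from the question.

```python
import os


def replace_after_marker(path, marker, new_lines):
    """Keep everything up to and including the first line that contains
    marker, then write new_lines in place of whatever followed.
    Returns True if the marker was found; otherwise the file is untouched."""
    kept = []
    found = False
    with open(path, 'r') as in_file:
        for line in in_file:
            kept.append(line)
            if marker in line:
                found = True
                break
    if not found:
        return False
    # Make sure the marker line ends with a newline before appending.
    if not kept[-1].endswith('\n'):
        kept[-1] += '\n'
    # Mode 'w' truncates the file, so the old tail is discarded.
    with open(path, 'w') as out_file:
        out_file.writelines(kept)
        out_file.writelines(line + '\n' for line in new_lines)
    return True


def replace_in_tree(rootdir, marker, new_lines, extension='.xml'):
    """Apply replace_after_marker to every matching file under rootdir
    and return the list of paths that were changed."""
    changed = []
    for subdir, dirs, files in os.walk(rootdir):
        for name in files:
            if name.endswith(extension):
                path = os.path.join(subdir, name)
                if replace_after_marker(path, marker, new_lines):
                    changed.append(path)
    return changed
```

A call such as replace_in_tree(r'F:\Desktop\PyTest', 'criteria', ['      <Text>Lorem Ipsum</Text>']) would then cover the whole directory; try it on a copy of the files first, since the rewrite happens in place.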

Answer 2

If you want to delete everything after a specific string, a regex sounds like the right tool. Something along these lines:

import re

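def trim(test_string, removal_string):
    # re.escape treats removal_string literally; re.DOTALL lets '.' match
    # newlines, so the pattern also works on multi-line text.
    pattern = r'^(.*?)(' + re.escape(removal_string) + r')(.*)$'
    return re.sub(pattern, r'\1\2', test_string, flags=re.DOTALL)

example = "I want to remove everything after quips, this for instance is useless"
print(trim(example, 'quips'))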

returns "I want to remove everything after quips"

Hope that helps

If you want to do this on a file then you can call the above code like this:

def cleanFile(file_path, removal_string):
    with open(file_path) as master_text:
        # Read the whole file into one string before trimming it.
        return trim(master_text.read(), removal_string)

Simple as that. You can also write the open line slightly more verbosely as

with open(file_path, 'r') as master_text:

if you want it a little clearer; it does the same thing, because 'r' is the default mode.
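If the string you are searching for is a fixed piece of text rather than a pattern, the trimming can also be sketched without a regex using str.partition, which splits at the first occurrence and works across line breaks. The function names trim_after and replace_after below are made up for this illustration.

```python
def trim_after(text, marker):
    """Return text up to and including the first occurrence of marker.
    If marker is absent, partition returns the text unchanged as the head."""
    head, found, _tail = text.partition(marker)
    return head + found


def replace_after(text, marker, replacement):
    """Replace everything after the first occurrence of marker.
    The text is returned unchanged when marker is absent."""
    head, found, _tail = text.partition(marker)
    if not found:
        return text
    return head + found + replacement


example = "I want to remove everything after quips, this for instance is useless"
print(trim_after(example, 'quips'))  # I want to remove everything after quips
```

Because no pattern is compiled, characters such as . or ( in the marker need no escaping here.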

Answer 3

If you're editing XML, you may want to check out an XML parser like Beautiful Soup.

As far as what you asked, suppose this were our string:

text = """Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum libero sem,
sollicitudin nec bibendum nec, condimentum sed magna. Duis malesuada, mi vel aliquet auctor,
mi dui molestie massa, ac dapibus velit justo ut lorem. Donec fermentum euismod elementum.
Etiam et ligula nisi, in porta lacus. 0 Nam laoreet, ligula pretium facilisis eleifend,
purus dolor commodo nisi, eget iaculis dolor arcu eu neque. Integer sit amet blandit est. In
eu ipsum nec turpis sagittis tincidunt"""

and you wanted to replace everything after the 0 with new stuff.

new_stuff = '''
               No breeze, O majestic nose, can give thee cold - save when the north
               winds blow.
            '''

# The variable is named text so it does not shadow the built-in str.
# The + 1 keeps the '0' itself; without it the slice would drop the marker too.
better_string = text[:text.index('0') + 1] + new_stuff
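Coming back to the XML parser suggestion at the top of this answer, here is a rough sketch of the same replacement done on the parsed tree instead of on raw text. It uses the standard library's xml.etree.ElementTree rather than Beautiful Soup, and it assumes the marker is an element (here a hypothetical Marker tag) whose later siblings should be swapped for new Text elements. The function name, the tag names, and the sample document are all invented for illustration; the real files may be structured differently.

```python
import xml.etree.ElementTree as ET


def replace_following_siblings(parent, marker_tag, new_texts, new_tag='Text'):
    """Remove every child of parent that comes after the first child named
    marker_tag, then append one new_tag element per entry in new_texts.
    Returns False (and changes nothing) if no such child exists."""
    children = list(parent)
    for index, child in enumerate(children):
        if child.tag == marker_tag:
            break
    else:
        return False
    # Drop the siblings that follow the marker element.
    for stale in children[index + 1:]:
        parent.remove(stale)
    # Append the replacement elements after the marker.
    for text in new_texts:
        ET.SubElement(parent, new_tag).text = text
    return True


# Hypothetical sample document, not taken from the question.
root = ET.fromstring('<Doc><Marker/><Text>old</Text><Text>older</Text></Doc>')
replace_following_siblings(root, 'Marker', ['Lorem Ipsum', 'Lorem Ipsum'])
print(ET.tostring(root, encoding='unicode'))
```

Working on the tree keeps the output well-formed, which plain string slicing cannot guarantee if the cut lands inside an element.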

3 answers

solution1
1 ACCEPTED 2013-01-04 22:50:51

solution2
0 2013-01-04 22:48:11

solution3
0 2013-01-04 22:54:50
