Remove all rows of lines between two strings from xml file using python

Question

I'm writing a script that checks a short xml file with the following data:

Example:

<manager>
    <name="adi" lastName="kant">
        <info ip="10.12.180.107" platform="linux" user="root" password="dangerous"/>
    </name>
    <name="dino" lastName="kant">
        <info ip="10.12.180.108" platform="linux" user="root" password="dangerous"/>
    </name>
</manager>

I'm trying to create a python script that will look into this xml file and remove the selected name information.

Example:

removeData(xmlFile)
print xmlFile
<manager>
    <name="dino" lastName="kant">
        <info ip="10.12.180.108" platform="linux" user="root" password="dangerous"/>
    </name>
</manager>

The only solution I could come up with was to read the file up to the name I wish to remove and append those lines to a list, then read the file from the name after the one I wish to remove and append those lines as well, and finally write the combined list back to my file.

Example:

h = open("/home/service/chimera/array_cert/test.txt", "r")
output = []
lines = h.readlines()
for line in lines:
    if '<name="adi"' in line: 
        break
    output.append(line)
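i = 0
for line in lines:
    if '<name="dino"' in line:
        break
    # Count only the lines before the match, so lines[i:] keeps the <name="dino" line.
    i+=1
for line in lines[i:]:
    output.append(line)
h.close()
# Opening with "w" already empties the file, so no truncate() call is needed.
h = open("/home/service/chimera/array_cert/test.txt", "w")
for line in output:
    h.write(line)
h.close()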

But this seems needlessly complex. Is there a simpler way to do this?

Also I'm using python 2.6 on a Linux system.

Answer 1

Use a SAX parser such as xml.sax. As it scans the XML file, it gives you a callback for each 'event' (i.e. an opening tag, a closing tag, character data, and so on; the attributes arrive together with the opening tag). As you receive these callbacks, keep track of whether you are inside a part of the file you want to keep or a part you want to delete. Stream the data into a new file while you are in "keeping" mode, and skip it otherwise.

When dealing with XML, always use a proper parser of some sort. The dangers of using regexes or otherwise parsing it by hand have been well documented.
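
A rough sketch of this SAX approach follows. It is written for Python 3 and has not been checked against Python 2.6. Note that the sample in the question is not well-formed XML: `<name="adi" ...>` puts `=` directly after the element name, so a real parser rejects it. The sketch therefore assumes a hypothetical well-formed variant in which the first name is stored in an attribute called `first`. `SkipNameHandler`, `remove_name` and `SAMPLE` are illustrative names, not part of any library.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

# Assumed well-formed variant of the question's data (the "first" attribute is hypothetical).
SAMPLE = """<manager>
    <name first="adi" lastName="kant">
        <info ip="10.12.180.107" platform="linux" user="root" password="dangerous"/>
    </name>
    <name first="dino" lastName="kant">
        <info ip="10.12.180.108" platform="linux" user="root" password="dangerous"/>
    </name>
</manager>"""


class SkipNameHandler(xml.sax.ContentHandler):
    """Forward SAX events to a writer, except inside the unwanted <name> element."""

    def __init__(self, writer, first_name):
        xml.sax.ContentHandler.__init__(self)
        self.writer = writer
        self.first_name = first_name
        self.skip_depth = 0  # greater than 0 while inside the element being removed

    def startElement(self, name, attrs):
        if self.skip_depth:
            self.skip_depth += 1  # nested element inside the skipped block
        elif name == "name" and attrs.get("first") == self.first_name:
            self.skip_depth = 1  # entering "deleting" mode
        else:
            self.writer.startElement(name, attrs)

    def endElement(self, name):
        if self.skip_depth:
            self.skip_depth -= 1  # reaches 0 at the skipped element's own end tag
        else:
            self.writer.endElement(name)

    def characters(self, content):
        if not self.skip_depth:
            self.writer.characters(content)


def remove_name(xml_text, first_name):
    """Return xml_text without the <name> element whose 'first' attribute matches."""
    out = io.StringIO()
    handler = SkipNameHandler(XMLGenerator(out), first_name)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return out.getvalue()


print(remove_name(SAMPLE, "adi"))
```

The whitespace that surrounded the removed element is still copied through, and empty elements such as `<info .../>` may be written with an explicit closing tag, so the output is equivalent XML rather than a byte-for-byte copy of the remaining lines.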

Answer 2

Do you need to retain exactly the same formatting as in the source file? If not, you could simply parse the XML, drop the element you don't want, and write out a new XML file.
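
A minimal sketch of the parse-and-rewrite option, using the standard library's `xml.etree.ElementTree`. It assumes the file is first made well-formed (the question's `<name="adi" ...>` is not valid XML and would be rejected by the parser), here by moving the first name into a hypothetical `first` attribute. `remove_person` and `SAMPLE` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed variant of the question's data (the "first" attribute is hypothetical).
SAMPLE = """<manager>
    <name first="adi" lastName="kant">
        <info ip="10.12.180.107" platform="linux" user="root" password="dangerous"/>
    </name>
    <name first="dino" lastName="kant">
        <info ip="10.12.180.108" platform="linux" user="root" password="dangerous"/>
    </name>
</manager>"""


def remove_person(xml_text, first_name):
    """Return the XML with every <name first="..."> child matching first_name removed."""
    root = ET.fromstring(xml_text)
    # Iterate over a copy of the list, because children are removed during the loop.
    for elem in list(root.findall("name")):
        if elem.get("first") == first_name:
            root.remove(elem)
    # tostring() returns ASCII-encoded bytes by default; decode them for printing.
    return ET.tostring(root).decode("ascii")


print(remove_person(SAMPLE, "adi"))
```

To work on a file instead of a string, `ET.parse(path)` followed by `tree.write(path)` would play the same role; the plain loop is used here so the sketch does not depend on XPath predicate support.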

If you can trust the XML in your source to have <name...> and </name> on different lines, you can modify your code just a little:

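# Copy test1.txt to test2.txt, skipping the <name="adi" ...> block.
with open("test1.txt", "r") as h:
    lines = h.readlines()

output = []
foroutput = True  # False while inside the block being removed
for line in lines:
    if '<name="adi"' in line:
        foroutput = False
    if foroutput:
        output.append(line)
    elif '</name>' in line:
        # The closing tag itself is dropped; copying resumes on the next line.
        foroutput = True

# Mode "w" already truncates the file, so no explicit truncate() is needed.
with open("test2.txt", "w") as h:
    h.writelines(output)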


solution1
1 ACCEPTED 2014-11-18 14:22:18

solution2
1 2014-11-18 14:29:40
