I have several files that I need to work on. They are XML files, but before the "<?xml version="1.0"?>" declaration there are some debugging and status lines that came from the command line. Since I'd like to parse the files, these lines must be removed. How can I do this? Preferably in place, i.e. the filename stays the same.
Thanks for any help.
An inefficient solution would be to read the whole contents and find where the XML declaration starts:
fileName = "yourfile.xml"
marker = '<?xml version="1.0"?>'
with open(fileName, 'r+') as f:
    contents = f.read()
    start = contents.find(marker)
    if start != -1:  # leave the file untouched if the declaration is missing
        f.seek(0)
        f.write(contents[start:])
        f.truncate()
The file will now contain the original file's contents from "<?xml version="1.0"?>" onwards.
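Since the question mentions several files, the same idea can be wrapped in a small function and called once per file. This is only a sketch: the name strip_before_declaration, the shorter default marker b"<?xml", and the use of binary mode (so the file's encoding does not matter) are assumptions of mine, not part of the answer above.

```python
def strip_before_declaration(path, marker=b"<?xml"):
    """Remove everything before the first occurrence of marker, in place.

    Returns True if the file was changed, False otherwise.
    (Hypothetical helper; the name and the default marker are assumptions.)
    """
    with open(path, "r+b") as f:
        contents = f.read()
        start = contents.find(marker)
        if start <= 0:
            # marker missing (-1) or already at the very start (0): nothing to do
            return False
        f.seek(0)
        f.write(contents[start:])
        f.truncate()
        return True
```

It could then be applied in a loop such as: for name in ["file1.xml", "file2.xml"]: strip_before_declaration(name). Like the original, it reads each file completely into memory.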
What about trimming the file headers as you read the file?
import xml.etree.ElementTree as et
with open("input.xml", "rb") as inf:
    # find the starting point
    offset = 0
    for line in inf:
        # the file is opened in binary mode, so compare against bytes
        if line.startswith(b'<?xml version="1.0"'):
            break
        offset += len(line)
    # parse the XML starting at that point
    inf.seek(offset)
    data = et.parse(inf)
This assumes that the XML declaration starts at the beginning of a line. It works on my test file:
<!-- This is a line of junk -->
<!-- This is another -->
<?xml version="1.0" ?>
<abc>
<def>xy</def>
<def>hi</def>
</abc>
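If the files only need to be parsed and not rewritten, the leading junk can also be cut off in memory. A minimal sketch, assuming the content fits in memory and that the declaration begins with <?xml; the function name parse_after_declaration is hypothetical, and the sample bytes simply repeat the test file above:

```python
import xml.etree.ElementTree as ET


def parse_after_declaration(raw, marker=b"<?xml"):
    """Parse the XML that starts at the first marker inside raw bytes.

    Hypothetical helper: returns the root Element, or raises ValueError
    if no declaration is found.
    """
    start = raw.find(marker)
    if start == -1:
        raise ValueError("no XML declaration found")
    return ET.fromstring(raw[start:])


sample = (
    b"<!-- This is a line of junk -->\n"
    b"<!-- This is another -->\n"
    b'<?xml version="1.0" ?>\n'
    b"<abc>\n<def>xy</def>\n<def>hi</def>\n</abc>\n"
)
root = parse_after_declaration(sample)
print(root.tag, [d.text for d in root.findall("def")])
```

For a real file, the bytes would come from open(path, "rb").read() instead of the inline sample.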
Since you say you have several files, using fileinput might be better than open. You can then do something like:
import fileinput
import sys
prolog = '<?xml version="1.0"?>'
reached_prolog = False
files = ['file1.xml', 'file2.xml']  # The paths of all your XML files
for line in fileinput.input(files, inplace=True):
    if fileinput.isfirstline():
        reached_prolog = False  # start over for each new file
    if not reached_prolog and line.startswith(prolog):
        reached_prolog = True
    # With inplace=True, stdout is redirected into the current file,
    # so only lines from the prolog onwards are written back.
    if reached_prolog:
        sys.stdout.write(line)
Reading the docs for fileinput should make things clearer.
PS: This is just a quick response; I haven't run or tested the code.
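Because in-place editing overwrites the originals, it may be worth keeping backups while trying this out; fileinput.input accepts a backup argument for that purpose. The following is a sketch, not a tested production routine: the function name strip_headers and the shorter marker "<?xml" are assumptions of mine.

```python
import fileinput
import sys


def strip_headers(paths, marker="<?xml", backup=".bak"):
    """Drop every line before the first one starting with marker, in place.

    Hypothetical helper: the original of each file is kept as <name>.bak.
    """
    keep = False
    with fileinput.input(paths, inplace=True, backup=backup) as stream:
        for line in stream:
            if stream.isfirstline():
                keep = False  # reset at the start of each file
            if not keep and line.startswith(marker):
                keep = True
            if keep:
                # stdout is redirected into the file being rewritten
                sys.stdout.write(line)
```

Note that a file without any line starting with the marker ends up empty, which is one more reason to keep the .bak copies until the results have been checked.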
A solution with regexp:
import re
import shutil
with open('myxml.xml') as ifile, open('tempfile.tmp', 'w') as ofile:
    for line in ifile:
        # re.S lets '.' match the trailing newline as well
        match = re.search(r'<\?xml version="1\.0"\?>.*', line, re.S)
        if match:
            ofile.write(match.group(0))  # the declaration and the rest of its line
            ofile.writelines(ifile)      # copy the remaining lines unchanged
            break
shutil.move('tempfile.tmp', 'myxml.xml')