I've been messing around with the lxml library for a little while and maybe I'm not understanding it correctly or I'm missing something but I can't seem to figure out how to edit the file after I catch a certain xpath and then be able to write that back out into xml while I'm parsing element by element.
Say we have this xml as an example:
<xml>
<items>
<pie>cherry</pie>
<pie>apple</pie>
<pie>chocolate</pie>
</items>
</xml>
What I would like to do while parsing is when I hit that xpath of "/xml/items/pie" is to add an element before pie, so it will turn out like this:
<xml>
<items>
<item id="1"><pie>cherry</pie></item>
<item id="2"><pie>apple</pie></item>
<item id="3"><pie>chocolate</pie></item>
</items>
</xml>
That output would need to be done by writing to a file line by line as I hit each tag and edit the xml at certain xpaths. I mean I could have it print the starting tag, the text, the attribute if it exists, and then the ending tag by hard coding certain parts, but that would be very messy and it be nice if there was a way to avoid that if possible.
Here's my guess code at this:
from lxml import etree
path=[]
count=0
context=etree.iterparse(file,events=('start','end'))
for event, element in context:
if event=='start':
path.append(element.tag)
if /'+'/'.join(path)=='/xml/items/pie':
itemnode=etree.Element('item',id=str(count))
itemnode.text=""
element.addprevious(itemnode)#Not the right way to do it of course
#write/print out xml here.
else:
element.clear()
path.pop()
Edit: Also, I need to run through fairly big files, so I have to use iterparse.
There is a more clean way to make modifications you need:
pie
elements item
element pie
element with item
replace(self, old_element, new_element)
Replaces a subelement with the element passed as second argument.
from lxml import etree
from lxml.etree import XMLParser, Element
data = """<xml>
<items>
<pie>cherry</pie>
<pie>apple</pie>
<pie>chocolate</pie>
</items>
</xml>"""
tree = etree.fromstring(data, parser=XMLParser())
items = tree.find('.//items')
for index, pie in enumerate(items.xpath('.//pie'), start=1):
item = Element('item', {'id': str(index)})
items.replace(pie, item)
item.append(pie)
print etree.tostring(tree, pretty_print=True)
prints:
<xml>
<items>
<item id="1"><pie>cherry</pie></item>
<item id="2"><pie>apple</pie></item>
<item id="3"><pie>chocolate</pie></item>
</items>
</xml>
Here's a solution using iterparse()
. The idea is to catch all tag "start" events, remember the parent ( items
) tag, then for every pie
tag create an item
tag and put the pie into it:
from StringIO import StringIO
from lxml import etree
from lxml.etree import Element
data = """<xml>
<items>
<pie>cherry</pie>
<pie>apple</pie>
<pie>chocolate</pie>
</items>
</xml>"""
stream = StringIO(data)
context = etree.iterparse(stream, events=("start", ))
for action, elem in context:
if elem.tag == 'items':
items = elem
index = 1
elif elem.tag == 'pie':
item = Element('item', {'id': str(index)})
items.replace(elem, item)
item.append(elem)
index += 1
print etree.tostring(context.root)
prints:
<xml>
<items>
<item id="1"><pie>cherry</pie></item>
<item id="2"><pie>apple</pie></item>
<item id="3"><pie>chocolate</pie></item>
</items>
</xml>
I would suggest you to use an XSLT template, as it seems to match better for this task. Initially XSLT is a little bit tricky until you get used to it, if all you want is to generate some output from an XML, then XSLT is a great tool.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.