
Python lxml: using iterparse to edit and output XML

I've been experimenting with the lxml library for a while, and I may be misunderstanding it or missing something, but I can't figure out how to edit the document when I hit a certain XPath and then write the result back out as XML while parsing element by element.

Say we have this XML as an example:

<xml>
   <items>
      <pie>cherry</pie>
      <pie>apple</pie>
      <pie>chocolate</pie>
  </items>
</xml>

While parsing, when I hit the XPath "/xml/items/pie", I would like to wrap each pie in a new item element, so the result looks like this:

<xml>
   <items>
      <item id="1"><pie>cherry</pie></item>
      <item id="2"><pie>apple</pie></item>
      <item id="3"><pie>chocolate</pie></item>
  </items>
</xml>

The output needs to be written to a file incrementally, as I hit each tag and edit the XML at certain XPaths. I could print the start tag, the text, the attributes if any, and then the end tag by hard-coding certain parts, but that would be very messy, and it would be nice to avoid that if possible.

Here's my rough attempt:

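from lxml import etree

path = []   # stack of tag names from the root down to the current element
count = 0   # running id for the generated <item> elements

# `file` stands for the path or file object of the input XML
context = etree.iterparse(file, events=('start', 'end'))
for event, element in context:
    if event == 'start':
        path.append(element.tag)
        if '/' + '/'.join(path) == '/xml/items/pie':
            count += 1
            itemnode = etree.Element('item', id=str(count))
            itemnode.text = ""
            element.addprevious(itemnode)  # Not the right way to do it of course
            # write/print out xml here.
    else:
        # 'end' event: free the finished element and leave its path level
        element.clear()
        path.pop()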

Edit: Also, I need to process fairly big files, so I have to use iterparse.

There is a cleaner way to make the modifications you need:

  • iterate over the pie elements
  • make an item element
  • use replace() to replace the pie element with the item

replace(self, old_element, new_element)

Replaces a subelement with the element passed as second argument.


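from lxml import etree
from lxml.etree import XMLParser, Element

data = """<xml>
   <items>
      <pie>cherry</pie>
      <pie>apple</pie>
      <pie>chocolate</pie>
  </items>
</xml>"""


tree = etree.fromstring(data, parser=XMLParser())
items = tree.find('.//items')
# xpath() returns a list, so replacing children while looping is safe
for index, pie in enumerate(items.xpath('.//pie'), start=1):
    item = Element('item', {'id': str(index)})
    items.replace(pie, item)   # put the new <item> where the <pie> was
    item.append(pie)           # then move the <pie> inside it

# tostring() returns bytes on Python 3, so decode before printing
print(etree.tostring(tree, pretty_print=True).decode())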

prints:

<xml>
   <items>
      <item id="1"><pie>cherry</pie></item>
      <item id="2"><pie>apple</pie></item>
      <item id="3"><pie>chocolate</pie></item>
   </items>
</xml>
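
A note on whitespace: lxml stores the text that follows an element on the element itself, as its .tail, so the line break after each pie may travel with it into the new item. The sketch below moves the tail onto the item before wrapping, so the whitespace stays between the items. The function name wrap_pies is only illustrative and not part of the original answer.

```python
from lxml import etree

def wrap_pies(root):
    """Wrap every <pie> child of <items> in a numbered <item> (illustrative helper)."""
    items = root.find(".//items")
    # findall() returns a list, so mutating `items` inside the loop is safe
    for index, pie in enumerate(items.findall("pie"), start=1):
        item = etree.Element("item", id=str(index))
        # hand the trailing whitespace over to the wrapper element
        item.tail, pie.tail = pie.tail, None
        items.replace(pie, item)
        item.append(pie)
    return root

root = wrap_pies(etree.fromstring(
    "<xml><items>\n<pie>cherry</pie>\n<pie>apple</pie>\n</items></xml>"
))
print(etree.tostring(root).decode())
```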

Here's a solution using iterparse(). The idea is to catch all tag "start" events, remember the parent (items) tag, then for every pie tag create an item tag and put the pie into it:

from io import BytesIO
from lxml import etree
from lxml.etree import Element

data = b"""<xml>
   <items>
      <pie>cherry</pie>
      <pie>apple</pie>
      <pie>chocolate</pie>
  </items>
</xml>"""

stream = BytesIO(data)
context = etree.iterparse(stream, events=("start", ))

for action, elem in context:
    if elem.tag == 'items':
        # remember the parent and restart the numbering
        items = elem
        index = 1
    elif elem.tag == 'pie':
        item = Element('item', {'id': str(index)})
        items.replace(elem, item)   # put the new <item> where the <pie> was
        item.append(elem)           # then move the <pie> inside it
        index += 1

# tostring() returns bytes on Python 3, so decode before printing
print(etree.tostring(context.root).decode())

prints:

<xml>
   <items>
      <item id="1"><pie>cherry</pie></item>
      <item id="2"><pie>apple</pie></item>
      <item id="3"><pie>chocolate</pie></item>
   </items>
</xml>
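
Since the question mentions large files and incremental output, here is a hedged sketch that pairs iterparse() with lxml's incremental writer, etree.xmlfile, so each item is written as soon as its pie has been parsed and the full tree is never kept. The function name wrap_pies_streaming is illustrative, and the wrapper element names xml and items are hard-coded as an assumption about the input layout.

```python
import copy
from io import BytesIO
from lxml import etree

def wrap_pies_streaming(source, target):
    """Read <pie> elements from `source`, write each wrapped in <item> to `target`."""
    with etree.xmlfile(target, encoding="utf-8") as xf:
        # Assumption: the enclosing elements are known up front and have no attributes
        with xf.element("xml"):
            with xf.element("items"):
                pies = etree.iterparse(source, events=("end",), tag="pie")
                for index, (_, pie) in enumerate(pies, start=1):
                    # work on a copy so the parser's own tree is not rearranged
                    pie_copy = copy.deepcopy(pie)
                    pie_copy.tail = None
                    item = etree.Element("item", id=str(index))
                    item.append(pie_copy)
                    xf.write(item)
                    # usual iterparse cleanup: drop data that is no longer needed
                    pie.clear()
                    while pie.getprevious() is not None:
                        del pie.getparent()[0]

source = BytesIO(b"<xml><items><pie>cherry</pie><pie>apple</pie></items></xml>")
target = BytesIO()
wrap_pies_streaming(source, target)
print(target.getvalue().decode())
```

Both `source` and `target` can also be file paths or files opened in binary mode. The trade-off is that anything outside the pie elements (other siblings, attributes on the wrappers) is not copied through by this sketch.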

I would suggest using an XSLT template, as it seems a better fit for this task. XSLT is a little tricky until you get used to it, but if all you want is to generate some output from an XML document, it is a great tool.
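
As a rough illustration of the XSLT route, the sketch below runs a small stylesheet through lxml's etree.XSLT. The stylesheet is my own guess at what such a template could look like (an identity template plus one rule that wraps each pie), and the names XSLT_SOURCE and transform are illustrative. Note that this loads the whole document into memory, so it does not by itself address very large files.

```python
from lxml import etree

XSLT_SOURCE = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- identity template: copy everything through unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- wrap each pie in an item numbered by its position among its pie siblings -->
  <xsl:template match="items/pie">
    <item id="{count(preceding-sibling::pie) + 1}">
      <xsl:copy-of select="."/>
    </item>
  </xsl:template>
</xsl:stylesheet>"""

data = """<xml>
   <items>
      <pie>cherry</pie>
      <pie>apple</pie>
      <pie>chocolate</pie>
  </items>
</xml>"""

transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
result = transform(etree.fromstring(data))
print(etree.tostring(result.getroot()).decode())
```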
