
How to best iterate (breadth-first) over an lxml etree using Python

I'm trying to wrap my head around lxml (I'm new to it) and how I can use it to do what I want to do. I've got a well-formed and valid XML file

<root>
  <a>
    <b>Text</b>
    <c>More text</c>
  </a>
  <!-- some comment -->
  <a>
    <d id="10" />
  </a>
</root>

something like this. Now I'd like to visit the children breadth-first, and the best I can come up with is something like this:

for e in xml.getroot()[0].itersiblings() :
    print(e.tag, e.attrib)

and then take it from there. However, this gives me all elements, including comments:

a {}
<built-in function Comment> {}
a {}

How do I skip over comments? Is there a better way to iterate over the direct children of a node?

In general, what are the recommendations for parsing a whole XML tree vs. event-driven pull-parsing using, say, iterparse()?

This works for your case:

for child in doc.getroot().iterchildren("*"):
    print(child.tag, child.attrib)

This question was asked over 9 years ago, but I just ran into this issue myself, and I solved it with the following:

import xml.etree.ElementTree as ET

xmlfile = ET.parse("file.xml")
root = xmlfile.getroot()

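visit = [root]  # queue of nodes still to visit, starting with the root
while visit:
  curr = visit.pop(0)  # take the oldest node first (FIFO gives breadth-first order)
  print(curr.tag, curr.attrib, curr.text)
  visit += list(curr)  # enqueue the node's direct children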

list(node) gives a list of all the immediate children of that node. By appending those children to a queue and repeating the process with whatever is at the front of the queue (popping it off at the same time), we end up with a standard breadth-first traversal.
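
One caveat: pop(0) on a list shifts every remaining item, so on large trees a collections.deque is usually the better container. Below is a minimal sketch of the same traversal using only the standard library; the name iter_breadth_first and the depth bookkeeping are illustrative additions, not part of the answer above.

```python
import xml.etree.ElementTree as ET
from collections import deque

XML = """<root>
  <a>
    <b>Text</b>
    <c>More text</c>
  </a>
  <!-- some comment -->
  <a>
    <d id="10" />
  </a>
</root>"""


def iter_breadth_first(root):
    """Yield (depth, element) pairs level by level, starting at root."""
    queue = deque([(0, root)])
    while queue:
        depth, node = queue.popleft()  # O(1), unlike list.pop(0)
        yield depth, node
        # Iterating an element yields its direct children in document order.
        queue.extend((depth + 1, child) for child in node)


root = ET.fromstring(XML)
for depth, node in iter_breadth_first(root):
    print(depth, node.tag, node.attrib)
```

The standard-library parser discards comments by default, so none appear here. With lxml, comments would be queued as well unless the children are taken from node.iterchildren("*") instead.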
