How to best iterate (breadth-first) over an lxml etree using Python
I'm trying to wrap my head around lxml (I'm new to it) and how I can use it to do what I want. I've got a well-formed and valid XML file
<root>
  <a>
    <b>Text</b>
    <c>More text</c>
  </a>
  <!-- some comment -->
  <a>
    <d id="10" />
  </a>
</root>
something like this. Now I'd like to visit the children breadth-first, and the best I can come up with is something like this:
for e in xml.getroot()[0].itersiblings():
    print(e.tag, e.attrib)
and then take it from there. However, this gives me all nodes, including comments:
a {}
<built-in function Comment> {}
a {}
How do I skip over comments? Is there a better way to iterate over the direct children of a node?
In general, what are the recommendations for parsing a whole XML tree vs. event-driven pull-parsing using, say, iterparse()?
This works for your case:
for child in doc.getroot().iterchildren("*"):
    print(child.tag, child.attrib)
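To illustrate why the "*" filter helps, here is a minimal sketch that assumes lxml is installed and parses the sample document from the question out of an inline string (the names XML, all_children, elements_only and manual are only illustrative). It compares plain iteration with iterchildren("*") and with a manual check on .tag:

```python
from lxml import etree

# Sample document from the question, inlined so the sketch is self-contained.
XML = b"""<root>
  <a><b>Text</b><c>More text</c></a>
  <!-- some comment -->
  <a><d id="10" /></a>
</root>"""

root = etree.fromstring(XML)

# Plain iteration yields every child node, the comment included.
all_children = list(root)

# The "*" tag filter matches elements only, so the comment is skipped.
elements_only = list(root.iterchildren("*"))

# Manual alternative: an element's .tag is a string, while a comment's
# .tag is the etree.Comment factory function.
manual = [child for child in root if isinstance(child.tag, str)]

print(len(all_children), [e.tag for e in elements_only])
```

Both filters should leave only the two a elements. Checking child.tag is etree.Comment is another way to detect comments explicitly.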
This question was asked over 9 years ago, but I just ran into this issue myself and solved it with the following:
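import xml.etree.ElementTree as ET

xmlfile = ET.parse("file.xml")
root = xmlfile.getroot()

visit = [root]  # nodes waiting to be visited, oldest first
while visit:
    curr = visit.pop(0)  # take from the front (FIFO) for breadth-first order
    print(curr.tag, curr.attrib, curr.text)
    visit += list(curr)  # append the direct children at the back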
list(node) will give a list of all the immediate children of that node. So by adding those children to the back of a queue and repeating the process with whatever is at the front of the queue (removing it at the same time), we end up with a standard breadth-first search.
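The same idea can be sketched with collections.deque, whose popleft() is O(1), whereas list.pop(0) shifts every remaining item. This is only an illustration: the generator name iter_breadth_first and the inline XML string are assumptions, not part of the answer above. Note that xml.etree.ElementTree drops comments by default when parsing, so no comment filtering is needed here.

```python
from collections import deque
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch is self-contained.
XML = """<root>
  <a><b>Text</b><c>More text</c></a>
  <!-- some comment -->
  <a><d id="10" /></a>
</root>"""


def iter_breadth_first(root):
    """Yield root and all of its descendants, level by level."""
    queue = deque([root])
    while queue:
        node = queue.popleft()  # oldest node first (FIFO)
        yield node
        queue.extend(node)      # iterating an element yields its direct children


root = ET.fromstring(XML)
tags = [node.tag for node in iter_breadth_first(root)]
print(tags)  # ['root', 'a', 'a', 'b', 'c', 'd']
```

Writing it as a generator keeps the traversal separate from whatever is done with each node, so the caller can stop early or filter as needed.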