Python XML: iterating over tags at level n

Question

I want to iterate through a specific stage (nesting level) of my tags.

For example, I would like to iterate through the top-level objects:

<stage1tag>
<child1tag>bla</child1tag>
<child2tag>blabla</child2tag>
<child3tag><stage2tag>heyho</stage2tag></child3tag></stage1tag>
<stage1tag2>
<stage1tag>
<child1tag>…
...

I only want to iterate through the tags at stage 1 (stage1tag and stage1tag2). In my real XML they are not called child...tag and stage...tag; the names here are only for readability. How can I get the top-level tags? I am looking for something like:

elems = mytree.getlevel(0) #toplevel
for child in elems.iter():
    #do something with the childs...

Answer 1

This is one possible solution. I have not tested it extensively, but it should give you an idea of how to approach this kind of problem.

import re

txt = \
'''
<stage1tag>
<child1tag>bla</child1tag>
<child2tag>blabla</child2tag>
<child3tag><stage2tag>heyho</stage2tag></child3tag></stage1tag>
<stage1tag2>
<stage1tag>
<child1tag>
'''

# 1: find all tags
tag_pattern = re.compile(r'<[^>]+>')
tags = tag_pattern.findall(txt)

# 2: determine the level of each tag
lvl = 1  # starting level
for t in tags:
    if not t.startswith('</'):  # opening tag: report it, then go one level deeper
        k = t[1:-1]
        print(k, ':', lvl)
        lvl += 1
    else:  # closing tag: go back up one level
        lvl -= 1

It prints:

stage1tag : 1
child1tag : 2
child2tag : 2
child3tag : 2
stage2tag : 3
stage1tag2 : 1
stage1tag : 2
child1tag : 3

That is correct given your XML.
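Note that the regex approach treats every tag without `</` as an opening tag, so self-closing tags (`<tag/>`), comments, and processing instructions would throw the count off. If the document is well-formed (has a single root), the same level bookkeeping can be done with a real parser. The following is a minimal sketch using the standard library's `xml.etree.ElementTree.iterparse`; the `SAMPLE` document and the `tag_levels` helper are illustrative names, and the root is counted as level 0 here, so the stage 1 tags come out at level 1.

```python
import io
import xml.etree.ElementTree as ET

# Illustrative, well-formed sample (a root element is assumed).
SAMPLE = """<root>
<stage1tag>
<child1tag>bla</child1tag>
<child3tag><stage2tag>heyho</stage2tag></child3tag>
</stage1tag>
<stage1tag2/>
</root>"""

def tag_levels(xml_text):
    """Return (tag, level) pairs in document order; the root is level 0."""
    levels = []
    depth = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # An element opens: record it at the current depth, then descend.
            levels.append((elem.tag, depth))
            depth += 1
        else:
            # The element closes: climb back up one level.
            depth -= 1
    return levels

for tag, level in tag_levels(SAMPLE):
    print(tag, ":", level)
```

Because the parser reports a start and an end event for every element, the self-closing `<stage1tag2/>` is counted correctly without any special casing.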

Answer 2

I assume you have a root element; otherwise the parser will choke with something like "XMLSyntaxError: Extra content at the end of the document". If you lack a root element, just add one:

data = """<root>
<stage1tag id="1">
<child1tag>bla</child1tag>
<child2tag>blabla</child2tag>
<child3tag><stage2tag>heyho</stage2tag></child3tag>
</stage1tag>
<stage1tag id="2">
<child1tag>bla</child1tag>
<child2tag>blabla</child2tag>
<child3tag><stage2tag>heyho</stage2tag></child3tag>
</stage1tag>
</root>
"""

You can use lxml:

>>> import lxml.etree
>>> root = lxml.etree.fromstring(data)
>>> list(root)  # direct children of the root; preferred over the deprecated getchildren()
[<Element stage1tag at 0x3bf6530>, <Element stage1tag at 0x3bfb7d8>]

>>> for tag in root:
...     print(tag.attrib.get('id'))
...
1
2
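The question asks for something like a `getlevel(n)` call. Neither lxml nor the standard library offers one under that name as far as I know, but it can be approximated by repeatedly expanding a list of elements into their children. This is a sketch only, written against the standard library's `xml.etree.ElementTree` (iterating over an lxml element works the same way); `elements_at_level` and `DATA` are hypothetical names, and level 0 is taken to mean the root itself.

```python
import xml.etree.ElementTree as ET

# Illustrative document; the tag names follow the question's placeholders.
DATA = """<root>
<stage1tag id="1"><child1tag>bla</child1tag></stage1tag>
<stage1tag id="2"><child3tag><stage2tag>heyho</stage2tag></child3tag></stage1tag>
</root>"""

def elements_at_level(root, level):
    """Return all elements exactly `level` steps below `root` (0 = root itself)."""
    current = [root]
    for _ in range(level):
        # Replace each element by its direct children, in document order.
        current = [child for parent in current for child in parent]
    return current

root = ET.fromstring(DATA)
for elem in elements_at_level(root, 1):
    print(elem.tag, elem.attrib.get("id"))
```

With a wrapping root element, the "stage 1" tags from the question are `elements_at_level(root, 1)`, and deeper stages are reached by raising the level argument.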

If your document lacks a root element, I don't think you can call it XML; you have something that merely resembles XML (see "Do you always have to have a root node with xml/xsd?").
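When the rootless text is already in a string, the wrapper element can be added in code instead of by hand. A minimal sketch, assuming the fragment contains no XML declaration or DOCTYPE (those would end up inside the wrapper and make the parse fail); `parse_fragment`, `wrapper_tag`, and `FRAGMENT` are illustrative names:

```python
import xml.etree.ElementTree as ET

def parse_fragment(fragment, wrapper_tag="root"):
    """Parse a rootless run of elements by wrapping it in a synthetic root."""
    return ET.fromstring("<{0}>{1}</{0}>".format(wrapper_tag, fragment))

# Two sibling top-level elements, which is not a well-formed document on its own.
FRAGMENT = "<stage1tag>bla</stage1tag><stage1tag2>blabla</stage1tag2>"

wrapped = parse_fragment(FRAGMENT)
print([child.tag for child in wrapped])
```

The children of the synthetic root are then exactly the original top-level tags.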


Answer 1
0 2016-04-23 11:27:40

Answer 2
0 2016-04-23 12:05:12
