How to recursively iterate over XML tags in Python using ElementTree?

Question

I am trying to iterate over all the nodes in a tree using ElementTree.

I do something like this:

import xml.etree.ElementTree as ET

tree = ET.parse("/tmp/test.xml")
root = tree.getroot()

for child in root:
    pass  # do something with child

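The problem is that child is an Element object and not an ElementTree object, so I can't look further into it and recurse over its elements. Is there a way to iterate over the "root" differently, so that it iterates over the top-level nodes in the tree (the immediate children) and returns the same class as the root itself?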

Answer 1

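To iterate over all nodes, use the iter method on the ElementTree, not on the root element.

The root is an element, just like the other elements in the tree, and only has context of its own attributes and children. The ElementTree has context for all elements.

For example, given this XML: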

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

You can do the following:

>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('test.xml')
>>> for elem in tree.iter():
...     print(elem)
... 
<Element 'data' at 0x10b2d7b50>
<Element 'country' at 0x10b2d7b90>
<Element 'rank' at 0x10b2d7bd0>
<Element 'year' at 0x10b2d7c50>
<Element 'gdppc' at 0x10b2d7d10>
<Element 'neighbor' at 0x10b2d7e90>
<Element 'neighbor' at 0x10b2d7ed0>
<Element 'country' at 0x10b2d7f10>
<Element 'rank' at 0x10b2d7f50>
<Element 'year' at 0x10b2d7f90>
<Element 'gdppc' at 0x10b2d7fd0>
<Element 'neighbor' at 0x10b2db050>
<Element 'country' at 0x10b2db090>
<Element 'rank' at 0x10b2db0d0>
<Element 'year' at 0x10b2db110>
<Element 'gdppc' at 0x10b2db150>
<Element 'neighbor' at 0x10b2db190>
<Element 'neighbor' at 0x10b2db1d0>
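
As a complement to the output above, here is a minimal sketch of the explicit recursion the question asks about. It relies on the fact that iterating an Element yields its direct children, which are Elements themselves, so a function can simply call itself on each child. The inline XML string and the walk() helper are assumptions made for illustration, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample standing in for test.xml
XML = """<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <neighbor name="Austria" direction="E"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
    </country>
</data>"""


def walk(elem, depth=0):
    """Recursively yield (depth, element) for elem and all of its descendants."""
    yield depth, elem
    for child in elem:  # iterating an Element yields its direct children
        yield from walk(child, depth + 1)


root = ET.fromstring(XML)
visited = [(depth, elem.tag) for depth, elem in walk(root)]

# Print an indented outline of the tree
for depth, tag in visited:
    print("  " * depth + tag)
```

Unlike a flat iter() loop, this version knows how deep each element sits, which helps when the nesting level matters.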

Answer 2

Adding to Robert Christie's answer: when using fromstring(), you can iterate over all nodes by converting the Element into an ElementTree:

import xml.etree.ElementTree as ET

e = ET.ElementTree(ET.fromstring(xml_string))
for elt in e.iter():
    print("%s: '%s'" % (elt.tag, elt.text))
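
A small runnable sketch of this, assuming a hypothetical xml_string (the original snippet leaves it undefined). It also shows that the Element returned by fromstring() has an iter() method of its own, so as far as I can tell the wrapping step is only needed when an ElementTree object is specifically required.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; any well-formed XML string would do
xml_string = "<data><country name='Panama'><rank>68</rank></country></data>"

root = ET.fromstring(xml_string)  # fromstring() returns the root Element
tree = ET.ElementTree(root)       # wrap it when an ElementTree is needed

# Both iterators visit the same elements in document order
from_tree = [elt.tag for elt in tree.iter()]
from_root = [elt.tag for elt in root.iter()]
print(from_tree)
print(from_root)
```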

Answer 3

You can also access specific elements like this:

country = tree.findall('.//country')

Then loop over range(len(country)) and access each element by its index.
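
A minimal sketch of that idea, iterating over the list returned by findall() directly instead of over range(len(...)). The inline XML is a hypothetical, shortened version of the sample from Answer 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample modeled on the XML from Answer 1
XML = """<data>
    <country name="Liechtenstein"><rank>1</rank></country>
    <country name="Singapore"><rank>4</rank></country>
    <country name="Panama"><rank>68</rank></country>
</data>"""

tree = ET.ElementTree(ET.fromstring(XML))

# findall() returns a list of matching Elements; './/country' matches at any depth
countries = tree.findall('.//country')

# Collect each country's rank, keyed by its name attribute
ranks = {}
for country in countries:
    ranks[country.get('name')] = int(country.find('rank').text)
print(ranks)
```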

Answer 4

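In addition to Robert Christie's accepted answer, printing the values and tags separately is very easy:

import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')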
for elem in tree.iter():
    print(elem.tag, elem.text)
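
One thing to watch for when printing elem.text: it is None for empty elements and whitespace-only for elements that merely contain other elements. The sketch below normalizes that and shows the attributes as well; the inline XML is a hypothetical stand-in for test.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample standing in for test.xml
XML = """<data>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""

root = ET.fromstring(XML)

rows = []
for elem in root.iter():
    # Treat a missing text node like an empty string and drop indentation whitespace
    text = (elem.text or "").strip()
    rows.append((elem.tag, text, dict(elem.attrib)))
    print(elem.tag, repr(text), elem.attrib)
```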

Answer 5

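While iter() is very good, I needed a way to walk an XML hierarchy while tracking the nesting level, and iter() doesn't help with that at all. I wanted something like iterparse(), which emits start and end events at each level of the hierarchy, but I already have an ElementTree, so I don't want the unnecessary step and overhead of converting it to a string and re-parsing it that using iterparse() would require.

I was surprised that I couldn't find this, so I had to write my own: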

def iterwalk(root, events=None, tags=None):
    """Incrementally walks XML structure (like iterparse but for an existing ElementTree structure)
    Returns a generator function; call it to get an iterator providing (event, elem) pairs.
    Events are start and end
    events is a list of events to emit - defaults to ["start","end"]
    tags is a single tag or a list of tags to emit events for - if empty/None events are generated for all tags
    """
    # each stack entry consists of a list of the xml element and a second entry initially None
    # if the second entry is None a start is emitted and all children of current element are put into the second entry
    # if the second entry is a non-empty list the first item in it is popped and then a new stack entry is created
    # once the second entry is an empty list, an end is generated and then the stack is popped
    stack = [[root,None]]
    tags = [] if tags is None else tags if type(tags) == list else [tags]
    events = events or ["start","end"]
    def iterator():
        while stack:
            elnow,children = stack[-1]
            if children is None:
                # this is the start of elnow so emit a start and put its children into the stack entry
                if ( not tags or elnow.tag in tags ) and "start" in events:
                    yield ("start",elnow)
                # put the children into the top stack entry
                stack[-1][1] = list(elnow)
            elif len(children)>0:
                # do a child and remove it
                thischild = children.pop(0)
                # and now create a new stack entry for this child
                stack.append([thischild,None])                
            else:
                # finished these children - emit the end
                if ( not tags or elnow.tag in tags ) and "end" in events:
                    yield ("end",elnow)
                stack.pop()
    return iterator

# myxml is my parsed XML which has nested Binding tags, I want to count the depth of nesting

# Now explore the structure
it = iterwalk(myxml, tags='Binding')
level = 0
for event,el in it():
    if event == "start":
        level += 1
        
    print( f"{level} {el.tag=}" )
    
    if event == "end":
        level -= 1

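A stack is used so that start events can be emitted while moving down the hierarchy, and the walk can then backtrack correctly. The last entry on the stack is initially [el, None], so the start event for el is emitted and the entry is updated to [el, children]. Each child is removed from children as it is entered, until, after the last child has been processed, the entry is [el, []]. At that point the end event for el is emitted and the top entry is removed from the stack.

I did it this way with a stack because I don't like debugging recursive code, and in any case I wasn't sure how to write a recursive iterator function.

Here is a recursive version that is easier to understand, though it would be hard to debug if it weren't so simple and something went wrong. It relies on yield from. Note that, as written, it passes suppressyield=True for root, so unlike the stack version it emits no events for root itself: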

def iterwalk1(root, events=None, tags=None):
    """Recursive version - Incrementally walks XML structure (like iterparse but for an existing ElementTree structure)
    Returns a generator function; call it to get an iterator providing (event, elem) pairs.
    Events are start and end
    events is a list of events to emit - defaults to ["start","end"]
    tags is a single tag or a list of tags to emit events for - if None or empty list then events are generated for all tags
    """
    tags = [] if tags is None else tags if type(tags) == list else [tags]
    events = events or ["start","end"]
    
    def recursiveiterator(el,suppressyield=False):
        if not suppressyield and ( not tags or el.tag in tags ) and "start" in events:
            yield ("start",el)
        for child in list(el):
            yield from recursiveiterator(child)
        if not suppressyield and  ( not tags or el.tag in tags ) and "end" in events:
            yield ("end",el)
            
    def iterator():
        yield from recursiveiterator( root, suppressyield=True )
        
    return iterator
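
To make the depth-counting idea easy to try, here is a compact, self-contained sketch. It is not the code above but a stripped-down walker (walk_events is a hypothetical name) without the events and tags filters, run against a made-up document with nested Binding tags. As a sanity check it compares its event sequence with what iterparse() produces for the same bytes.

```python
import io
import xml.etree.ElementTree as ET


def walk_events(elem):
    """Yield ("start", elem) before an element's children and ("end", elem) after them."""
    yield "start", elem
    for child in elem:
        yield from walk_events(child)
    yield "end", elem


# Hypothetical document with nested Binding tags
XML = b"<Root><Binding><Binding><Binding/></Binding></Binding><Binding/></Root>"
root = ET.fromstring(XML)

# Track the nesting depth of Binding elements only
level = 0
max_depth = 0
for event, el in walk_events(root):
    if el.tag != "Binding":
        continue
    if event == "start":
        level += 1
        max_depth = max(max_depth, level)
    else:
        level -= 1
print(max_depth)

# Cross-check: iterparse() on the same bytes yields the same (event, tag) sequence
walked = [(event, el.tag) for event, el in walk_events(root)]
parsed = [(event, el.tag)
          for event, el in ET.iterparse(io.BytesIO(XML), events=("start", "end"))]
print(walked == parsed)
```

Filtering on el.tag inside the loop plays the role of the tags argument in the versions above; keeping the walker itself unfiltered makes it easier to reason about.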

Answer 6

An excellent solution for converting XML to a dict: see https://stackoverflow.com/a/68082847/3505444

def etree_to_dict(t):
    if type(t) is ET.ElementTree: return etree_to_dict(t.getroot())
    return {
        **t.attrib,
        'text': t.text,
        **{e.tag: etree_to_dict(e) for e in t}
    }

And:

def nested_dict_pairs_iterator(dict_obj):
    ''' This function accepts a nested dictionary as argument
        and iterate over all values of nested dictionaries
    '''
    # Iterate over all key-value pairs of dict argument
    for key, value in dict_obj.items():
        # Check if value is of dict type
        if isinstance(value, dict):
            # If value is dict then iterate over all its values
            for pair in  nested_dict_pairs_iterator(value):
                yield (key, *pair)
        else:
            # If value is not dict type then yield the value
            yield (key, value)

Finally:

# myet is assumed to be a parsed ElementTree; it has no .root attribute, so use getroot()
root_dict = etree_to_dict(myet.getroot())
for pair in nested_dict_pairs_iterator(root_dict):
    print(pair)
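
One caveat with etree_to_dict as written: it builds the children with a dict comprehension keyed by tag, so sibling elements that share a tag (such as the three country elements in Answer 1) overwrite one another and only the last survives. The following is a sketch of one possible variant, not the linked answer's code, that collects children in lists instead; etree_to_dict_multi is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def etree_to_dict_multi(elem):
    """Variant that keeps repeated child tags by collecting children in lists.

    Assumes no attribute shares its name with a child tag or with 'text'.
    """
    node = dict(elem.attrib)
    node["text"] = elem.text
    for child in elem:
        node.setdefault(child.tag, []).append(etree_to_dict_multi(child))
    return node


# Hypothetical sample with repeated sibling tags
root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"/>'
    '<country name="Singapore"/>'
    '<country name="Panama"/>'
    '</data>'
)
result = etree_to_dict_multi(root)
names = [c["name"] for c in result["country"]]
print(names)
```

The trade-off is that every child is wrapped in a list, even when a tag occurs only once, so code consuming the dict has to index into it.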


Answer 1: score 49, accepted, 2014-01-12 12:21:25
Answer 2: score 17, 2016-10-07 03:58:22
Answer 3: score 13, 2016-04-18 03:12:29
Answer 4: score 13, 2018-12-18 02:52:11
Answer 5: score 1, 2021-11-25 14:20:45
Answer 6: score 0, 2022-07-14 13:50:10
