How to iterate over XML tags in Python using ElementTree & save to CSV
How to recursively iterate over XML tags in Python using ElementTree?
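I am trying to iterate over all nodes in a tree using ElementTree.
I do something like:
import xml.etree.ElementTree as ET

tree = ET.parse("/tmp/test.xml")
root = tree.getroot()
for child in root:
    # do something with child
    pass
The problem is that child is an Element object and not an ElementTree object, so I cannot look further into it and recurse to iterate over its elements. Is there a way to iterate differently over the "root" so that it iterates over the top-level nodes in the tree (the immediate children) and returns the same class as the root itself?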
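To iterate over all nodes, use the iter method on the ElementTree, not on the root Element.
The root is an Element, just like the other elements in the tree, and it only has the context of its own attributes and children. The ElementTree has the context of all elements.
For example, given this XML: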
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
you can do the following:
>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('test.xml')
>>> for elem in tree.iter():
...     print(elem)
...
<Element 'data' at 0x10b2d7b50>
<Element 'country' at 0x10b2d7b90>
<Element 'rank' at 0x10b2d7bd0>
<Element 'year' at 0x10b2d7c50>
<Element 'gdppc' at 0x10b2d7d10>
<Element 'neighbor' at 0x10b2d7e90>
<Element 'neighbor' at 0x10b2d7ed0>
<Element 'country' at 0x10b2d7f10>
<Element 'rank' at 0x10b2d7f50>
<Element 'year' at 0x10b2d7f90>
<Element 'gdppc' at 0x10b2d7fd0>
<Element 'neighbor' at 0x10b2db050>
<Element 'country' at 0x10b2db090>
<Element 'rank' at 0x10b2db0d0>
<Element 'year' at 0x10b2db110>
<Element 'gdppc' at 0x10b2db150>
<Element 'neighbor' at 0x10b2db190>
<Element 'neighbor' at 0x10b2db1d0>
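If only some elements are of interest, `iter()` also accepts a tag name and then yields just the matching elements, at any depth. A minimal sketch, assuming a trimmed-down copy of the sample document held in a string (the variable names are illustrative only):

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the sample document, inlined so the snippet runs as-is.
xml_text = """<data>
  <country name="Liechtenstein">
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <neighbor name="Malaysia" direction="N"/>
  </country>
</data>"""

tree = ET.ElementTree(ET.fromstring(xml_text))

# Passing a tag to iter() restricts the traversal output to that tag,
# still in document order.
neighbor_names = [elem.get("name") for elem in tree.iter("neighbor")]
print(neighbor_names)
```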
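Adding to Robert Christie's answer, it is possible to iterate over all nodes when parsing with fromstring() by converting the Element into an ElementTree:
import xml.etree.ElementTree as ET

# xml_string is assumed to hold the XML document as a str
e = ET.ElementTree(ET.fromstring(xml_string))
for elt in e.iter():
    print("%s: '%s'" % (elt.tag, elt.text))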
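As a side note, `ET.fromstring()` returns an `Element`, and `Element` objects provide an `iter()` method of their own in the standard library, so wrapping the result in an `ElementTree` appears to be optional for a full traversal. A minimal sketch, assuming a small made-up document (`xml_text` is illustrative only):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, only for illustration.
xml_text = "<data><country name='A'><rank>1</rank></country><country name='B'/></data>"

root = ET.fromstring(xml_text)   # fromstring() returns the root Element
tree = ET.ElementTree(root)      # optional wrapper around the same root

# Both calls walk the whole document in document order, root included.
tags_from_tree = [elem.tag for elem in tree.iter()]
tags_from_root = [elem.tag for elem in root.iter()]
print(tags_from_tree)
print(tags_from_root)

# Every child is an Element too, so a subtree can be walked the same way.
first_country = root[0]
subtree_tags = [elem.tag for elem in first_country.iter()]
print(subtree_tags)
```

Since each child yielded by `for child in root` is itself an `Element`, the same call works at any level of the tree.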
You can also access specific elements like this:
country = tree.findall('.//country')
then loop over range(len(country)) and access the elements by index.
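Because `findall()` returns a plain list of `Element` objects, the list can also be looped over directly rather than through `range(len(...))`. A minimal sketch, assuming a shortened version of the sample document (the `ranks` dictionary is just an illustrative way to collect results):

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample document, inlined so the snippet runs as-is.
xml_text = """<data>
  <country name="Liechtenstein"><rank>1</rank></country>
  <country name="Singapore"><rank>4</rank></country>
  <country name="Panama"><rank>68</rank></country>
</data>"""

tree = ET.ElementTree(ET.fromstring(xml_text))

# './/country' matches country elements at any depth below the root.
countries = tree.findall(".//country")

ranks = {}
for country in countries:
    # get() reads an attribute; findtext() reads the text of a child element.
    ranks[country.get("name")] = int(country.findtext("rank"))
print(ranks)
```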
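In addition to Robert Christie's accepted answer, printing the tags and values separately is very easy:
import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')
for elem in tree.iter():
    print(elem.tag, elem.text)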
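While iter() is very good, I needed a way to walk an XML hierarchy while tracking the nesting level, and iter() does not help with that at all. I wanted something like iterparse(), which emits start and end events at each level of the hierarchy, but I already had the ElementTree and did not want the unnecessary step and overhead of converting it to a string and re-parsing it, which using iterparse() would require.
Surprised that I could not find this, I had to write it myself: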
def iterwalk(root, events=None, tags=None):
    """Incrementally walks XML structure (like iterparse but for an existing ElementTree structure)
    Returns a generator function; calling it gives an iterator providing (event, elem) pairs.
    Events are start and end
    events is a list of events to emit - defaults to ["start","end"]
    tags is a single tag or a list of tags to emit events for - if empty/None events are generated for all tags
    """
    # each stack entry consists of a list of the xml element and a second entry initially None
    # if the second entry is None a start is emitted and all children of current element are put into the second entry
    # if the second entry is a non-empty list the first item in it is popped and then a new stack entry is created
    # once the second entry is an empty list, an end is generated and then the stack is popped
    stack = [[root,None]]
    tags = [] if tags is None else tags if type(tags) == list else [tags]
    events = events or ["start","end"]
    def iterator():
        while stack:
            elnow,children = stack[-1]
            if children is None:
                # this is the start of elnow so emit a start and put its children into the stack entry
                if ( not tags or elnow.tag in tags ) and "start" in events:
                    yield ("start",elnow)
                # put the children into the top stack entry
                stack[-1][1] = list(elnow)
            elif len(children)>0:
                # do a child and remove it
                thischild = children.pop(0)
                # and now create a new stack entry for this child
                stack.append([thischild,None])
            else:
                # finished these children - emit the end
                if ( not tags or elnow.tag in tags ) and "end" in events:
                    yield ("end",elnow)
                stack.pop()
    return iterator
# myxml is my parsed XML (assumed to be an Element) which has nested Binding tags, I want to count the depth of nesting
# Now explore the structure
it = iterwalk(myxml, tags='Binding')
level = 0
for event,el in it():
    if event == "start":
        level += 1
        print( f"{level} {el.tag=}" )
    if event == "end":
        level -= 1
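A stack is used so that start events can be emitted on the way down the hierarchy and the walk can then backtrack correctly. The last entry in the stack is initially [el, None], so the start event for el is emitted and the entry is updated to [el, children]. Each child is removed from children as it is entered, until, after the last child has been processed, the entry is [el, []]. At that point the end event for el is emitted and the top entry is removed from the stack.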
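The stack mechanics described above can be traced on a tiny document. The following is a condensed sketch, not the full function: `stack_walk` is a hypothetical name, and the `events` and `tags` filtering is left out so that only the push/pop logic remains.

```python
import xml.etree.ElementTree as ET

def stack_walk(root):
    """Yield ("start", elem) / ("end", elem) pairs using an explicit stack."""
    stack = [[root, None]]  # each entry: [element, children left to visit or None]
    while stack:
        elem, children = stack[-1]
        if children is None:
            # First visit: emit start, then record the children still to visit.
            yield ("start", elem)
            stack[-1][1] = list(elem)
        elif children:
            # Descend into the next unvisited child.
            stack.append([children.pop(0), None])
        else:
            # No children left: emit end and backtrack.
            yield ("end", elem)
            stack.pop()

root = ET.fromstring("<a><b><c/></b><d/></a>")
events = [(event, elem.tag) for event, elem in stack_walk(root)]
for event, tag in events:
    print(event, tag)
```

Each element produces exactly one start and one end event, and the end of a parent always comes after the end events of all its children.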
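I did it this way, with a stack, because I am not fond of debugging recursive code, and in any case I was not sure how to write a recursive iterator function.
Here is a recursive version that is easier to understand, although it would be hard to debug if it were not so simple and something went wrong. It also gave me a reason to use yield from: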
def iterwalk1(root, events=None, tags=None):
    """Recursive version - Incrementally walks XML structure (like iterparse but for an existing ElementTree structure)
    Returns a generator function; calling it gives an iterator providing (event, elem) pairs.
    Events are start and end
    events is a list of events to emit - defaults to ["start","end"]
    tags is a single tag or a list of tags to emit events for - if None or empty list then events are generated for all tags
    """
    tags = [] if tags is None else tags if type(tags) == list else [tags]
    events = events or ["start","end"]
    def recursiveiterator(el,suppressyield=False):
        if not suppressyield and ( not tags or el.tag in tags ) and "start" in events:
            yield ("start",el)
        for child in list(el):
            yield from recursiveiterator(child)
        if not suppressyield and ( not tags or el.tag in tags ) and "end" in events:
            yield ("end",el)
    def iterator():
        # suppressyield=True: no events are emitted for root itself, only for its descendants
        yield from recursiveiterator( root, suppressyield=True )
    return iterator
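If the nesting level is all that is needed, a simpler recursive generator can yield the depth directly instead of start and end events. This is a different, smaller technique than the event-based walkers above; `walk_with_depth` is a hypothetical helper name and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

def walk_with_depth(elem, depth=0):
    """Yield (depth, element) pairs in document order; the root has depth 0."""
    yield depth, elem
    for child in elem:
        # Delegate to the child's generator, one level deeper.
        yield from walk_with_depth(child, depth + 1)

root = ET.fromstring("<a><b><c/></b><d/></a>")
depths = [(depth, elem.tag) for depth, elem in walk_with_depth(root)]

# Print an indented outline of the tree.
for depth, tag in depths:
    print("  " * depth + tag)
```

Note that very deeply nested documents could run into Python's recursion limit with this approach, which is one argument for the explicit stack.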
An excellent solution for converting XML to a dict: see https://stackoverflow.com/a/68082847/3505444
import xml.etree.ElementTree as ET

def etree_to_dict(t):
    if type(t) is ET.ElementTree: return etree_to_dict(t.getroot())
    return {
        **t.attrib,
        'text': t.text,
        **{e.tag: etree_to_dict(e) for e in t}
    }
And:
def nested_dict_pairs_iterator(dict_obj):
    ''' This function accepts a nested dictionary as argument
        and iterates over all values of nested dictionaries
    '''
    # Iterate over all key-value pairs of dict argument
    for key, value in dict_obj.items():
        # Check if value is of dict type
        if isinstance(value, dict):
            # If value is dict then iterate over all its values
            for pair in nested_dict_pairs_iterator(value):
                yield (key, *pair)
        else:
            # If value is not dict type then yield the value
            yield (key, value)
Finally:
# myet is assumed to be an ElementTree; it has no .root attribute, so use getroot()
root_dict = etree_to_dict(myet.getroot())
for pair in nested_dict_pairs_iterator(root_dict):
    print(pair)
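One thing to keep in mind with the dict approach: a dict comprehension keyed on `e.tag` keeps only the last child when several siblings share a tag (for example several `country` elements). A possible workaround, sketched here with a hypothetical `etree_to_dict_multi` variant, is to collect the children in lists. It assumes that attribute names, child tags and the `"text"` key do not clash.

```python
import xml.etree.ElementTree as ET

def etree_to_dict_multi(elem):
    """Hypothetical variant: children sharing a tag are collected in a list."""
    # Assumes no attribute is named like a child tag or like "text".
    result = {**elem.attrib, "text": elem.text}
    for child in elem:
        result.setdefault(child.tag, []).append(etree_to_dict_multi(child))
    return result

# Made-up sample with two siblings that share the same tag.
root = ET.fromstring(
    "<data><country name='A'><rank>1</rank></country>"
    "<country name='B'><rank>4</rank></country></data>"
)
as_dict = etree_to_dict_multi(root)
names = [country["name"] for country in as_dict["country"]]
print(names)
```

A pair iterator that only descends into dict values would treat these lists as leaf values, so it would need to be extended to walk lists as well.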