Parsing XML: Python ElementTree, find an element and its parent elements without other elements in the same parent
I am using Python's ElementTree library to parse an XML file which has the following structure. I am trying to get the XML string corresponding to the entity with id = 192, together with all its parents (folders) but without the other entities:
<catalog>
    <folder name="entities">
        <entity id="102">
        </entity>
        <folder name="newEntities">
            <entity id="192">
            </entity>
            <entity id="2982">
            </entity>
        </folder>
    </folder>
</catalog>
The required result should be:
<catalog>
    <folder name="entities">
        <folder name="newEntities">
            <entity id="192">
            </entity>
        </folder>
    </folder>
</catalog>
Assuming the first XML string is stored in a variable called xml_string:
import xml.etree.ElementTree as ET
tree = ET.fromstring(xml_string)
entity_id = 192
required_element = tree.find(".//entity[@id='" + str(entity_id) + "']")
This gets the XML element for the required entity but not the parent folders. Is there a quick solution for this?
The challenge here is to work around the fact that ET elements hold no reference to their parent. The solution is to build a parent_map, a dictionary that maps each child element to its parent.
import copy
import xml.etree.ElementTree as ET
import xml.dom.minidom as minidom

xml = '''<catalog>
<folder name="entities">
<entity id="102">
</entity>
<folder name="newEntities">
<entity id="192">
</entity>
<entity id="2982">
</entity>
</folder>
</folder>
</catalog>'''


def prettify(elem):
    """Return a pretty-printed XML string for the Element."""
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")


root = ET.fromstring(xml)
# Map every child element to its parent (ET stores no parent pointers).
parent_map = {c: p for p in root.iter() for c in p}

_id = 192
required_element = root.find(".//entity[@id='" + str(_id) + "']")

# Walk up from the entity to the root, copying each element on the way.
_path = [copy.deepcopy(required_element)]
while True:
    parent = parent_map.get(required_element)
    # Compare with None: an Element without children is falsy.
    if parent is not None:
        _path.append(copy.deepcopy(parent))
        required_element = parent
    else:
        break

# Re-link the copies so each ancestor holds only the next element on the path.
# Note: clear() also removes the element's attributes, text and tail.
idx = len(_path) - 1
while idx >= 1:
    _path[idx].clear()
    _path[idx].append(_path[idx-1])
    idx -= 1

print(prettify(_path[-1]))
Output:
<?xml version="1.0" ?>
<catalog>
<folder>
<folder>
<entity id="192">
</entity>
</folder>
</folder>
</catalog>
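Note that Element.clear() removes attributes as well as children, which is why the folder elements in the output above have lost their name attributes, unlike the required result in the question. A minimal sketch of one way to keep them, assuming it is acceptable to build fresh wrapper elements instead of clearing deep copies: each ancestor is recreated with only its tag and attributes, and the matched entity is nested inside. The helper name extract_with_ancestors is hypothetical and not part of ElementTree.

```python
import copy
import xml.etree.ElementTree as ET

XML = """<catalog>
<folder name="entities">
<entity id="102">
</entity>
<folder name="newEntities">
<entity id="192">
</entity>
<entity id="2982">
</entity>
</folder>
</folder>
</catalog>"""


def extract_with_ancestors(root, xpath):
    """Hypothetical helper: return a new tree containing the first match of
    `xpath` wrapped in attribute-preserving copies of its ancestors,
    or None if nothing matches. The original tree is left untouched."""
    target = root.find(xpath)
    if target is None:
        return None

    # ElementTree stores no parent pointers, so build a child -> parent map.
    parent_map = {child: parent for parent in root.iter() for child in parent}

    # Copy the matched element (with its own children) and drop the
    # whitespace that followed it in the source document.
    node = copy.deepcopy(target)
    node.tail = None

    # Walk up to the root, wrapping the result in a new element per ancestor.
    current = target
    while current in parent_map:
        current = parent_map[current]
        wrapper = ET.Element(current.tag, dict(current.attrib))
        wrapper.append(node)
        node = wrapper
    return node


root = ET.fromstring(XML)
result = extract_with_ancestors(root, ".//entity[@id='192']")
print(ET.tostring(result, encoding="unicode"))
```

Because the wrappers are new elements, the siblings of the matched entity are never copied, and the parsed input tree can still be reused for further lookups.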