
Parsing XML: Python ElementTree, find an element and its parent elements without other elements in the same parent

I am using Python's ElementTree library to parse an XML file with the following structure. I am trying to get the XML string for the entity with id = 192 together with all of its parents (folders), but without the other entities.

    <catalog>
        <folder name="entities">
            <entity id="102">

            </entity>
            <folder name="newEntities">
                <entity id="192">

                </entity>

                <entity id="2982">

                </entity>
            </folder>
        </folder>
    </catalog>

The required result should be:

    <catalog>
        <folder name="entities">
            <folder name="newEntities">
                <entity id="192">

                </entity>
            </folder>
        </folder>
    </catalog>

Assuming the first XML string is stored in a variable called xml_string:

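import xml.etree.ElementTree as ET

tree = ET.fromstring(xml_string)
entity_id = 192
# The id must be converted to str before building the XPath expression.
required_element = tree.find(".//entity[@id='" + str(entity_id) + "']")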

This gets the XML element for the required entity but not its parent folders. Is there a quick fix for this?

The challenge here is that ElementTree elements hold no reference to their parent. The solution is to build a parent map, a dictionary that maps each child element to its parent, and use it to walk from the target entity up to the root.

import copy
import xml.etree.ElementTree as ET
import xml.dom.minidom as minidom

xml = '''<catalog>
        <folder name="entities">
            <entity id="102">

            </entity>
            <folder name="newEntities">
                <entity id="192">

                </entity>

                <entity id="2982">

                </entity>
            </folder>
        </folder>
    </catalog>'''


def prettify(elem):
    """Return a pretty-printed XML string for the Element."""
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="    ")


root = ET.fromstring(xml)
# Elements keep no reference to their parent, so build a child -> parent lookup.
parent_map = {c: p for p in root.iter() for c in p}
_id = 192
required_element = root.find(".//entity[@id='" + str(_id) + "']")

# Copy the target entity and drop whitespace that only made sense in the
# original layout (the tail, and a text node that is purely whitespace).
pruned = copy.deepcopy(required_element)
pruned.tail = None
if pruned.text is not None and not pruned.text.strip():
    pruned.text = None

# Walk up to the root, wrapping the copy in a fresh element per ancestor.
# Only the tag and attributes are copied, so sibling entities are left out.
parent = parent_map.get(required_element)
while parent is not None:  # compare with None: Element truthiness depends on its children
    wrapper = ET.Element(parent.tag, parent.attrib)
    wrapper.append(pruned)
    pruned = wrapper
    parent = parent_map.get(parent)

print(prettify(pruned))

Output:

<?xml version="1.0" ?>
<catalog>
    <folder name="entities">
        <folder name="newEntities">
            <entity id="192"/>
        </folder>
    </folder>
</catalog>
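
As a possible alternative, the same result can be reached without a parent map by copying the tree top-down and keeping only the branches that lead to a match. The sketch below is not part of the original answer: `prune_to` and its `predicate` argument are hypothetical names chosen for illustration, and the sample input is a compacted version of the question's XML.

```python
import copy
import xml.etree.ElementTree as ET


def prune_to(elem, predicate):
    """Return a pruned copy of elem, or None if nothing below it matches.

    prune_to and predicate are illustrative names, not ElementTree APIs.
    """
    if predicate(elem):
        kept = copy.deepcopy(elem)  # keep the match with its whole subtree
        kept.tail = None            # the tail belonged to the old layout
        return kept

    # Recurse into the children and keep only branches that contain a match.
    kept_children = []
    for child in elem:
        pruned_child = prune_to(child, predicate)
        if pruned_child is not None:
            kept_children.append(pruned_child)
    if not kept_children:
        return None

    # Rebuild this ancestor from its tag and attributes only.
    clone = ET.Element(elem.tag, elem.attrib)
    clone.extend(kept_children)
    return clone


# Compact sample input with the same structure as the question's XML.
sample = (
    '<catalog><folder name="entities"><entity id="102"/>'
    '<folder name="newEntities"><entity id="192"/><entity id="2982"/>'
    '</folder></folder></catalog>'
)
root = ET.fromstring(sample)
result = prune_to(root, lambda e: e.tag == "entity" and e.get("id") == "192")
print(ET.tostring(result, encoding="unicode"))
```

Because the function never mutates its input, the original tree stays intact, and a predicate that matches several entities keeps all of them along with their shared ancestors. The trade-off is that it visits every non-matching element, and it recurses once per nesting level, which could matter for very deeply nested documents.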


 