How do I iterate over an XML file recursively, access the child nodes/elements, and store their data using Python?

I have an XML file like the one below. I need to access port->name, port->wire->direction, and port->wire->driver->defaultValue. The XML file is very large.

How do I approach this?

<spirit:Bus> 
    <spirit:Ports>   
        <spirit:port>
            <spirit:name>ABCPORT</spirit:name>
            <spirit:description>SOME DESCRIPTION</spirit:description>
            <spirit:wire>
                <spirit:direction>INPUT</spirit:direction>
                <spirit:driver>
                    <spirit:defaultValue>0</spirit:defaultValue>
                </spirit:driver>
            </spirit:wire>
        </spirit:port>
        <spirit:port>
            <spirit:name>PQRPORT</spirit:name>
            <spirit:description>SOME DESCRIPTION</spirit:description>
            <spirit:wire>
                <spirit:direction>OUTPUT</spirit:direction>
            </spirit:wire>
        </spirit:port>        
    </spirit:ports>
</spirit:Bus>

I believe the best way to address this is with lxml and XPath:

from lxml import etree

# The XML below differs slightly from the one in the question: a typo is fixed and the namespace is declared.

spirit = """<?xml version="1.0" encoding="UTF-8"?>
<doc xmlns:spirit="http://example.com">
   <spirit:Bus>
      <spirit:Ports>
         <spirit:port>
            <spirit:name>ABCPORT</spirit:name>
            <spirit:description>SOME DESCRIPTION</spirit:description>
            <spirit:wire>
               <spirit:direction>INPUT</spirit:direction>
               <spirit:driver>
                  <spirit:defaultValue>0</spirit:defaultValue>
               </spirit:driver>
            </spirit:wire>
         </spirit:port>
         <spirit:port>
            <spirit:name>PQRPORT</spirit:name>
            <spirit:description>SOME DESCRIPTION</spirit:description>
            <spirit:wire>
               <spirit:direction>OUTPUT</spirit:direction>
            </spirit:wire>
         </spirit:port>
      </spirit:Ports>
   </spirit:Bus>
</doc>
"""

doc = etree.XML(spirit.encode('utf-8'))
# local-name() matches on the tag name alone, ignoring the namespace
ports = doc.xpath('//*[local-name()="port"]')
for port in ports:
    try:
        print("Port-", port.xpath('.//*[local-name()="name"]')[0].text)
        print("Direction", port.xpath('.//*[local-name()="direction"]')[0].text)
        print("Default value", port.xpath('.//*[local-name()="defaultValue"]')[0].text)
    except IndexError:
        # xpath() returned an empty list: this port lacks that element
        continue

Output:

Port- ABCPORT
Direction INPUT
Default value 0
Port- PQRPORT
Direction OUTPUT
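
The question also asks how to store the data rather than just print it. Here is a minimal sketch of one way to do that. It assumes Python 3.8+ and uses the standard library's ElementTree instead of lxml, with the `{*}` wildcard playing the role of `local-name()`. The `records` list and its dictionary keys are names chosen for illustration, and the sample XML is a trimmed copy of the one above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample above (descriptions omitted for brevity).
xml_text = """<doc xmlns:spirit="http://example.com">
  <spirit:Bus>
    <spirit:Ports>
      <spirit:port>
        <spirit:name>ABCPORT</spirit:name>
        <spirit:wire>
          <spirit:direction>INPUT</spirit:direction>
          <spirit:driver>
            <spirit:defaultValue>0</spirit:defaultValue>
          </spirit:driver>
        </spirit:wire>
      </spirit:port>
      <spirit:port>
        <spirit:name>PQRPORT</spirit:name>
        <spirit:wire>
          <spirit:direction>OUTPUT</spirit:direction>
        </spirit:wire>
      </spirit:port>
    </spirit:Ports>
  </spirit:Bus>
</doc>"""

root = ET.fromstring(xml_text)

records = []  # hypothetical container: one dict per port
# "{*}port" matches a "port" element in any namespace (Python 3.8+).
for port in root.findall(".//{*}port"):
    records.append({
        "name": port.findtext("{*}name"),
        "direction": port.findtext(".//{*}direction"),
        # findtext() returns None when the element is missing
        "default_value": port.findtext(".//{*}defaultValue"),
    })

for record in records:
    print(record)
```

Because `findtext()` returns `None` for a missing element, a port without a `defaultValue` still gets a complete record and no exception handling is needed.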

To make the XML well-formed, I added a namespace declaration to your sample:

<spirit:Bus xmlns:spirit="http://dummy.com">
    ...
</spirit:Bus>

but Bus is still the root node, as in your sample. Of course, you can change the given URL to whatever you wish.

To do your task solely in ElementTree, you can use the following code:

import xml.etree.ElementTree as et

tree = et.parse('Input.xml')
root = tree.getroot()
ns = {'spirit': 'http://dummy.com'}
for nd in root.findall('spirit:Ports/spirit:port', ns):
    print(nd.tag.split('}')[1], nd.findtext('spirit:name', namespaces=ns),
        nd.findtext('spirit:wire/spirit:direction', namespaces=ns),
        nd.findtext('spirit:wire/spirit:driver/spirit:defaultValue', namespaces=ns))

Note that your XML uses a namespace, so you also have to specify it in the code.

My code also shows how to get the local name of a node (without the namespace).

The result for your sample is:

port ABCPORT INPUT 0
port PQRPORT OUTPUT None
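
Since the question mentions that the file is very large, incremental parsing may be worth considering. The sketch below uses `xml.etree.ElementTree.iterparse` to handle one port at a time. It assumes the same dummy namespace URI as above. `iter_ports` and the dictionary keys are hypothetical names, and the in-memory sample stands in for the real file.

```python
import io
import xml.etree.ElementTree as ET

NS_URI = "http://dummy.com"  # assumed namespace URI, as in the sample above
NS = {"spirit": NS_URI}


def iter_ports(source):
    """Yield one dict per spirit:port while parsing incrementally."""
    port_tag = f"{{{NS_URI}}}port"  # tags are stored as "{uri}localname"
    # The "end" event fires once an element and its children are fully read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == port_tag:
            yield {
                "name": elem.findtext("spirit:name", namespaces=NS),
                "direction": elem.findtext(
                    "spirit:wire/spirit:direction", namespaces=NS),
                "default_value": elem.findtext(
                    "spirit:wire/spirit:driver/spirit:defaultValue",
                    namespaces=NS),
            }
            # Drop this port's children and text so memory use stays small.
            elem.clear()


# In-memory stand-in for the large file; a filename works as well.
sample = b"""<spirit:Bus xmlns:spirit="http://dummy.com">
  <spirit:Ports>
    <spirit:port>
      <spirit:name>ABCPORT</spirit:name>
      <spirit:wire>
        <spirit:direction>INPUT</spirit:direction>
        <spirit:driver>
          <spirit:defaultValue>0</spirit:defaultValue>
        </spirit:driver>
      </spirit:wire>
    </spirit:port>
    <spirit:port>
      <spirit:name>PQRPORT</spirit:name>
      <spirit:wire>
        <spirit:direction>OUTPUT</spirit:direction>
      </spirit:wire>
    </spirit:port>
  </spirit:Ports>
</spirit:Bus>"""

ports = list(iter_ports(io.BytesIO(sample)))
for port in ports:
    print(port)
```

For a real file, `iter_ports('Input.xml')` should behave the same way. Consuming the generator in a loop instead of building a list keeps only one port's data in memory at a time.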
