How to recursively traverse an XML file, access child nodes/elements, and store their data using Python?

Question

I have an XML file like the one below. For each port I need to access port->name, port->wire->direction, and port->wire->driver->defaultValue. The XML file is very large.

How do I approach this?

<spirit:Bus> 
    <spirit:Ports>   
        <spirit:port>
            <spirit:name>ABCPORT</spirit:name>
            <spirit:description>SOME DESCRIPTION</spirit:description>
            <spirit:wire>
                <spirit:direction>INPUT</spirit:direction>
                <spirit:driver>
                    <spirit:defaultValue>0</spirit:defaultValue>
                </spirit:driver>
            </spirit:wire>
        </spirit:port>
        <spirit:port>
            <spirit:name>PQRPORT</spirit:name>
            <spirit:description>SOME DESCRIPTION</spirit:description>
            <spirit:wire>
                <spirit:direction>OUTPUT</spirit:direction>
            </spirit:wire>
        </spirit:port>        
    </spirit:ports>
</spirit:Bus>

Answer 1

I believe the best way to address this is with lxml and XPath:

from lxml import etree

# The XML below differs slightly from the one in the question: a typo in a closing tag is fixed and the namespace is declared.

spirit = """<?xml version="1.0" encoding="UTF-8"?>
<doc xmlns:spirit="http://example.com">
   <spirit:Bus>
      <spirit:Ports>
         <spirit:port>
            <spirit:name>ABCPORT</spirit:name>
            <spirit:description>SOME DESCRIPTION</spirit:description>
            <spirit:wire>
               <spirit:direction>INPUT</spirit:direction>
               <spirit:driver>
                  <spirit:defaultValue>0</spirit:defaultValue>
               </spirit:driver>
            </spirit:wire>
         </spirit:port>
         <spirit:port>
            <spirit:name>PQRPORT</spirit:name>
            <spirit:description>SOME DESCRIPTION</spirit:description>
            <spirit:wire>
               <spirit:direction>OUTPUT</spirit:direction>
            </spirit:wire>
         </spirit:port>
      </spirit:Ports>
   </spirit:Bus>
</doc>
"""

doc = etree.XML(spirit.encode('utf-8'))
ports = doc.xpath('//*[local-name()="port"]')
for port in ports:
    try:
        print("Port-",port.xpath('.//*[local-name()="name"]')[0].text)
        print("Direction",port.xpath('.//*[local-name()="direction"]')[0].text)
        print("Default value",port.xpath('.//*[local-name()="defaultValue"]')[0].text)
    except IndexError:  # xpath() returned an empty list: this port lacks the element
        continue

Output:

Port- ABCPORT
Direction INPUT
Default value 0
Port- PQRPORT
Direction OUTPUT
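
The loop above prints values and stops at the first missing element of a port. Since the question also asks about storing the data, here is a minimal sketch that collects one dict per port and records a missing field as None. It is not part of the original answer. It uses the standard library's ElementTree instead of lxml, and relies on the `{*}` namespace wildcard (available from Python 3.8) as a rough counterpart to `local-name()`. The names `SAMPLE` and `collect_ports`, the dict keys, and the namespace URI are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample; the namespace URI is a placeholder.
SAMPLE = """<spirit:Bus xmlns:spirit="http://example.com">
  <spirit:Ports>
    <spirit:port>
      <spirit:name>ABCPORT</spirit:name>
      <spirit:wire>
        <spirit:direction>INPUT</spirit:direction>
        <spirit:driver>
          <spirit:defaultValue>0</spirit:defaultValue>
        </spirit:driver>
      </spirit:wire>
    </spirit:port>
    <spirit:port>
      <spirit:name>PQRPORT</spirit:name>
      <spirit:wire>
        <spirit:direction>OUTPUT</spirit:direction>
      </spirit:wire>
    </spirit:port>
  </spirit:Ports>
</spirit:Bus>
"""


def collect_ports(root):
    """Return one dict per port; fields that are absent are stored as None."""
    records = []
    # "{*}port" matches a <port> element in any namespace (Python 3.8+).
    for port in root.findall(".//{*}port"):
        records.append({
            "name": port.findtext("{*}name"),
            "direction": port.findtext("{*}wire/{*}direction"),
            "default": port.findtext("{*}wire/{*}driver/{*}defaultValue"),
        })
    return records


ports = collect_ports(ET.fromstring(SAMPLE))
for record in ports:
    print(record)
```

Because `findtext` returns None when the path matches nothing, the second port still yields a complete record, and no try/except is needed.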

Answer 2

To make the XML well-formed, I added a namespace declaration to your sample:

<spirit:Bus xmlns:spirit="http://dummy.com">
    ...
</spirit:Bus>

Bus is still the root node, as in your sample. Of course, you can change the namespace URI to whatever you wish.

To do this with ElementTree alone, you can use the following code:

import xml.etree.ElementTree as et

tree = et.parse('Input.xml')
root = tree.getroot()
ns = {'spirit': 'http://dummy.com'}
for nd in root.findall('spirit:Ports/spirit:port', ns):
    print(nd.tag.split('}')[1], nd.findtext('spirit:name', namespaces=ns),
        nd.findtext('spirit:wire/spirit:direction', namespaces=ns),
        nd.findtext('spirit:wire/spirit:driver/spirit:defaultValue', namespaces=ns))

Note that your XML uses a namespace, so you also have to specify it in the code.

The code also shows how to get the local name of a node (without the namespace).

The result for your sample is:

port ABCPORT INPUT 0
port PQRPORT OUTPUT None
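
The question mentions that the file is very large, and `et.parse` builds the whole tree in memory. One possible way around that, sketched here and not part of the original answer, is `xml.etree.ElementTree.iterparse`, which hands over each element as its closing tag is read. The function name `stream_ports`, the dict keys, and the in-memory `sample` are assumptions for illustration. With a real file you would pass the file name (for example 'Input.xml') instead of the `BytesIO` object.

```python
import io
import xml.etree.ElementTree as ET

NS = "http://dummy.com"  # placeholder namespace URI, as in the answer above


def stream_ports(source):
    """Yield one dict per spirit:port while the file is still being parsed."""
    ns = {"spirit": NS}
    port_tag = f"{{{NS}}}port"  # ElementTree stores tags as "{uri}localname"
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != port_tag:
            continue
        yield {
            "name": elem.findtext("spirit:name", namespaces=ns),
            "direction": elem.findtext("spirit:wire/spirit:direction", namespaces=ns),
            "default": elem.findtext(
                "spirit:wire/spirit:driver/spirit:defaultValue", namespaces=ns
            ),
        }
        elem.clear()  # drop this port's children once they have been read


sample = b"""<spirit:Bus xmlns:spirit="http://dummy.com">
  <spirit:Ports>
    <spirit:port>
      <spirit:name>ABCPORT</spirit:name>
      <spirit:wire>
        <spirit:direction>INPUT</spirit:direction>
        <spirit:driver><spirit:defaultValue>0</spirit:defaultValue></spirit:driver>
      </spirit:wire>
    </spirit:port>
    <spirit:port>
      <spirit:name>PQRPORT</spirit:name>
      <spirit:wire><spirit:direction>OUTPUT</spirit:direction></spirit:wire>
    </spirit:port>
  </spirit:Ports>
</spirit:Bus>
"""

records = list(stream_ports(io.BytesIO(sample)))
for record in records:
    print(record)
```

`elem.clear()` removes the children of each processed port, but the emptied port elements stay attached to their parent, so memory use still grows slowly with the number of ports. Collecting everything with `list(...)` also keeps all records in memory. For a very large file it may be better to write each record out as it is yielded.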
