
How can I extract related attributes from different child-level elements in an XML file with Python ElementTree?

I am new to Python and fairly new to XML parsing, and I am struggling to find the right approach to extract the data I need while maintaining the relationship between the child elements and their attributes.

The XML is from an audio recording software application. It defines configuration for multiple aspects of the product, so it is large. I want to extract some config items from a small part of the file for external use.

The parent path to the sample below is: ./Presets/rootObjects/root/list/item[]

<item>
   <string name="ID" value="InBusConfig" wide="true"/>
   <string name="Name" value="InBusConfig" wide="true"/>
   <list name="Items" type="obj">
      <obj class="FPreset" ID="1511747008">
         <string name="Name" value="Drum &amp; Bass Beds" wide="true"/>
         <member name="Object">
            <list name="Busses" type="list">
               <item>
                  <string name="BusName" value="Kick In" wide="true"/>
                  <int name="SpeakerArr" value="0"/>
                  <list name="Connections" type="list">
                     <item>
                        <string name="PortId" value="I|Focusrite USB ASIO|Input 3" wide="true"/>
                        <int name="Speaker" value="0"/>
                     </item>
                  </list>
               </item>
...

               <item>
                  <string name="BusName" value="Live Room" wide="true"/>
                  <int name="SpeakerArr" value="0"/>
                  <list name="Connections" type="list">
                     <item>
                        <string name="PortId" value="I|Focusrite USB ASIO|Digital 10" wide="true"/>
                        <int name="Speaker" value="0"/>
                     </item>
                  </list>
               </item>
            </list>
            <int name="Default Bus Index" value="0"/>
         </member>
         <int name="Unrenamed" value="1"/>
      </obj>
   </list>
</item>

The items I need to extract (and keep related) are the value of the <string> element named "BusName" and the value of the corresponding <string> element named "PortId", which sits inside the "list" element named "Connections".

For the 13 items that exist in my test configuration, I want to output this data as a CSV (or JSON) file for use in another tool.

Ideally the output would look similar to this (I have replaced the pipe character with a "-" because it was breaking the table formatting):

BusName      PortId
Kick In      I - Focusrite USB ASIO - Input 3
Live Room    I - Focusrite USB ASIO - Digital 10

I have not provided any Python code because nothing I have tried so far is close enough to ask a specific question. So I am asking for more general help on how to approach this problem.

I'd be happy with an algorithm, pseudocode, or even some Python-specific code or functions.

Any direction would be more than I have right now.

Thanks in advance.

I recommend not using the built-in ElementTree, but rather working with a package like lxml or BeautifulSoup4.

Edit: This is purely a matter of preference. As @barny correctly pointed out, the exact same thing can be achieved with xml.etree.ElementTree.

Here's an attempt with lxml:

from lxml import etree

data = """
<item>
   <string name="ID" value="InBusConfig" wide="true"/>
   <string name="Name" value="InBusConfig" wide="true"/>
   <list name="Items" type="obj">
      <obj class="FPreset" ID="1511747008">
         <string name="Name" value="Drum &amp; Bass Beds" wide="true"/>
         <member name="Object">
            <list name="Busses" type="list">
               <item>
                  <string name="BusName" value="Kick In" wide="true"/>
                  <int name="SpeakerArr" value="0"/>
                  <list name="Connections" type="list">
                     <item>
                        <string name="PortId" value="I|Focusrite USB ASIO|Input 3" wide="true"/>
                        <int name="Speaker" value="0"/>
                     </item>
                  </list>
               </item>
               <item>
                  <string name="BusName" value="Live Room" wide="true"/>
                  <int name="SpeakerArr" value="0"/>
                  <list name="Connections" type="list">
                     <item>
                        <string name="PortId" value="I|Focusrite USB ASIO|Digital 10" wide="true"/>
                        <int name="Speaker" value="0"/>
                     </item>
                  </list>
               </item>
            </list>
            <int name="Default Bus Index" value="0"/>
         </member>
         <int name="Unrenamed" value="1"/>
      </obj>
   </list>
</item>

"""

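root = etree.fromstring(data)

# Select every <item> that has a <string name="BusName"> child.
buses = root.xpath('//item[string/@name="BusName"]')

for bus in buses:
    # Match the child by its name attribute rather than relying on element order.
    bus_name = bus.find('string[@name="BusName"]').get('value')
    # xpath() returns a list; take the first PortId under this bus.
    port_id = bus.xpath('list/item/string[@name="PortId"]/@value')[0]
    pair = (bus_name, port_id)
    print(pair)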

The general idea here is that it uses XPath to find each <item> that contains a <string name="BusName">.
From that item it takes:

  • the value attribute of the <string> element.
  • the first match (because xpath returns a list) of the value attribute of the list/item/string child.
    (If there were more items in this list, you'd want to adjust that here)
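
If a bus can have more than one connection, one way to adjust it is to emit one (BusName, PortId) pair per connection. The following is only a minimal sketch using the standard library's xml.etree.ElementTree; the second PortId on "Kick In" and the function name bus_port_pairs are invented for illustration and are not in the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the "Input 4" connection is invented for illustration.
SAMPLE = """
<list name="Busses" type="list">
   <item>
      <string name="BusName" value="Kick In" wide="true"/>
      <list name="Connections" type="list">
         <item>
            <string name="PortId" value="I|Focusrite USB ASIO|Input 3" wide="true"/>
         </item>
         <item>
            <string name="PortId" value="I|Focusrite USB ASIO|Input 4" wide="true"/>
         </item>
      </list>
   </item>
   <item>
      <string name="BusName" value="Live Room" wide="true"/>
      <list name="Connections" type="list">
         <item>
            <string name="PortId" value="I|Focusrite USB ASIO|Digital 10" wide="true"/>
         </item>
      </list>
   </item>
</list>
"""


def bus_port_pairs(xml_text):
    """Return one (bus_name, port_id) tuple per connection of each bus."""
    root = ET.fromstring(xml_text)
    pairs = []
    # Each <string name="BusName"> identifies a bus; step up to its parent <item>.
    for bus in root.findall('.//string[@name="BusName"]/..'):
        bus_name = bus.find('string[@name="BusName"]').get('value')
        # Collect every PortId below this bus's Connections list, not only the first.
        ports = bus.findall('list[@name="Connections"]/item/string[@name="PortId"]')
        for port in ports:
            pairs.append((bus_name, port.get('value')))
    return pairs


pairs = bus_port_pairs(SAMPLE)
for pair in pairs:
    print(pair)
```

Repeating the bus name on each row keeps the relationship intact when the data is later flattened into a table.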

Note: I'm just creating a tuple called pair, but you can also store these values in a dataframe (pandas), or write them directly to JSON or CSV (see e.g. here: https://realpython.com/python-csv/ ).
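
As a rough sketch of that last step, the pairs can be written out with the standard library's csv and json modules. The helper names pairs_to_csv and pairs_to_json are made up for this example, and the output is built in memory so the snippet runs without touching the file system.

```python
import csv
import io
import json

# The two pairs produced by the loop above.
pairs = [
    ("Kick In", "I|Focusrite USB ASIO|Input 3"),
    ("Live Room", "I|Focusrite USB ASIO|Digital 10"),
]


def pairs_to_csv(pairs):
    """Render the pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["BusName", "PortId"])
    writer.writerows(pairs)
    return buffer.getvalue()


def pairs_to_json(pairs):
    """Render the pairs as a JSON array of objects."""
    records = [{"BusName": name, "PortId": port} for name, port in pairs]
    return json.dumps(records, indent=2)


csv_text = pairs_to_csv(pairs)
json_text = pairs_to_json(pairs)
print(csv_text)
print(json_text)
```

To write a real file instead, the same csv.writer calls can be pointed at a file opened with open("buses.csv", "w", newline=""), where the filename is just a placeholder.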

Not sure why @balduin recommends lxml and avoids ElementTree; it works perfectly well.

ElementTree has limited XPath support compared to lxml, but it's perfectly sufficient for many tasks, and avoiding the need for an external package is often helpful. Docs are here: https://docs.python.org/3/library/xml.etree.elementtree.html?highlight=findall#xml.etree.ElementTree.Element.findall

There are some XPath examples further up that page.

import lxml.etree as ETL
import xml.etree.ElementTree as ET

data = """
<item>
   <string name="ID" value="InBusConfig" wide="true"/>
   <string name="Name" value="InBusConfig" wide="true"/>
   <list name="Items" type="obj">
      <obj class="FPreset" ID="1511747008">
         <string name="Name" value="Drum &amp; Bass Beds" wide="true"/>
         <member name="Object">
            <list name="Busses" type="list">
               <item>
                  <string name="BusName" value="Kick In" wide="true"/>
                  <int name="SpeakerArr" value="0"/>
                  <list name="Connections" type="list">
                     <item>
                        <string name="PortId" value="I|Focusrite USB ASIO|Input 3" wide="true"/>
                        <int name="Speaker" value="0"/>
                     </item>
                  </list>
               </item>
               <item>
                  <string name="BusName" value="Live Room" wide="true"/>
                  <int name="SpeakerArr" value="0"/>
                  <list name="Connections" type="list">
                     <item>
                        <string name="PortId" value="I|Focusrite USB ASIO|Digital 10" wide="true"/>
                        <int name="Speaker" value="0"/>
                     </item>
                  </list>
               </item>
            </list>
            <int name="Default Bus Index" value="0"/>
         </member>
         <int name="Unrenamed" value="1"/>
      </obj>
   </list>
</item>

"""

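# lxml
root = ETL.fromstring(data)

# Select every <item> that has a <string name="BusName"> child.
buses = root.xpath('//item[string/@name="BusName"]')

for bus in buses:
    # Match children by their name attribute rather than relying on element order.
    bus_name = bus.find('string[@name="BusName"]').get('value')
    port_id = bus.xpath('list/item/string[@name="PortId"]/@value')[0]
    pair = (bus_name, port_id)
    print(pair)

# ElementTree
root1 = ET.fromstring(data)

# Find each BusName <string>, then step up to its parent <item> with "..".
for el in root1.findall('.//item/string[@name="BusName"]/..'):
    bus_name = el.find('./string[@name="BusName"]').get('value')
    # ElementTree cannot select attributes in the path, so read it with get().
    port_id = el.find('./list/item/string[@name="PortId"]').get('value')
    pair = (bus_name, port_id)
    print(pair)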

Output lxml:

('Kick In', 'I|Focusrite USB ASIO|Input 3')
('Live Room', 'I|Focusrite USB ASIO|Digital 10')

Output ElementTree:

('Kick In', 'I|Focusrite USB ASIO|Input 3')
('Live Room', 'I|Focusrite USB ASIO|Digital 10')
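
Because the real file is large and covers many parts of the product, it may be worth restricting the search to the section whose ID is "InBusConfig" before looking for buses, in case other sections also contain BusName entries. That is an assumption about the full file, not something shown in the question. In this sketch, the <list name="Sections"> wrapper, the "OutBusConfig" section with its values, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the InBusConfig section mirrors the question's XML.
SAMPLE = """
<list name="Sections">
   <item>
      <string name="ID" value="InBusConfig" wide="true"/>
      <list name="Busses" type="list">
         <item>
            <string name="BusName" value="Kick In" wide="true"/>
            <list name="Connections" type="list">
               <item>
                  <string name="PortId" value="I|Focusrite USB ASIO|Input 3" wide="true"/>
               </item>
            </list>
         </item>
      </list>
   </item>
   <item>
      <string name="ID" value="OutBusConfig" wide="true"/>
      <list name="Busses" type="list">
         <item>
            <string name="BusName" value="Main Out" wide="true"/>
            <list name="Connections" type="list">
               <item>
                  <string name="PortId" value="O|Focusrite USB ASIO|Output 1" wide="true"/>
               </item>
            </list>
         </item>
      </list>
   </item>
</list>
"""


def find_section(root, section_id):
    """Return the first <item> whose <string name="ID"> child has the given value."""
    for item in root.iter("item"):
        id_el = item.find('string[@name="ID"]')
        if id_el is not None and id_el.get("value") == section_id:
            return item
    return None


def bus_ports_in_section(xml_text, section_id="InBusConfig"):
    """Return (bus_name, first_port_id) tuples found inside one section only."""
    root = ET.fromstring(xml_text)
    section = find_section(root, section_id)
    if section is None:
        return []
    rows = []
    for bus in section.findall('.//string[@name="BusName"]/..'):
        name = bus.find('string[@name="BusName"]').get("value")
        port = bus.find('list[@name="Connections"]/item/string[@name="PortId"]')
        # A bus without any connection yields None instead of raising an error.
        rows.append((name, port.get("value") if port is not None else None))
    return rows


rows = bus_ports_in_section(SAMPLE)
print(rows)
```

For a file on disk, ET.parse(path).getroot() can replace ET.fromstring(xml_text); the rest of the logic stays the same.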

