Parsing a deeply nested XML file with a for loop

Question

How can I efficiently extract data from nested XML? By "efficiently" I mean, for example, using a for loop. Do I need to use a new data structure?

Parsing code:

import xml.etree.ElementTree as ET

it = ET.iterparse('OTA_AirSeatMapRS.xml')

# This for loop removes the namespaces
for _, el in it:
    _, _, el.tag = el.tag.rpartition('}')
root = it.root

# I am not able to select data with this loop
for x in element.find(Service):
    print(x)

This is part of the XML file:

<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope
    xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
    xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <soapenv:Body>
        <ns:OTA_AirSeatMapRS Version="1"
            xmlns:ns="http://www.opentravel.org/OTA/2003/05/common/">
            <ns:Success/>
            <ns:SeatMapResponses>
                <ns:SeatMapResponse>
                    <ns:FlightSegmentInfo DepartureDateTime="2020-11-22T15:30:00" FlightNumber="1179">
                        <ns:DepartureAirport LocationCode="LAS"/>
                        <ns:ArrivalAirport LocationCode="IAH"/>
                        <ns:Equipment AirEquipType="739"/>
                    </ns:FlightSegmentInfo>
                    <ns:SeatMapDetails>
                        <ns:CabinClass Layout="AB EF" UpperDeckInd="false">
                            <ns:RowInfo CabinType="First" OperableInd="true" RowNumber="1">
                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left">
                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="1A"/>
                                    <ns:Features>Window</ns:Features>
                                </ns:SeatInfo>

My end goal is to store the parsed data as JSON.

Answer 1

Look at the following statement in your code:

for x in element.find(Service):

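The first flaw in your code sample is that Service is a variable, not a string literal. You probably initialized it to some string but left that statement out of the sample.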

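The second flaw is that find returns only the first element matching the given path, so it is not the call to build a loop around. You should also check that find returned something other than None, but that is a separate detail.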

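The third reason you see nothing useful is that print(x) on an element shows only its default representation (the tag name and a memory address), not its text or attributes.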
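To illustrate the second point, here is a minimal sketch contrasting find with findall and iter, which are the calls meant for looping. It assumes the namespaces have already been stripped as in your code, and it uses a hand-closed excerpt of your XML; the second seat (1B) is invented purely so the loop has more than one match.

```python
import xml.etree.ElementTree as ET

# Hand-closed excerpt with namespaces already stripped (assumption).
# The second SeatInfo (1B) is hypothetical, added only for illustration.
SAMPLE = """
<RowInfo CabinType="First" OperableInd="true" RowNumber="1">
    <SeatInfo ColumnNumber="1" PlaneSection="Left">
        <Summary AvailableInd="false" SeatNumber="1A"/>
        <Features>Window</Features>
    </SeatInfo>
    <SeatInfo ColumnNumber="2" PlaneSection="Left">
        <Summary AvailableInd="true" SeatNumber="1B"/>
        <Features>Aisle</Features>
    </SeatInfo>
</RowInfo>
"""

root = ET.fromstring(SAMPLE)

# find() returns only the first match, or None when nothing matches.
first = root.find('.//Summary')
if first is not None:
    print(first.attrib['SeatNumber'])

# findall() returns a list of every match, so it is the one to loop over.
seat_numbers = [s.attrib['SeatNumber'] for s in root.findall('.//Summary')]
print(seat_numbers)

# iter() lazily walks all descendants that carry the given tag.
features = [f.text for f in root.iter('Features')]
print(features)
```

Either findall or iter works here; iter avoids building the intermediate list, which may matter for a large seat map.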

For a more general example, run:

Service = 'Summary'
x = root.find(f'.//{Service}')
print(f'{x.tag}, {x.text}, {x.attrib}')

The first statement sets the tag name.

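The second statement calls find, but note that I put ".//" at the start of the XPath so that the search covers every depth of the source XML tree, not just the direct children of root.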

The last statement prints not only the text of the element that was found, but also its tag name and attributes.
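The effect of the ".//" prefix is easy to check on a small tree. The sketch below is a separate illustration, not part of the snippet being walked through here; it assumes namespaces are already stripped and uses a trimmed, hand-closed excerpt of your XML.

```python
import xml.etree.ElementTree as ET

# Trimmed, hand-closed excerpt with namespaces already stripped (assumption).
SAMPLE = """
<CabinClass Layout="AB EF" UpperDeckInd="false">
    <RowInfo CabinType="First" OperableInd="true" RowNumber="1">
        <SeatInfo ColumnNumber="1" PlaneSection="Left">
            <Summary SeatNumber="1A"/>
        </SeatInfo>
    </RowInfo>
</CabinClass>
"""

root = ET.fromstring(SAMPLE)

# A bare tag name only matches direct children of the starting element.
direct = root.find('Summary')    # Summary is three levels down, so no match
row = root.find('RowInfo')       # RowInfo is a direct child, so it is found

# ".//" searches descendants at any depth.
deep = root.find('.//Summary')

# Spelling out the full path from the starting element also works.
explicit = root.find('RowInfo/SeatInfo/Summary')

print(direct, row.tag, deep.attrib, explicit.attrib)
```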

For your input XML, the Service = 'Summary' snippet prints:

Summary, None, {'AvailableInd': 'false', 'InoperativeInd': 'false', 'OccupiedInd': 'false', 'SeatNumber': '1A'}

(text is None because Summary is an empty element; all of its data is carried in the attributes.)
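Since your end goal is JSON, one possible approach is to loop over the rows and seats and build plain dicts and lists, which json.dumps can serialize directly; no new data structure is needed. This is only a sketch: it assumes the namespaces are already stripped, it uses a hand-closed excerpt of your XML, and the output keys seats, summary and features are names I made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hand-closed excerpt with namespaces already stripped (assumption).
SAMPLE = """
<CabinClass Layout="AB EF" UpperDeckInd="false">
    <RowInfo CabinType="First" OperableInd="true" RowNumber="1">
        <SeatInfo BlockedInd="false" ColumnNumber="1" PlaneSection="Left">
            <Summary AvailableInd="false" OccupiedInd="false" SeatNumber="1A"/>
            <Features>Window</Features>
        </SeatInfo>
    </RowInfo>
</CabinClass>
"""

root = ET.fromstring(SAMPLE)

rows = []
for row in root.iter('RowInfo'):            # every row, at any depth
    seats = []
    for seat in row.findall('SeatInfo'):    # direct children of this row
        summary = seat.find('Summary')      # may be None if the seat has no Summary
        seats.append({
            **seat.attrib,                  # copy the seat's own attributes
            # 'summary' and 'features' are hypothetical key names
            'summary': dict(summary.attrib) if summary is not None else {},
            'features': [f.text for f in seat.findall('Features')],
        })
    rows.append({**row.attrib, 'seats': seats})

result = json.dumps(rows, indent=2)
print(result)
```

All attribute values stay strings here ("false", "1"); converting them to booleans or integers would be a separate step if the JSON consumer needs it.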

Solution 1
0 · accepted · 2021-02-21 12:25:34
