Parsing an XML file with ElementTree

Question

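I'm running into two problems while parsing an XML file. First, I want to return only one set of attributes, namely the attribute values under the first Process. Second, I want to return the second Source under the second Process. My code returns the Source under the first Sources and the first Source under the second Sources, but I can't get it to return the second Source.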

The XML file looks like this:

<!-- The description of the process -->
<Description>"This is a description"</Description>

<!-- info on process to be run -->
<Process expectFailure="false">
    <Code>Import</Code>
    <Sources>
        <Source>"Test Data"</Source>
    </Sources>
    <Destination>Buffered</Destination>
    <Properties>
        <Property code="format" value="CC"/>
        <Property code="Input" value="10N"/>
        <Property code="Method" value="BASIC"/>
        <Ppoperty code="Resolution" value="5"/>
        <Property code="Convention" value="LEFT"/>
        <Property code="Bounding" value="BUFFERED"/>
    </Properties>
</Process>

<!-- info on second process to be run (compare) -->
<Process>
    <Code>SurfaceCompare</Code>
    <Sources>
        <Source>expectedOutput</Source>
        <Source>Buffered</Source>
    </Sources>
    <Properties>
        <Property code="compare_designated" value="true"/>
        <Property code="compare_metadata" value="true"/>
        <Property code="metadata_type" value="OTHER"/>
    </Properties>
</Process>

and the code looks like this:

from xml.etree import ElementTree

tree = ElementTree.parse("XML_example.xml")

description = tree.findtext("Description")
print(description)

for process in tree.findall('Process'):
    for source in process.findall('Sources'):
        source_text = source.findtext('Source')
        print(source_text)

#returns everything
for property in process.iter('Property'):
    print(property.attrib.get('code'))
    print(property.attrib.get('value'))

for process in tree.findall('Process'):
    for source in process.findall('Sources'):
        source = source.findtext('Source')
        print(source)

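I've tried many approaches using the findall, find, iter, get, and getiter methods. I'm sure I'm missing something, but it has been a long day and for the life of me I can't see what it is.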

I could also change how the XML is structured, but I know there must be a way to solve this as it stands, and it's bugging me.

Sample of the desired output for the sources:

"Test Data"
expectedOutput
Buffered

Sample of the desired output for the properties of the first Process:

format
CC
Input
10N
Method
BASIC
Convention
LEFT
Bounding
BUFFERED

Sample of the desired output for the properties of the second Process:

compare_designated 
true
compare_metadata 
true
metadata_type 
OTHER

Answer 1

The simplest way to get what you want is to use find or findall with a path, or iter with a tag name. In your case, using a path is the better fit.
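To make the difference concrete, here is a minimal sketch. It assumes a trimmed-down version of the XML from the question, wrapped in a hypothetical `<root>` element; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumption: a shortened copy of the question's XML with an added <root> wrapper.
xml_text = """<root>
  <Process>
    <Properties>
      <Property code="format" value="CC"/>
    </Properties>
  </Process>
  <Process>
    <Properties>
      <Property code="metadata_type" value="OTHER"/>
    </Properties>
  </Process>
</root>"""

root = ET.fromstring(xml_text)

# iter() walks the whole subtree and matches on the tag name only,
# so it picks up Property elements from every Process.
all_codes = [p.get("code") for p in root.iter("Property")]

# findall() with a path can restrict the match to the first Process.
first_codes = [p.get("code") for p in root.findall("./Process[1]/Properties/Property")]

print(all_codes)
print(first_codes)
```

The path form is what lets you say "only under the first Process", which a bare tag name cannot express.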

Here is one way to do it. By the way, your example is missing a root element, so I added one in the code:

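import xml.etree.ElementTree as ET

# StringIO moved to the io module in Python 3; fall back accordingly
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3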

s = '''<!-- The description of the process -->
<Description>"This is a description"</Description>

<!-- info on process to be run -->
<Process expectFailure="false">
    <Code>Import</Code>
    <Sources>
        <Source>"Test Data"</Source>
    </Sources>
    <Destination>Buffered</Destination>
    <Properties>
        <Property code="format" value="CC"/>
        <Property code="Input" value="10N"/>
        <Property code="Method" value="BASIC"/>
        <Ppoperty code="Resolution" value="5"/>
        <Property code="Convention" value="LEFT"/>
        <Property code="Bounding" value="BUFFERED"/>
    </Properties>
</Process>

<!-- info on second process to be run (compare) -->
<Process>
    <Code>SurfaceCompare</Code>
    <Sources>
        <Source>expectedOutput</Source>
        <Source>Buffered</Source>
    </Sources>
    <Properties>
        <Property code="compare_designated" value="true"/>
        <Property code="compare_metadata" value="true"/>
        <Property code="metadata_type" value="OTHER"/>
    </Properties>
</Process>'''

# once you've parsed the file, you need to **getroot()**
tree = ET.parse(StringIO('<root>' + s + '</root>')).getroot()

For example, you can use a path to go from the first Process (Process[1]) to Properties to Property. findall returns all the matching Property nodes, which you can then iterate over:

# and iterate all Property nodes, and get their attributes like this
for p in tree.findall('./Process[1]/Properties/Property'):
    print(p.attrib)  # to get code/value, use p.attrib.get('code') etc.

This gives you the attributes of every Property under the first Process/Properties:

{'code': 'format', 'value': 'CC'}
{'code': 'Input', 'value': '10N'}
{'code': 'Method', 'value': 'BASIC'}
{'code': 'Convention', 'value': 'LEFT'}
{'code': 'Bounding', 'value': 'BUFFERED'}
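
If you want the code and value on separate lines, as in the desired output from the question, something along these lines should work. This is only a sketch: the XML is a shortened, hypothetical copy with a `<root>` wrapper, and `lines` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Assumption: a shortened copy of the first Process, wrapped in <root>.
xml_text = """<root>
  <Process>
    <Properties>
      <Property code="format" value="CC"/>
      <Property code="Input" value="10N"/>
    </Properties>
  </Process>
</root>"""

root = ET.fromstring(xml_text)

lines = []
for prop in root.findall("./Process[1]/Properties/Property"):
    # Element.get() reads a single attribute and returns None if it is absent.
    lines.append(prop.get("code"))
    lines.append(prop.get("value"))

print("\n".join(lines))
```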

Another example: getting the text of the second Source in the second Process, using find with a path, is just as simple:

print(tree.find('./Process[2]/Sources/Source[2]').text)
Buffered
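
As for why the loop in the question never reached the second Source: findtext returns the text of the first matching child only. A minimal sketch of the difference, assuming a shortened copy of the XML with a hypothetical `<root>` wrapper:

```python
import xml.etree.ElementTree as ET

# Assumption: only the Sources parts of the question's XML, wrapped in <root>.
xml_text = """<root>
  <Process>
    <Sources>
      <Source>"Test Data"</Source>
    </Sources>
  </Process>
  <Process>
    <Sources>
      <Source>expectedOutput</Source>
      <Source>Buffered</Source>
    </Sources>
  </Process>
</root>"""

root = ET.fromstring(xml_text)

# findtext() stops at the first Source under each Sources element.
first_only = [s.findtext("Source") for s in root.findall("./Process/Sources")]

# findall() on the full path yields every Source, in document order.
every_source = [s.text for s in root.findall("./Process/Sources/Source")]

print(first_only)
print(every_source)
```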

Hopefully that gives you an idea of how to use them. Remember: use find to get a single node, and findall to get a list of nodes. Hope this helps.
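
One thing to watch for: find returns None when nothing matches, while findall returns an empty list. A small sketch of guarding against that, using a deliberately tiny, hypothetical document with only one Process:

```python
import xml.etree.ElementTree as ET

# Assumption: a tiny document that has no second Process at all.
root = ET.fromstring(
    "<root><Process><Sources><Source>a</Source></Sources></Process></root>"
)

missing = root.find("./Process[2]/Sources/Source[2]")  # no match
matches = root.findall("./Process[2]/Sources/Source")  # no matches

# Compare against None explicitly before touching .text.
text = missing.text if missing is not None else None

print(missing, matches, text)
```

Checking `is not None` avoids an AttributeError on `.text` when the path does not exist in the file.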

Solution 1
1 Accepted 2015-02-04 22:18:37
