Python XML parse (minimal)

Question

I need to read data from this XML file. I don't know how to read the text values aaaaa, bbbbb, ccccc, ddddd, eeeee, fffff and ggggg from it.

<Episode>
<Section type="report" startTime="0" endTime="10">
    <Turn startTime="0" endTime="2.284" speaker="spk1">
        <Sync time="0"/>
        aaaaa
        <Sync time="0.93"/>
        bbbbb
    </Turn>
    <Turn speaker="spk2" startTime="2.284" endTime="6.458">
        <Sync time="2.284"/>
        ccccc
        <Sync time="3.75"/>
        ddddd
        <Sync time="4.911"/>
        eeeee
    </Turn>
    <Turn speaker="spk3" startTime="6.458" endTime="10">
        <Sync time="6.458"/>
        fffff
        <Sync time="8.467"/>
        ggggg
    <Sync time="9.754"/>

    </Turn>
</Section>
</Episode>

I wrote this code:

# -*- coding: utf-8 -*-

from xml.etree import ElementTree as ET
import os
from xml.dom import minidom

dom = minidom.parse("aaa.trs")

conference = dom.getElementsByTagName('Turn')
for node in conference:
    conf_name = node.getAttribute('speaker')
    print(conf_name)
    sync = node.getElementsByTagName('Sync')
    for s in sync:
        s_name = s.getAttribute('time')
        print(s_name)

The output is:

spk1
spk2
spk3

But the output should be:

spk1
aaaaa
bbbbb
spk2
ccccc
ddddd
eeeee
spk3
fffff
ggggg

Any suggestions? Thank you.

Answer 1

One way is to get the nextSibling of every Sync node. The text after each <Sync/> tag is not inside the Sync element; it is a separate text node that follows it inside Turn:

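from xml.dom import minidom

dom = minidom.parse("aaa.trs")

conference = dom.getElementsByTagName('Turn')
for node in conference:
    conf_name = node.getAttribute('speaker')
    print(conf_name)
    sync = node.getElementsByTagName('Sync')
    for s in sync:
        # The text following a <Sync/> tag is its next sibling (a text node)
        print(s.nextSibling.nodeValue.strip())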

This prints:

spk1
aaaaa
bbbbb
spk2
ccccc
ddddd
eeeee
spk3
fffff
ggggg
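
The nextSibling approach assumes every Sync is followed by a text node. A slightly more defensive sketch, written for Python 3, is shown here. The function name texts_by_turn and the inline SAMPLE string are illustrative assumptions, not part of the original question; the sample parses a string instead of the aaa.trs file so it can run on its own. It checks the sibling's node type and skips whitespace-only text, such as the text after a trailing Sync.

```python
from xml.dom import minidom

# Shortened copy of the question's XML (hypothetical inline sample).
SAMPLE = """<Episode>
<Section type="report" startTime="0" endTime="10">
    <Turn startTime="0" endTime="2.284" speaker="spk1">
        <Sync time="0"/>
        aaaaa
        <Sync time="0.93"/>
        bbbbb
    </Turn>
    <Turn speaker="spk3" startTime="6.458" endTime="10">
        <Sync time="6.458"/>
        fffff
        <Sync time="8.467"/>
        ggggg
    <Sync time="9.754"/>

    </Turn>
</Section>
</Episode>"""


def texts_by_turn(xml_text):
    """Return [(speaker, [text, ...]), ...] using minidom sibling navigation."""
    dom = minidom.parseString(xml_text)
    result = []
    for turn in dom.getElementsByTagName("Turn"):
        texts = []
        for sync in turn.getElementsByTagName("Sync"):
            sibling = sync.nextSibling
            # nextSibling is None when Sync is the very last child of Turn.
            if sibling is not None and sibling.nodeType == minidom.Node.TEXT_NODE:
                text = sibling.nodeValue.strip()
                if text:  # skip whitespace-only text after a trailing Sync
                    texts.append(text)
        result.append((turn.getAttribute("speaker"), texts))
    return result


if __name__ == "__main__":
    for speaker, texts in texts_by_turn(SAMPLE):
        print(speaker)
        for text in texts:
            print(text)
```

Returning a list instead of printing directly makes the result easier to check or reuse; the printing loop at the bottom reproduces the output format asked for in the question.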

Alternatively, you can get the same result with ElementTree by reading the tail of each Sync element:

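from xml.etree import ElementTree as ET

tree = ET.parse("aaa.trs")
for turn in tree.findall('.//Turn'):
    print(turn.get('speaker'))
    for sync in turn.findall('.//Sync'):
        # tail is the text between the end of this element and the next tag
        print(sync.tail.strip())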
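
In ElementTree, tail is None when no text follows an element, so calling strip() on it directly can raise an AttributeError. A minimal Python 3 sketch that guards against this follows. The function name tail_texts and the inline TAIL_SAMPLE string are assumptions made for illustration; the sample deliberately ends a Turn with a Sync that has no trailing text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the last <Sync/> has no text after it, so its tail is None.
TAIL_SAMPLE = (
    '<Episode><Section type="report">'
    '<Turn speaker="spk2">'
    '<Sync time="2.284"/> ccccc '
    '<Sync time="3.75"/> ddddd '
    '<Sync time="4.911"/>'
    '</Turn>'
    '</Section></Episode>'
)


def tail_texts(xml_text):
    """Return [(speaker, [text, ...]), ...] using the tail of each Sync element."""
    root = ET.fromstring(xml_text)
    result = []
    for turn in root.iter("Turn"):
        texts = []
        for sync in turn.findall("Sync"):
            # Fall back to "" when tail is None, then drop empty strings.
            text = (sync.tail or "").strip()
            if text:
                texts.append(text)
        result.append((turn.get("speaker"), texts))
    return result


if __name__ == "__main__":
    for speaker, texts in tail_texts(TAIL_SAMPLE):
        print(speaker)
        for text in texts:
            print(text)
```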

Hope that helps.

Answer 1
2 2014-01-27 12:35:55
