Python XML parse (minidom)
I need to read data from this XML file. I don't know how to read the text values aaaaa, bbbbb, ccccc, ddddd, eeeee, fffff and ggggg from it.
<Episode>
<Section type="report" startTime="0" endTime="10">
<Turn startTime="0" endTime="2.284" speaker="spk1">
<Sync time="0"/>
aaaaa
<Sync time="0.93"/>
bbbbb
</Turn>
<Turn speaker="spk2" startTime="2.284" endTime="6.458">
<Sync time="2.284"/>
ccccc
<Sync time="3.75"/>
ddddd
<Sync time="4.911"/>
eeeee
</Turn>
<Turn speaker="spk3" startTime="6.458" endTime="10">
<Sync time="6.458"/>
fffff
<Sync time="8.467"/>
ggggg
<Sync time="9.754"/>
</Turn>
</Section>
</Episode>
I wrote this code:
# -*- coding: UTF-8 -*-
from xml.etree import ElementTree as ET
import os
from xml.dom import minidom

dom = minidom.parse("aaa.trs")
conference = dom.getElementsByTagName('Turn')
for node in conference:
    conf_name = node.getAttribute('speaker')
    print(conf_name)
    sync = node.getElementsByTagName('Sync')
    for s in sync:
        # This reads the 'time' attribute, not the text that follows <Sync/>.
        s_name = s.getAttribute('time')
        print(s_name)
Output is:
spk1
spk2
spk3
But the output should be:
spk1
aaaaa
bbbbb
spk2
ccccc
ddddd
eeeee
spk3
fffff
ggggg
Any suggestions? Thank you.
One way is to get the nextSibling of every Sync node:
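from xml.dom import minidom

dom = minidom.parse("aaa.trs")
conference = dom.getElementsByTagName('Turn')
for node in conference:
    conf_name = node.getAttribute('speaker')
    print(conf_name)
    sync = node.getElementsByTagName('Sync')
    for s in sync:
        # The text after each <Sync/> is its next sibling (a text node).
        print(s.nextSibling.nodeValue.strip())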
This prints:
spk1
aaaaa
bbbbb
spk2
ccccc
ddddd
eeeee
spk3
fffff
ggggg
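If a Sync element might be followed by another element, or by nothing at all, the nextSibling lookup can be guarded. The following is only a sketch under assumptions: the helper name turn_texts_minidom is made up for illustration, and the XML is an inline fragment of the file above so that the example runs without aaa.trs.

```python
from xml.dom import minidom

# Inline fragment of the transcript so the sketch runs without aaa.trs.
XML = """<Episode>
<Section type="report" startTime="0" endTime="10">
<Turn startTime="0" endTime="2.284" speaker="spk1">
<Sync time="0"/>
aaaaa
<Sync time="0.93"/>
bbbbb
</Turn>
<Turn speaker="spk3" startTime="6.458" endTime="10">
<Sync time="6.458"/>
fffff
<Sync time="8.467"/>
ggggg
<Sync time="9.754"/>
</Turn>
</Section>
</Episode>"""


def turn_texts_minidom(xml_text):
    """Hypothetical helper: return (speaker, [text, ...]) for each Turn."""
    dom = minidom.parseString(xml_text)
    result = []
    for turn in dom.getElementsByTagName("Turn"):
        texts = []
        for sync in turn.getElementsByTagName("Sync"):
            sibling = sync.nextSibling
            # Only a text node directly after <Sync/> carries transcript text.
            if sibling is not None and sibling.nodeType == minidom.Node.TEXT_NODE:
                text = sibling.nodeValue.strip()
                if text:  # skip whitespace-only text after the last Sync
                    texts.append(text)
        result.append((turn.getAttribute("speaker"), texts))
    return result


for speaker, texts in turn_texts_minidom(XML):
    print(speaker)
    for text in texts:
        print(text)
```

Collecting the values into a list first, instead of printing inside the loop, also avoids the empty line that the trailing Sync in the last Turn would otherwise produce.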
Also, you can achieve the same result with ElementTree by getting the tail of each Sync node:
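from xml.etree import ElementTree as ET

tree = ET.parse("aaa.trs")
for turn in tree.findall('.//Turn'):
    print(turn.attrib.get('speaker'))
    for sync in turn.findall('.//Sync'):
        # tail is the text between this element's end and the next tag.
        print(sync.tail.strip())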
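One caveat with the tail approach: tail is None when no text at all follows an element, so calling strip() on it would raise an AttributeError. A minimal sketch that tolerates this is shown here; the helper name turn_texts_etree and the shortened inline XML (including a Turn written on one line, which is not in the original file) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened inline XML; the second Turn is written on one line so that
# its last Sync has no tail at all (tail is None).
XML = """<Episode><Section type="report" startTime="0" endTime="10">
<Turn speaker="spk2" startTime="2.284" endTime="6.458">
<Sync time="2.284"/>
ccccc
<Sync time="3.75"/>
ddddd
<Sync time="4.911"/>
eeeee
</Turn>
<Turn speaker="spk3" startTime="6.458" endTime="10"><Sync time="6.458"/>fffff<Sync time="9.754"/></Turn>
</Section></Episode>"""


def turn_texts_etree(xml_text):
    """Hypothetical helper: return (speaker, [text, ...]) for each Turn."""
    root = ET.fromstring(xml_text)
    result = []
    for turn in root.iter("Turn"):
        texts = []
        for sync in turn.iter("Sync"):
            # Treat a missing tail as an empty string before stripping.
            tail = (sync.tail or "").strip()
            if tail:
                texts.append(tail)
        result.append((turn.get("speaker"), texts))
    return result


for speaker, texts in turn_texts_etree(XML):
    print(speaker)
    for text in texts:
        print(text)
```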
Hope that helps.