Python XML parse with minidom
I have this XML file and I need to read the text that follows each Sync and Event element, in the same order as in the XML file.
<Episode>
<Section type="report" startTime="0" endTime="263.035">
<Turn startTime="0" endTime="4.844" speaker="spk1">
<Sync time="0"/>
aaaaa
</Turn>
<Turn speaker="spk2" startTime="4.844" endTime="15.531">
<Sync time="4.844"/>
bbbbb
<Event desc="poz" type="noise" extent="begin"/>
ccccc
<Event desc="poz" type="noise" extent="end"/>
ddddd
<Sync time="12.210"/>
eeeee
</Turn>
<Turn speaker="spk1" startTime="15.531" endTime="17.549">
<Event desc="poz" type="noise" extent="begin"/>
fffff
</Turn>
</Section>
</Episode>
And I need this output:
aaaaa
bbbbb
ccccc
ddddd
eeeee
fffff
Is there any solution? Thank you.
Use the built-in SAX parser (xml.sax):
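from xml import sax

class EpisodeContentHandler(sax.ContentHandler):
    def characters(self, content):
        # Print every non-blank run of text found between tags
        content = content.strip()
        if content:
            print(content)

# Open in binary mode so the parser can honour the file's declared encoding
with open("Episode.xml", "rb") as f:
    sax.parse(f, EpisodeContentHandler())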
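SAX parsers are allowed to deliver a single run of text through several characters() calls, so the handler above could in principle print one value in pieces. A slightly more robust sketch buffers the chunks and emits them at the next tag boundary. Like the handler above, it collects every non-blank text run, not only text after Sync/Event. SAMPLE (the XML from the question) and BufferedEpisodeHandler are illustrative names.

```python
from xml import sax

SAMPLE = """<Episode>
<Section type="report" startTime="0" endTime="263.035">
<Turn startTime="0" endTime="4.844" speaker="spk1">
<Sync time="0"/>
aaaaa
</Turn>
<Turn speaker="spk2" startTime="4.844" endTime="15.531">
<Sync time="4.844"/>
bbbbb
<Event desc="poz" type="noise" extent="begin"/>
ccccc
<Event desc="poz" type="noise" extent="end"/>
ddddd
<Sync time="12.210"/>
eeeee
</Turn>
<Turn speaker="spk1" startTime="15.531" endTime="17.549">
<Event desc="poz" type="noise" extent="begin"/>
fffff
</Turn>
</Section>
</Episode>"""


class BufferedEpisodeHandler(sax.ContentHandler):
    """Collects each non-blank run of text between tags, in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._buffer = []

    def _flush(self):
        # Join the buffered chunks and keep the result only if it is not blank
        text = "".join(self._buffer).strip()
        if text:
            self.texts.append(text)
        self._buffer = []

    def startElement(self, name, attrs):
        self._flush()

    def endElement(self, name):
        self._flush()

    def characters(self, content):
        # The parser may call this several times for one text run
        self._buffer.append(content)


handler = BufferedEpisodeHandler()
sax.parseString(SAMPLE.encode("utf-8"), handler)
for text in handler.texts:
    print(text)
```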
Unless you're somehow limited to using minidom, try using ElementTree as Martijn suggested. In my experience it is much easier to use. Its documentation is in the Python standard library reference under xml.etree.ElementTree.
For your problem, you can try something like this:
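import xml.etree.ElementTree as ET

# Get the tree structure of the XML
tree = ET.parse("data.xml")
# Get the root/first tag in the tree
root = tree.getroot()
# iter() visits every element in document order; findall("Sync") would only
# look at direct children of the root, but Sync/Event are nested inside Turn
for child in root.iter():
    if child.tag in ("Sync", "Event"):
        # Sync and Event are empty elements, so the text that follows them
        # is stored in .tail, not .text
        print((child.tail or "").strip())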
Side note: child.attrib is a dict of the element's attributes, e.g. {'time': '0'} for the first Sync.
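child.attrib can be combined with .tail to keep each marker's attributes next to the text that follows it, for example the Sync time stamps. A minimal sketch, assuming the XML is available as a string; SAMPLE (the question's data) and markers_with_text are illustrative names, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Episode>
<Section type="report" startTime="0" endTime="263.035">
<Turn startTime="0" endTime="4.844" speaker="spk1">
<Sync time="0"/>
aaaaa
</Turn>
<Turn speaker="spk2" startTime="4.844" endTime="15.531">
<Sync time="4.844"/>
bbbbb
<Event desc="poz" type="noise" extent="begin"/>
ccccc
<Event desc="poz" type="noise" extent="end"/>
ddddd
<Sync time="12.210"/>
eeeee
</Turn>
<Turn speaker="spk1" startTime="15.531" endTime="17.549">
<Event desc="poz" type="noise" extent="begin"/>
fffff
</Turn>
</Section>
</Episode>"""


def markers_with_text(root):
    """Return (tag, attributes, following text) for each Sync/Event in document order."""
    results = []
    # iter() walks the whole tree depth-first, i.e. in document order
    for element in root.iter():
        if element.tag in ("Sync", "Event"):
            # The text after an empty element lives in its tail
            text = (element.tail or "").strip()
            results.append((element.tag, dict(element.attrib), text))
    return results


root = ET.fromstring(SAMPLE)
for tag, attrib, text in markers_with_text(root):
    print(tag, attrib, text)
```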
If you insist on using minidom:
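from xml.dom import minidom
elements = minidom.parseString(xml_text).getElementsByTagName('*')  # xml_text is your input XML string
for el in elements:
    if el.localName in ('Sync', 'Event'):
        # The text that follows an empty Sync/Event element is its next sibling node
        sibling = el.nextSibling
        if sibling is not None and sibling.nodeType == sibling.TEXT_NODE:
            print(sibling.data.strip())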
This will print:
aaaaa
bbbbb
ccccc
ddddd
eeeee
fffff
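To check the minidom approach against the question's data, here is a self-contained sketch that wraps it in a function returning a list instead of printing. SAMPLE and texts_after_markers are illustrative names; the node-type check skips a marker that is the last child or is directly followed by another element.

```python
from xml.dom import minidom

SAMPLE = """<Episode>
<Section type="report" startTime="0" endTime="263.035">
<Turn startTime="0" endTime="4.844" speaker="spk1">
<Sync time="0"/>
aaaaa
</Turn>
<Turn speaker="spk2" startTime="4.844" endTime="15.531">
<Sync time="4.844"/>
bbbbb
<Event desc="poz" type="noise" extent="begin"/>
ccccc
<Event desc="poz" type="noise" extent="end"/>
ddddd
<Sync time="12.210"/>
eeeee
</Turn>
<Turn speaker="spk1" startTime="15.531" endTime="17.549">
<Event desc="poz" type="noise" extent="begin"/>
fffff
</Turn>
</Section>
</Episode>"""


def texts_after_markers(xml_text):
    """Return the stripped text that follows each Sync/Event element, in document order."""
    document = minidom.parseString(xml_text)
    texts = []
    # getElementsByTagName('*') returns every element in document order
    for element in document.getElementsByTagName("*"):
        if element.tagName not in ("Sync", "Event"):
            continue
        sibling = element.nextSibling
        # Only a text node directly after the marker holds its value
        if sibling is not None and sibling.nodeType == sibling.TEXT_NODE:
            text = sibling.data.strip()
            if text:
                texts.append(text)
    return texts


for text in texts_after_markers(SAMPLE):
    print(text)
```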