Python XML parsing with minidom

I have this XML file, and I need to read the text that follows each Sync and Event element, in the same order as it appears in the file.

<Episode>
<Section type="report" startTime="0" endTime="263.035">
    <Turn startTime="0" endTime="4.844" speaker="spk1">
        <Sync time="0"/>
        aaaaa
    </Turn>
    <Turn speaker="spk2" startTime="4.844" endTime="15.531">
        <Sync time="4.844"/>
        bbbbb
        <Event desc="poz" type="noise" extent="begin"/>
        ccccc
        <Event desc="poz" type="noise" extent="end"/>
        ddddd

    <Sync time="12.210"/>
        eeeee 
    </Turn>
    <Turn speaker="spk1" startTime="15.531" endTime="17.549">
        <Event desc="poz" type="noise" extent="begin"/>
        fffff
    </Turn>
</Section>
</Episode>

And I need this output:

aaaaa
bbbbb
ccccc
ddddd
eeeee
fffff

Is there any solution? Thank you.

Use the built-in SAX parser:

from xml import sax

class EpisodeContentHandler(sax.ContentHandler):
    def characters(self, content):
        content = content.strip()
        if content:
            print(content)

with open("Episode.xml") as f:
    sax.parse(f, EpisodeContentHandler())
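
One caveat with the handler above: a SAX parser is allowed to deliver a single run of text in several characters() calls, so printing inside characters() can split a line on larger inputs. The following is a minimal sketch that buffers the text and flushes it at each tag boundary instead. The names TextCollector and collect_text and the shortened inline sample are assumptions made for illustration, not part of the original answer.

```python
import xml.sax

# Shortened copy of the question's XML, inlined so the sketch runs on its own.
SAMPLE = b"""<Episode>
<Section type="report" startTime="0" endTime="263.035">
    <Turn startTime="0" endTime="4.844" speaker="spk1">
        <Sync time="0"/>
        aaaaa
    </Turn>
    <Turn speaker="spk2" startTime="4.844" endTime="15.531">
        <Sync time="4.844"/>
        bbbbb
        <Event desc="poz" type="noise" extent="begin"/>
        ccccc
    </Turn>
</Section>
</Episode>"""


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: buffers character data, flushes at tag boundaries."""

    def __init__(self):
        super().__init__()
        self.lines = []     # completed, stripped text runs in document order
        self._buffer = []   # pieces of the text run currently being read

    def _flush(self):
        text = "".join(self._buffer).strip()
        if text:
            self.lines.append(text)
        self._buffer = []

    def characters(self, content):
        # May be called several times for one run of text, so only buffer here.
        self._buffer.append(content)

    def startElement(self, name, attrs):
        self._flush()

    def endElement(self, name):
        self._flush()


def collect_text(xml_bytes):
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.lines


for line in collect_text(SAMPLE):
    print(line)
```

Returning a list rather than printing directly also makes it easier to reuse the text later, for example to pair each line with the time attribute of the preceding Sync.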

Unless you're somehow limited to using minidom, try using ElementTree, as Martijn suggested. From my personal experience, it's much easier to use. You can find its documentation in the Python standard library reference, under xml.etree.ElementTree.

For your problem, you can try something like this:

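import xml.etree.ElementTree as ET

# Get the tree structure of the XML
tree = ET.parse("data.xml")
# Get the root/first tag in the tree
root = tree.getroot()
# Walk every element in document order; findall("Sync") would only match
# direct children of the root, and Sync/Event are nested deeper
for child in root.iter():
    # The text after an empty element is stored in its .tail, not its .text
    if child.tag in ("Sync", "Event") and child.tail and child.tail.strip():
        print(child.tail.strip())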

Sidenote: child.attrib is a dictionary of all the element's attributes.
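
In ElementTree, text that follows an empty element such as <Sync/> is stored in that element's .tail rather than its .text, which is None here. The sketch below shows .tag, .attrib, .text and .tail side by side for one Turn from the question. The helper name describe and the inline sample are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# One Turn from the question's XML, inlined so the sketch runs on its own.
SAMPLE = """<Turn speaker="spk2" startTime="4.844" endTime="15.531">
    <Sync time="4.844"/>
    bbbbb
    <Event desc="poz" type="noise" extent="begin"/>
    ccccc
</Turn>"""


def describe(turn):
    """Hypothetical helper: (tag, attributes, text, stripped tail) per child."""
    rows = []
    for child in turn:
        tail = (child.tail or "").strip()
        rows.append((child.tag, dict(child.attrib), child.text, tail))
    return rows


turn = ET.fromstring(SAMPLE)
for row in describe(turn):
    print(row)
```

For this sample, both children report None for .text, and the words bbbbb and ccccc show up only in the tail column.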

If you insist on using minidom:

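from xml.dom import minidom

# xml is a string holding your input XML; '*' matches every element
elements = minidom.parseString(xml).getElementsByTagName('*')
for el in elements:
    if el.tagName in ('Sync', 'Event'):
        # The wanted text is the text node right after the element
        print(el.nextSibling.nodeValue.strip())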

This will print:

aaaaa
bbbbb
ccccc
ddddd
eeeee
fffff
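
The minidom loop above assumes that every Sync or Event is immediately followed by a text node. As a hedged alternative, the sketch below reads every non-blank text node that is a direct child of a Turn, which gives the same six lines for the question's XML but does not depend on nextSibling. The function name turn_texts is an assumption for illustration, and this variant would also pick up text that precedes the first Sync in a Turn, which may or may not be wanted.

```python
from xml.dom import minidom

# The question's XML, inlined so the sketch runs on its own.
SAMPLE = """<Episode>
<Section type="report" startTime="0" endTime="263.035">
    <Turn startTime="0" endTime="4.844" speaker="spk1">
        <Sync time="0"/>
        aaaaa
    </Turn>
    <Turn speaker="spk2" startTime="4.844" endTime="15.531">
        <Sync time="4.844"/>
        bbbbb
        <Event desc="poz" type="noise" extent="begin"/>
        ccccc
        <Event desc="poz" type="noise" extent="end"/>
        ddddd
        <Sync time="12.210"/>
        eeeee
    </Turn>
    <Turn speaker="spk1" startTime="15.531" endTime="17.549">
        <Event desc="poz" type="noise" extent="begin"/>
        fffff
    </Turn>
</Section>
</Episode>"""


def turn_texts(xml_string):
    """Hypothetical helper: non-blank text directly inside each Turn, in order."""
    doc = minidom.parseString(xml_string)
    lines = []
    for turn in doc.getElementsByTagName("Turn"):
        for node in turn.childNodes:
            # Skip element nodes (Sync, Event) and whitespace-only text nodes
            if node.nodeType == node.TEXT_NODE:
                text = node.data.strip()
                if text:
                    lines.append(text)
    return lines


for line in turn_texts(SAMPLE):
    print(line)
```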

