Parsing words from an XML file

Question

I have an XML file called "interview.xml" that looks like this:

<SpeechSegment spkid="S0">
    <Word dur="0.22" stime="0.44">oh</Word>
    <Word dur="0.27" stime="1.67">bedankt</Word>
    <Word dur="0.3" stime="2.03">voor</Word>
    <Word dur="0.53" stime="2.61">deelname</Word>
    <Word dur="0.22" stime="3.15">aan</Word>
    <Word dur="0.23" stime="3.39">de</Word>
    <Word dur="0.14" stime="6.15">want</Word>
    <Word dur="0.07" stime="6.29">ik</Word>
    <Word dur="0.09" stime="6.36">wil</Word>
    <Word dur="0.06" stime="6.45">je</Word>
    <Word dur="0.42" stime="6.51">graag</Word>
    <Word dur="0.2" stime="7.52">en</Word>
</SpeechSegment>

What I would like to do now is parse all the words from this segment, so I want to create a list like ["oh", "bedankt", "voor", etc...]

I tried this:

import xml.etree.ElementTree
e = xml.etree.ElementTree.parse('Interview_short.xml').getroot()

for atype in e.findall('type'):
    print(atype.get('word'))

But this does not give me the output I am looking for. Any thoughts on what adjustments I should make?

Answer 1

Use ElementTree.

Solution:

import xml.etree.ElementTree as ET
root = ET.fromstring(xml_string)
required_list = [child.text for child in root]
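Since xml_string is not defined in the snippet above and the question reads from a file, here is a minimal sketch of the same idea using ET.parse(...).getroot(). The sample content and the temporary file are assumptions made only so the example runs on its own; in practice you would pass the path of your own XML file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Shortened stand-in for the file from the question (assumption for the demo).
sample = '''<SpeechSegment spkid="S0">
    <Word dur="0.22" stime="0.44">oh</Word>
    <Word dur="0.27" stime="1.67">bedankt</Word>
    <Word dur="0.3" stime="2.03">voor</Word>
</SpeechSegment>'''

# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False,
                                 encoding='utf-8') as f:
    f.write(sample)
    path = f.name

try:
    # parse() reads from a file; getroot() returns the <SpeechSegment> element.
    root = ET.parse(path).getroot()
    # Iterating over an element yields its direct children, here the <Word> elements.
    required_list = [child.text for child in root]
finally:
    os.remove(path)

print(required_list)
```

Note that iterating over root returns every direct child, whatever its tag, so this only works as long as <SpeechSegment> contains nothing but <Word> elements.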

Answer 2

I have no idea why you used findall('type') when the XML doesn't contain any <type> element. According to the XML posted, it should have been findall('Word'). Here is a minimal but complete demo:

raw = '''<SpeechSegment spkid="S0">
    <Word dur="0.22" stime="0.44">oh</Word>
    <Word dur="0.27" stime="1.67">bedankt</Word>
    <Word dur="0.3" stime="2.03">voor</Word>
    <Word dur="0.53" stime="2.61">deelname</Word>
    <Word dur="0.22" stime="3.15">aan</Word>
    <Word dur="0.23" stime="3.39">de</Word>
    <Word dur="0.14" stime="6.15">want</Word>
    <Word dur="0.07" stime="6.29">ik</Word>
    <Word dur="0.09" stime="6.36">wil</Word>
    <Word dur="0.06" stime="6.45">je</Word>
    <Word dur="0.42" stime="6.51">graag</Word>
    <Word dur="0.2" stime="7.52">en</Word>
</SpeechSegment>'''

from xml.etree import ElementTree as ET
root = ET.fromstring(raw)
result = [word.text for word in root.findall('Word')]
print(result)

Output:

['oh', 'bedankt', 'voor', 'deelname', 'aan', 'de', 'want', 'ik', 'wil', 'je', 'graag', 'en']
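One caveat, in case the real file is larger than the excerpt: findall('Word') only matches direct children of the element it is called on. If the segments turn out to be wrapped in an outer element, root.iter('Word') walks the whole tree instead. In the sketch below the <Interview> wrapper and the second segment (spkid="S1") are hypothetical and not taken from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: several <SpeechSegment> elements inside one root.
nested = '''<Interview>
    <SpeechSegment spkid="S0">
        <Word dur="0.22" stime="0.44">oh</Word>
        <Word dur="0.27" stime="1.67">bedankt</Word>
    </SpeechSegment>
    <SpeechSegment spkid="S1">
        <Word dur="0.3" stime="2.03">voor</Word>
    </SpeechSegment>
</Interview>'''

root = ET.fromstring(nested)

# findall('Word') looks only at direct children of <Interview>, so it finds nothing.
direct = [word.text for word in root.findall('Word')]

# iter('Word') visits every <Word> descendant in document order.
all_words = [word.text for word in root.iter('Word')]

print(direct)
print(all_words)
```

An equivalent alternative is root.findall('.//Word'), which uses ElementTree's limited XPath support to search all descendants.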
