Python xml.etree.ElementTree: get all content inside an element, whether text or child elements

Question

I am using xml.etree.ElementTree and, if possible, would prefer not to change XML parsing libraries.

I can parse the XML file without any problem. I have a special <description> tag which contains text, and I want to retrieve this text. Here is the code I am using for that purpose:

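import xml.etree.ElementTree as ET

# rss_content holds the RSS document as a string
rss = ET.fromstring(rss_content)
# Iterate over the children directly (getchildren() was removed in Python 3.9)
for node in rss[0]:
    if node.tag == 'description':
        print(node.text)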

So far, so good. But sometimes the text is itself XML content, and I can't retrieve it as text. I could retrieve it with methods such as getchildren and branch on whether the content is recognized as text or as XML, but I was wondering whether I could retrieve the whole content, XML or not, directly as text in a simpler way?

Answer 1

An ElementTree Element has an itertext() method, which iterates over all the nested text in document order. For example:

xmltxt='''<?xml version="1.0"?>
<TEXT>
    <Description>
        <V>played</V>
        <N>John</N>
        <PREP>with</PREP>
        <en x='PERS'>Adam</en>
        <PREP>in</PREP>
        <en x='LOC'> ASL school</en>
    </Description>
    <Description>
        <V y='0'>went</V>
        <en x='PERS'>Mark</en>
        <PREP>to</PREP>
        <en x='ORG'>United Nations</en>
        <PREP>for</PREP>
        <PREP>a</PREP>
        <N>visit</N>
    </Description>

</TEXT>
'''

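import xml.etree.ElementTree as ET

root = ET.fromstring(xmltxt)

for ch in root:
    # The Element object itself, e.g. <Element 'Description' at 0x...>
    print(ch)
    # All nested text, concatenated in document order
    print("".join(ch.itertext()))
    # The whole element, markup included; encoding="unicode" returns str, not bytes
    print(ET.tostring(ch, encoding="unicode"))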

Output (the <Element 'Description' at 0x...> line printed for each element is omitted):

        played
        John
        with
        Adam
        in
         ASL school

<Description>
        <V>played</V>
        <N>John</N>
        <PREP>with</PREP>
        <en x="PERS">Adam</en>
        <PREP>in</PREP>
        <en x="LOC"> ASL school</en>
    </Description>

        went
        Mark
        to
        United Nations
        for
        a
        visit

<Description>
        <V y="0">went</V>
        <en x="PERS">Mark</en>
        <PREP>to</PREP>
        <en x="ORG">United Nations</en>
        <PREP>for</PREP>
        <PREP>a</PREP>
        <N>visit</N>
    </Description>
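
The ET.tostring() output above includes the enclosing <Description> tag. If only the inner content is wanted, markup included, one possible sketch is to combine the element's own .text with ET.tostring() of each child. The helper name inner_xml and the sample <description> strings are made up for illustration; they are not part of ElementTree or of the original question.

```python
import xml.etree.ElementTree as ET

def inner_xml(elem):
    """Return elem's content (text and child markup) without elem's own tag.

    Hypothetical helper for illustration, not an ElementTree API.
    """
    # elem.text is the text before the first child (None if there is none)
    parts = [elem.text or ""]
    for child in elem:
        # tostring() serializes the child and, by default, its trailing .tail text
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)

# Made-up samples: one plain-text description, one with embedded markup
plain = ET.fromstring("<description>just text</description>")
mixed = ET.fromstring("<description>see <b>this</b> link</description>")

print(inner_xml(plain))  # just text
print(inner_xml(mixed))  # see <b>this</b> link
```

The same call works whether the element holds plain text or nested XML, so no branching on the content type is needed.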

Or, to walk through the nested elements yourself, use the iter() method, collecting .text for the text inside each tag and .tail for the text that follows its closing tag.
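
To make the roles of .text and .tail concrete, here is a minimal sketch that walks a small made-up element with iter() and records both attributes for every node. The sample XML and the variable name pieces are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Made-up sample with text both inside and between child elements
root = ET.fromstring("<d>one <b>two</b> three <i>four</i> five</d>")

# iter() yields the element itself, then its descendants in document order
pieces = [(node.tag, node.text, node.tail) for node in root.iter()]

for tag, text, tail in pieces:
    print(tag, repr(text), repr(tail))
# d 'one ' None
# b 'two' ' three '
# i 'four' ' five'
```

Note that an element's tail belongs after all of that element's descendants, so concatenating .text and .tail in plain iter() order is only correct when the children are not nested any further. itertext() handles that ordering for you.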

Answer 1
1 2016-01-10 12:26:49
