Python xml.etree.ElementTree: get everything inside an element, whether it is text or children
I am using xml.etree.ElementTree and, if possible, would prefer not to change XML parsing libraries. I can parse the XML file without any problem. I have a special <description> tag which contains text, and I want to retrieve this text. Here is the code I am using for that purpose:
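import xml.etree.ElementTree as ET

rss = ET.fromstring(rss_content)  # rss_content holds the feed XML as a string
# Iterate over the children of the first child element directly;
# getchildren() was removed in Python 3.9.
for node in rss[0]:
    if node.tag == 'description':
        print(node.text)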
So far, so good. But sometimes the <description> element contains other XML content instead of plain text, and I can't retrieve that as text. I could retrieve it with methods such as getchildren and branch on whether the content is recognized as text or as XML, but I was wondering whether I could directly retrieve the whole content, XML or not, as text in a simpler way?
There is the itertext() method on an ElementTree Element. It yields all the nested text, for example:
import xml.etree.ElementTree as ET

xmltxt = '''<?xml version="1.0"?>
<TEXT>
<Description>
<V>played</V>
<N>John</N>
<PREP>with</PREP>
<en x='PERS'>Adam</en>
<PREP>in</PREP>
<en x='LOC'> ASL school</en>
</Description>
<Description>
<V y='0'>went</V>
<en x='PERS'>Mark</en>
<PREP>to</PREP>
<en x='ORG'>United Nations</en>
<PREP>for</PREP>
<PREP>a</PREP>
<N>visit</N>
</Description>
</TEXT>
'''
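root = ET.fromstring(xmltxt)
for ch in root:
    print(ch)  # the Element object itself
    print("".join(ch.itertext()))  # all nested text, concatenated
    # encoding="unicode" makes tostring() return str instead of bytes (Python 3)
    print(ET.tostring(ch, encoding="unicode"))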
Output is:
played
John
with
Adam
in
ASL school
<Description>
<V>played</V>
<N>John</N>
<PREP>with</PREP>
<en x="PERS">Adam</en>
<PREP>in</PREP>
<en x="LOC"> ASL school</en>
</Description>
went
Mark
to
United Nations
for
a
visit
<Description>
<V y="0">went</V>
<en x="PERS">Mark</en>
<PREP>to</PREP>
<en x="ORG">United Nations</en>
<PREP>for</PREP>
<PREP>a</PREP>
<N>visit</N>
</Description>
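If the goal is the inner content of an element without its enclosing tag, one possible approach is to join the element's leading text with the serialized form of each child. The following is only a minimal sketch: the helper name inner_xml and the sample document are made up for illustration, and the leading text is returned as parsed (unescaped), while the children are re-serialized by ET.tostring().

```python
import xml.etree.ElementTree as ET


def inner_xml(elem):
    """Return everything inside elem (text and child markup) as one string.

    Hypothetical helper, not part of ElementTree itself.
    """
    # elem.text is the text before the first child (None if there is none).
    parts = [elem.text or ""]
    for child in elem:
        # tostring() serializes the child together with its tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


# Illustrative documents: one with mixed content, one with plain text.
mixed = ET.fromstring("<item><description>Hello <b>bold</b> world</description></item>")
plain = ET.fromstring("<item><description>just text</description></item>")

print(inner_xml(mixed.find("description")))  # Hello <b>bold</b> world
print(inner_xml(plain.find("description")))  # just text
```

The same call works whether the element holds plain text or nested XML, so no branching on the content type is needed.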
Alternatively, to recurse through nested elements, use the iter() method, collecting .text for the text within a tag and .tail for the text after a tag.
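A small sketch of the iter() approach, assuming a made-up <description> sample and a hypothetical helper named text_and_tail. It records .text and .tail per element rather than concatenating them, because iter() visits elements in pre-order: simply joining each element's text and tail as they come would put a tail before the text of that element's own children. When a single string in document order is wanted, itertext() already handles that.

```python
import xml.etree.ElementTree as ET


def text_and_tail(elem):
    """List (tag, text, tail) for elem and every descendant.

    Hypothetical helper for illustration only.
    """
    rows = []
    for node in elem.iter():  # elem itself first, then descendants
        text = (node.text or "").strip()  # text inside the tag, before any child
        tail = (node.tail or "").strip()  # text after the closing tag
        rows.append((node.tag, text, tail))
    return rows


# Illustrative sample with text both inside and after child tags.
sample = ET.fromstring(
    "<description>Intro <b>bold</b> middle <i>italic</i> end</description>"
)

for row in text_and_tail(sample):
    print(row)
# ('description', 'Intro', '')
# ('b', 'bold', 'middle')
# ('i', 'italic', 'end')
```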