Extracting text after tag in Python's ElementTree
Here is a part of XML:
<item><img src="cat.jpg" /> Picture of a cat</item>
Extracting the tag is easy. Just do:
import xml.etree.ElementTree
et = xml.etree.ElementTree.fromstring(our_xml_string)
img = et.find('img')
But how do I get the text immediately after it (Picture of a cat)? Doing the following returns a blank string:
print(et.text)
Elements have a tail attribute, so instead of element.text, you're asking for element.tail.
>>> import lxml.etree
>>> root = lxml.etree.fromstring('''<root><foo>bar</foo>baz</root>''')
>>> root[0]
<Element foo at 0x145a3c0>
>>> root[0].tail
'baz'
Or, for your example:
>>> et = lxml.etree.fromstring('''<item><img src="cat.jpg" /> Picture of a cat</item>''')
>>> et.find('img').tail
' Picture of a cat'
This also works with plain ElementTree:
>>> import xml.etree.ElementTree
>>> xml.etree.ElementTree.fromstring(
... '''<item><img src="cat.jpg" /> Picture of a cat</item>'''
... ).find('img').tail
' Picture of a cat'
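To make the difference between text and tail concrete, here is a minimal sketch using the XML from the question. The variable names (item, img, caption, full_text) are only illustrative. In this document item.text is None, because there is no text between the opening <item> tag and its first child; the caption is stored as the tail of <img>.

```python
import xml.etree.ElementTree as ET

xml_string = '<item><img src="cat.jpg" /> Picture of a cat</item>'
item = ET.fromstring(xml_string)
img = item.find('img')

# text: whatever sits between an element's start tag and its first child.
# Nothing precedes <img> inside <item>, so this is None.
print(repr(item.text))

# tail: whatever follows an element's end tag, up to the next sibling
# or the parent's end tag. This is where the caption lives.
print(repr(img.tail))

# tail can be None when nothing follows the element, so guard before stripping.
caption = (img.tail or '').strip()
print(caption)

# itertext() walks the subtree and yields every non-empty text and tail
# in document order, which helps when the element the text follows is unknown.
full_text = ''.join(item.itertext())
print(repr(full_text))
```

Note that tail keeps the surrounding whitespace exactly as it appears in the source, which is why the sketch strips it before use.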