How to get text for a root element using lxml?
I'm completely stumped as to why lxml's .text gives me the text for a child tag but not for the root tag.
some_tag = etree.fromstring('<some_tag class="abc"><strong>Hello</strong> World</some_tag>')
some_tag.find("strong")
Out[195]: <Element strong at 0x7427d00>
some_tag.find("strong").text
Out[196]: 'Hello'
some_tag
Out[197]: <Element some_tag at 0x7bee508>
some_tag.text
some_tag.find("strong").text returns the text inside the <strong> tag.
I expect some_tag.text to return everything between <some_tag> ... </some_tag>.
Expected:
<strong>Hello</strong> World
Instead, it returns nothing.
from lxml import etree
XML = '<some_tag class="abc"><strong>Hello</strong> World</some_tag>'
some_tag = etree.fromstring(XML)
for element in some_tag:
    # .text is the text before the first child; .tail is the text after the closing tag
    print(element.tag, element.text, element.tail)
Output:
strong Hello World
For information on the .text and .tail properties, see the lxml documentation.
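As a minimal sketch of how the two properties divide up the content in the question's example: the root has no text before its first child, so its .text is None, and " World" is stored as the tail of <strong>. The import fallback to the standard library's ElementTree is an assumption added here so the snippet runs without lxml installed; both libraries use the same .text/.tail model.

```python
try:
    from lxml import etree
except ImportError:  # assumption: stdlib ElementTree exposes the same .text/.tail model
    import xml.etree.ElementTree as etree

root = etree.fromstring('<some_tag class="abc"><strong>Hello</strong> World</some_tag>')
strong = root.find("strong")

# .text holds only the text between the start tag and the first child element.
print(repr(root.text))    # None: <some_tag> is followed directly by <strong>
print(repr(strong.text))  # 'Hello'
# .tail holds the text between an element's end tag and the next tag.
print(repr(strong.tail))  # ' World'
```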
To get exactly the result that you expected, use
print(etree.tostring(some_tag.find("strong"), encoding="unicode"))
Output:
<strong>Hello</strong> World
You'll find the missing text here:
>>> some_tag.find("strong").tail
' World'
Look at http://lxml.de/tutorial.html and search for "tail".
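If only the plain text is needed, without markup, a short sketch using itertext() collects every .text and .tail piece in document order. The ElementTree fallback is an assumption added so the snippet runs without lxml; itertext() exists in both libraries.

```python
try:
    from lxml import etree
except ImportError:  # assumption: stdlib ElementTree also provides itertext()
    import xml.etree.ElementTree as etree

some_tag = etree.fromstring('<some_tag class="abc"><strong>Hello</strong> World</some_tag>')

# itertext() walks the subtree and yields each text and tail string in order.
pieces = list(some_tag.itertext())
plain = "".join(pieces)
print(pieces)
print(plain)
```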
I'm not sure I understand your question, but there are two main approaches to parsing:
DOM parser: depending on the language, the text is read with something like node.getNodeValue();
SAX parser: depending on the language; in Java, for example, the text arrives in the characters(...) callback.
I don't have time to search on Google, but in Python I know of MiniDOM (a DOM parser): http://www.blog.pythonlibrary.org/2010/11/12/python-parsing-xml-with-minidom/
I hope my answer helps.
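For comparison, here is a rough sketch of the DOM approach using the standard library's xml.dom.minidom. In the DOM model " World" is a separate text node that is a child of the root, rather than the tail of <strong>. The variable names are illustrative only.

```python
from xml.dom import minidom

doc = minidom.parseString('<some_tag class="abc"><strong>Hello</strong> World</some_tag>')
root = doc.documentElement

# Walk the root's direct children: one element node, then one text node.
seen = []
for node in root.childNodes:
    if node.nodeType == node.TEXT_NODE:
        seen.append(("text", node.nodeValue))
    elif node.nodeType == node.ELEMENT_NODE:
        # The element's own text lives in its first child, a text node.
        seen.append((node.tagName, node.firstChild.nodeValue))
print(seen)

# Serializing every child node gives the markup inside the root.
inner = "".join(child.toxml() for child in root.childNodes)
print(inner)
```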
Does this help?
# Serialize each child of the root; the output includes each child's tail text
comp = [etree.tostring(e, encoding="unicode") for e in some_tag]
print(''.join(comp))
EDITED: Thanks @mzjin for putting me on the right track.
You can use lxml's built-in xpath() method to retrieve all the text inside the tag.
from lxml import etree
xml='''<some_tag class="abc"><strong>Hello</strong> World</some_tag>'''
tree = etree.fromstring(xml)
print(''.join(tree.xpath('//text()')))