
How can I get (print) all the inner HTML of a node that I select using Python's lxml etree and XPath?

How can I get all the inner HTML of a node that I select using etree XPath:

>>> from lxml import etree
>>> from StringIO import StringIO
>>> doc = '<foo><bar><div>привет привет</div></bar></foo>'
>>> hparser = etree.HTMLParser()
>>> htree = etree.parse(StringIO(doc), hparser)
>>> foo_element = htree.xpath("//foo")

How can I now print the inner HTML of foo_element as text? I need to get this:

<bar><div>привет привет</div></bar>

BTW, when I tried to use lxml.html.tostring, I got strange output:

>>> import lxml.html
>>> lxml.html.tostring(foo_element[0])
'<foo><bar><div>&#208;&#191;&#209;&#128;&#208;&#184;&#208;&#178;&#208;&#181;&#209;&#130; &#208;&#191;&#209;&#128;&#208;&#178;&#208;&#184;&#208;&#181;&#209;&#130;</div></bar></foo>'

You can apply the same technique as shown in this other SO post. Example in the context of this question:

>>> from lxml import etree
>>> from lxml import html
>>> from StringIO import StringIO
>>> doc = '<foo><bar><div>TEST NODE</div></bar></foo>'
>>> hparser = etree.HTMLParser()
>>> htree = etree.parse(StringIO(doc), hparser)
>>> foo_element = htree.xpath("//foo")
>>> print ''.join(html.tostring(e) for e in foo_element[0])
<bar><div>TEST NODE</div></bar>

Or, to handle the case where the element may contain a text node child:

>>> doc = '<foo>text node child<bar><div>TEST NODE</div></bar></foo>'
>>> htree = etree.parse(StringIO(doc), hparser)
>>> foo_element = htree.xpath("//foo")
>>> print foo_element[0].text + ''.join(html.tostring(e) for e in foo_element[0])
text node child<bar><div>TEST NODE</div></bar>

For real code, refactoring this into a separate function, as shown in the linked post, is strongly advised.

