How can I get (print) all the inner HTML of a node that I select using Python's lxml etree and XPath?
How can I get all the inner HTML of a node selected with an etree XPath query:
>>> from lxml import etree
>>> from StringIO import StringIO
>>> doc = '<foo><bar><div>привет привет</div></bar></foo>'
>>> hparser = etree.HTMLParser()
>>> htree = etree.parse(StringIO(doc), hparser)
>>> foo_element = htree.xpath("//foo")
Now, how do I print the entire inner HTML of foo_element as text? I need to get this:
<bar><div>привет привет</div></bar>
By the way, when I try lxml.html.tostring, I get strange output:
>>> import lxml.html
>>> lxml.html.tostring(foo_element[0])
'<foo><bar><div>привет првиет</div></bar></foo>'
You can apply the same technique shown in this other SO post. An example for this question:
>>> from lxml import etree
>>> from lxml import html
>>> from StringIO import StringIO
>>> doc = '<foo><bar><div>TEST NODE</div></bar></foo>'
>>> hparser = etree.HTMLParser()
>>> htree = etree.parse(StringIO(doc), hparser)
>>> foo_element = htree.xpath("//foo")
>>> print ''.join(html.tostring(e) for e in foo_element[0])
<bar><div>TEST NODE</div></bar>
Or, to handle the case where the element may also have a text node as a child:
>>> doc = '<foo>text node child<bar><div>TEST NODE</div></bar></foo>'
>>> htree = etree.parse(StringIO(doc), hparser)
>>> foo_element = htree.xpath("//foo")
>>> print foo_element[0].text + ''.join(html.tostring(e) for e in foo_element[0])
text node child<bar><div>TEST NODE</div></bar>
For real-world use, it is strongly recommended to refactor this code into a separate function, as shown in the linked post.