How can I get the full inner HTML of a node that I select using etree xpath?
>>> from lxml import etree
>>> from StringIO import StringIO
>>> doc = '<foo><bar><div>привет привет</div></bar></foo>'
>>> hparser = etree.HTMLParser()
>>> htree = etree.parse(StringIO(doc), hparser)
>>> foo_element = htree.xpath("//foo")
How can I now print the inner HTML of the selected foo element as text? I need to get this:
<bar><div>привет привет</div></bar>
BTW, when I tried to use lxml.html.tostring, I got strange output:
>>> import lxml.html
>>> lxml.html.tostring(foo_element[0])
'<foo><bar><div>привет првиет</div></bar></foo>'
You can apply the same technique as shown in this other SO post. Here is an example in the context of this question:
>>> from lxml import etree
>>> from lxml import html
>>> from StringIO import StringIO
>>> doc = '<foo><bar><div>TEST NODE</div></bar></foo>'
>>> hparser = etree.HTMLParser()
>>> htree = etree.parse(StringIO(doc), hparser)
>>> foo_element = htree.xpath("//foo")
>>> print ''.join(html.tostring(e) for e in foo_element[0])
<bar><div>TEST NODE</div></bar>
Or, to handle the case where the element may also contain a text node child:
>>> doc = '<foo>text node child<bar><div>TEST NODE</div></bar></foo>'
>>> htree = etree.parse(StringIO(doc), hparser)
>>> foo_element = htree.xpath("//foo")
>>> print (foo_element[0].text or '') + ''.join(html.tostring(e) for e in foo_element[0])
text node child<bar><div>TEST NODE</div></bar>
For real code, refactoring this into a separate function, as shown in the linked post, is strongly advised.