import urllib
import lxml.html
down='http://blog.sina.com.cn/s/blog_71f3890901017hof.html'
file=urllib.urlopen(down).read()
root=lxml.html.document_fromstring(file)
body=root.xpath('//div[@class="articalContent "]')[0]
print body.text_content()
When I run this code, I get the text content. How can I get the HTML source of the element instead of its text content?
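Use lxml.html.tostring(), which serializes an element with its tags intact. Here node is the element you selected (body in your code):
html = lxml.html.tostring(node)
And please read the basic documentation of the tools you are using first.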