Parse paragraphs from HTML using lxml
I am new to lxml and want to extract <p>PARAGRAPHS</p> and <li>PARAGRAPHS</li> from a given URL and use them for further steps.
I followed an example from a post and tried the following code, with no luck:
html = lxml.html('http://www.google.com/intl/en/about/corporate/index.html')
url = 'http://www.google.com/intl/en/about/corporate/index.html'
print html.parse.xpath('//p/text()')
I tried to look into the examples in lxml.html, but didn't find any example using a URL.
Could you give me a hint on which methods I should use? Thanks.
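import lxml.html

# parse() accepts a filename, URL, or file-like object and returns an ElementTree
htmltree = lxml.html.parse('http://www.google.com/intl/en/about/corporate/index.html')
# text() yields the text nodes that are direct children of each <p>
print(htmltree.xpath('//p/text()'))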
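
The question also asks for <li> elements, and '//p/text()' returns only the text nodes sitting directly inside each <p>, so text wrapped in nested tags such as <b> is skipped. Below is a minimal sketch of one way to cover both cases. It is not part of the original answer: the SAMPLE_HTML string and the extract_blocks helper are hypothetical names made up for illustration, and an inline string is used instead of a live URL so the example runs without network access.

```python
import lxml.html

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <p>First <b>bold</b> paragraph.</p>
  <ul>
    <li>Item one</li>
    <li>Item <i>two</i></li>
  </ul>
  <p>Second paragraph.</p>
</body></html>
"""


def extract_blocks(markup):
    """Return the text of every <p> and <li> element, in document order."""
    root = lxml.html.fromstring(markup)
    # The union XPath selects both element types; text_content() also
    # includes text nested inside child tags such as <b> or <i>.
    return [el.text_content().strip() for el in root.xpath('//p | //li')]


blocks = extract_blocks(SAMPLE_HTML)
print(blocks)
```

For a real page, the same helper could probably be fed the bytes or text fetched with whatever HTTP client you prefer, which may be more robust than handing the URL straight to lxml.html.parse.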