Parsing paragraphs from HTML with lxml

Question

I am new to lxml and want to extract <p>PARAGRAPHS</p> and <li>PARAGRAPHS</li> from a given URL and use them in further steps.

I followed an example from a post and tried the following code, with no luck:

html = lxml.html('http://www.google.com/intl/en/about/corporate/index.html')
url = 'http://www.google.com/intl/en/about/corporate/index.html'
print html.parse.xpath('//p/text()')

I looked through the examples in the lxml.html documentation, but did not find one that uses a URL.

Could you give me a hint about which methods I should use? Thanks.

Answer 1

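import lxml.html

# parse() accepts a filename, a file-like object, or a URL and returns an ElementTree
htmltree = lxml.html.parse('http://www.google.com/intl/en/about/corporate/index.html')

# Collect the direct text nodes of every <p> element
print(htmltree.xpath('//p/text()'))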
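
The question also asks about <li> elements, and '//p/text()' returns only the text nodes that sit directly inside each <p>, so text wrapped in nested tags such as <b> or <a> is left out. The following is a minimal sketch of one way to cover both points, not the answerer's own code: it selects <p> and <li> together with an XPath union and reads each element with text_content(). The names extract_paragraphs, fetch_html, and sample are made up for illustration. fetch_html is included on the assumption that the page may be served over https, which lxml's built-in URL loading is generally reported not to handle; in that case the bytes can be downloaded first and passed to lxml.html.fromstring().

```python
import urllib.request

import lxml.html


def extract_paragraphs(html_text):
    """Return the text of every <p> and <li> element, in document order."""
    tree = lxml.html.fromstring(html_text)
    # The XPath union (|) selects both element types in one query.
    elements = tree.xpath('//p | //li')
    # text_content() also collects text inside nested tags such as <b>,
    # which '//p/text()' would skip.
    texts = [el.text_content().strip() for el in elements]
    # Drop elements that contain only whitespace.
    return [t for t in texts if t]


def fetch_html(url):
    """Download a page as bytes (hypothetical helper, e.g. for https URLs)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


# A small inline document stands in for a downloaded page.
sample = """
<html><body>
  <p>First <b>bold</b> paragraph.</p>
  <ul><li>Item one</li><li>Item two</li></ul>
  <p>   </p>
</body></html>
"""
print(extract_paragraphs(sample))
```

For a live page, extract_paragraphs(fetch_html(url)) would apply the same extraction to the downloaded bytes.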

Solution 1
7, accepted, 2011-10-16 16:09:30
