
Parse paragraphs from HTML using lxml

I am new to lxml and want to extract <p>PARAGRAPHS</p> and <li>PARAGRAPHS</li> from a given URL and use them in further steps.

I followed an example from a post and tried the following code, with no luck:

html = lxml.html('http://www.google.com/intl/en/about/corporate/index.html')
url = 'http://www.google.com/intl/en/about/corporate/index.html'
print html.parse.xpath('//p/text()')

I looked through the examples for lxml.html, but didn't find one that uses a URL.

Could you give me a hint on which methods I should use? Thanks.

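import lxml.html

# parse() accepts a URL as well as a filename or file object
htmltree = lxml.html.parse('http://www.google.com/intl/en/about/corporate/index.html')

# text nodes that are direct children of every <p> element
print(htmltree.xpath('//p/text()'))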
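
Note that '//p/text()' only returns text nodes sitting directly inside each <p>, so text wrapped in nested tags such as <b> or <a> is skipped, and <li> elements are not matched at all. The following is a minimal sketch of one way to collect both, not a definitive solution. SAMPLE_HTML and extract_paragraphs are hypothetical names introduced here for illustration, and the sample markup stands in for a downloaded page so the snippet runs without network access.

```python
import lxml.html

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <p>First <b>paragraph</b>.</p>
  <ul>
    <li>Item one</li>
    <li>Item <a href="#">two</a></li>
  </ul>
  <p>Second paragraph.</p>
</body></html>
"""


def extract_paragraphs(root):
    """Return the text of every <p> and <li> element, in document order."""
    texts = []
    # The XPath union '|' selects both element types in a single query.
    for element in root.xpath('//p | //li'):
        # text_content() includes text nested inside child tags;
        # split/join collapses runs of whitespace into single spaces.
        text = ' '.join(element.text_content().split())
        if text:
            texts.append(text)
    return texts


root = lxml.html.fromstring(SAMPLE_HTML)
paragraphs = extract_paragraphs(root)
print(paragraphs)
```

To run the same function against a live page, the root element could be obtained with lxml.html.parse(url).getroot() instead of fromstring().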
