
Parse paragraphs from HTML using lxml

Original post: 2011-10-16 15:58:49 3 1 python/lxml

I am new to lxml and want to extract <p>PARAGRAPHS</p> and <li>PARAGRAPHS</li> from a given URL and use them in further steps.

I followed an example from a post and tried the following code, with no luck:

html = lxml.html('http://www.google.com/intl/en/about/corporate/index.html')
url = 'http://www.google.com/intl/en/about/corporate/index.html'
print html.parse.xpath('//p/text()')

I looked through the examples in lxml.html, but didn't find one that uses a URL.

Could you give me a hint on which methods I should use? Thanks.

1 answer

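import lxml.html

# lxml.html is a module, not a callable; its parse() function accepts a URL or a filename.
htmltree = lxml.html.parse('http://www.google.com/intl/en/about/corporate/index.html')

# Print the direct text nodes of every <p> element.
print(htmltree.xpath('//p/text()'))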
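The question also asks for <li> items, and '//p/text()' returns only the direct text nodes of each <p>, so text inside nested tags is left out. The following is a minimal sketch of one way to collect both element types with their full text. The SAMPLE_HTML string and the extract_paragraphs helper are hypothetical names made up for this illustration, and the markup is parsed from a string so the example runs without network access.

```python
import lxml.html

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <p>First <b>bold</b> paragraph.</p>
  <ul>
    <li>Item one</li>
    <li>Item <a href="#">two</a></li>
  </ul>
  <p>Second paragraph.</p>
</body></html>
"""


def extract_paragraphs(root):
    """Return the text of every <p> and <li> element, in document order."""
    # The XPath union operator "|" selects both element types in one query.
    elements = root.xpath('//p | //li')
    # text_content() includes text inside nested tags such as <b> or <a>.
    return [el.text_content().strip() for el in elements]


root = lxml.html.fromstring(SAMPLE_HTML)
paragraphs = extract_paragraphs(root)
for text in paragraphs:
    print(text)
```

For a page loaded by URL, the same helper should work on lxml.html.parse(url).getroot(). If the URL uses https, parse() may not be able to fetch it directly; in that case one option is to download the page with another library and pass the result to lxml.html.fromstring().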



The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

