Extracting a section from a web page using Python

I want to extract the text of the Symptoms section from the website below using Python and lxml. Can anyone please help?

http://www.ncbi.nlm.nih.gov/pubmedhealth/PMH0001851/

Thank you,

You want to scrape a webpage with lxml? Try this:

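 from lxml.html import parse
 # cssselect() needs the separate "cssselect" package (pip install cssselect)
 doc = parse("http://www.ncbi.nlm.nih.gov/pubmedhealth/PMH0001851/").getroot()
 for h2 in doc.cssselect('h2'):
     # text_content() returns the heading's text, including nested tags
     print(h2.text_content())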

This opens the page and prints the text of every h2 heading on it.
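Printing the headings only locates the sections. To pull out the Symptoms text itself, one approach is to find the h2 whose text is "Symptoms" and collect the sibling elements after it until the next h2. The sketch below assumes the page lays out each section that way (heading followed by sibling p/ul elements); that layout, and the helper names `extract_section` and `fetch_document`, are hypothetical and not taken from the page or from lxml. It also downloads with `urllib` because lxml's own parser can generally only fetch plain `http://` URLs, not `https://`.

```python
from urllib.request import urlopen

from lxml import html


def fetch_document(url):
    """Hypothetical helper: download a page with the standard library
    (so https:// works) and parse it into an lxml element tree."""
    with urlopen(url) as resp:
        return html.fromstring(resp.read())


def extract_section(doc, heading):
    """Hypothetical helper: return the text between the <h2> whose text
    matches `heading` (case-insensitive) and the next <h2>, or None.

    Assumes the section body consists of sibling elements after the
    heading. Bare text sitting directly between elements (tails) is skipped.
    """
    for h2 in doc.iter("h2"):
        if h2.text_content().strip().lower() != heading.lower():
            continue
        parts = []
        # Walk the elements that follow the heading at the same level.
        for sib in h2.itersiblings():
            if not isinstance(sib.tag, str):
                continue  # skip comments and processing instructions
            if sib.tag == "h2":
                break  # reached the start of the next section
            text = sib.text_content().strip()
            if text:
                parts.append(text)
        return "\n".join(parts)
    return None
```

For the question's page, something like `print(extract_section(fetch_document("http://www.ncbi.nlm.nih.gov/pubmedhealth/PMH0001851/"), "Symptoms"))` should work if the page still uses an h2 titled "Symptoms"; if the heading is an h3 or the content is wrapped in a div, the tag names in the helper would need adjusting.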
