
Problems using etree in Python scraper

I'm a newbie in Python looking to build a screen scraper in ScraperWiki, but I'm struggling with an error I can't work out how to fix. Essentially, I want to parse an XML file, but I can't work out how to have my gp_indicators_scrape function access the getroot() method.

Can anyone fix it, and more importantly, point me towards an explanation so I can avoid the problem in future?

Here's the scraper: https://scraperwiki.com/scrapers/choiceshu1

The key bits of code:

import lxml.html
import urlparse
from urlparse import urlparse
from lxml.etree import etree

def gp_indicators_scrape(org_URL):

     indicator_xml = etree.parse(org_URL)
     root = lxml.etree.getroot(indicator_XML)
     print root 

html = scraperwiki.scrape(combined_URL_for_first_scrape)
print html
root = lxml.html.fromstring(html)
links = root.cssselect("dd a")

And here's the error when it runs:

Line 5 - from lxml.etree import etree
ImportError: cannot import name etree

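from lxml.etree import etree should be from lxml import etree. etree is a submodule of the lxml package, not a name defined inside lxml.etree, which is why the import fails with "cannot import name etree".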

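Also, just noticed - lxml.etree.getroot(...) won't work either. getroot() is not a function of the etree module; it is a method of the object returned by etree.parse (or similar), so you call it on that object. You can also drop the lxml. prefix if you use the import above. Finally, the function assigns to indicator_xml but then passes indicator_XML; Python names are case-sensitive, so that would raise a NameError once the import is fixed.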

NB: I haven't looked at code in the provided link...
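
To make the two points above concrete, here is a minimal sketch of the corrected function. It is not the asker's full scraper: it assumes Python 3 (so print is a function), it returns the root instead of printing it, and it parses a small made-up XML document held in memory rather than the real URL. The sample data and the standard-library fallback import are assumptions added only so the example runs on its own.

```python
import io

try:
    from lxml import etree  # etree is a submodule of the lxml package
except ImportError:
    # Fallback only so this sketch runs without lxml installed; the
    # standard library offers the same parse()/getroot() calls.
    import xml.etree.ElementTree as etree


def gp_indicators_scrape(source):
    """Parse XML from a filename or file-like object and return its root element."""
    indicator_xml = etree.parse(source)  # returns an ElementTree object
    root = indicator_xml.getroot()       # getroot() is a method of that object
    return root


# Hypothetical sample document standing in for the real indicator feed.
sample = io.BytesIO(b"<indicators><indicator id='1'>42</indicator></indicators>")
root = gp_indicators_scrape(sample)
print(root.tag)  # prints: indicators
```

The pattern to remember is that etree.parse() hands back a tree object, and getroot() is then called on that object, not on the module.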
