Problems using etree in Python scraper

Question

I'm new to Python and trying to build a screen scraper in ScraperWiki, but I'm stuck on an error I can't work out how to fix. Essentially, I want to parse an XML file, but I can't work out how to get my gp_indicators_scrape function to call the getroot() method.

Can anyone fix it, and more importantly, point me towards an explanation so I can avoid the problem in future?

Here's the scraper: https://scraperwiki.com/scrapers/choiceshu1

The key bits of code:

import lxml.html
import urlparse
from urlparse import urlparse
from lxml.etree import etree

def gp_indicators_scrape(org_URL):

     indicator_xml = etree.parse(org_URL)
     root = lxml.etree.getroot(indicator_XML)
     print root 

html = scraperwiki.scrape(combined_URL_for_first_scrape)
print html
root = lxml.html.fromstring(html)
links = root.cssselect("dd a")

And here's the error when it runs:

Line 5 - from lxml.etree import etree
ImportError: cannot import name etree

Answer 1

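The line from lxml.etree import etree should be from lxml import etree. etree is a module inside the lxml package, so there is no name called etree inside lxml.etree to import, which is what the ImportError is telling you.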

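Also, just noticed: lxml.etree.getroot(...) is not how getroot() is used. With the import above you can drop the lxml. prefix, and getroot() is normally called as a method on the object returned by etree.parse() (or similar), e.g. indicator_xml.getroot(). Watch the capitalisation too: the snippet assigns indicator_xml but then passes indicator_XML, and Python names are case-sensitive, so that would raise a NameError.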
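
To make the two points above concrete, here is a minimal sketch of the corrected function. It is not the asker's full scraper: the sample XML is made up for illustration, and the fallback to the standard library's ElementTree is only there so the sketch still runs where lxml is not installed (both expose parse() and getroot() in the same way).

```python
import io

try:
    from lxml import etree  # the corrected import from the answer
except ImportError:
    # Assumption for this sketch only: fall back to the standard library,
    # which offers the same parse()/getroot() calls used below.
    import xml.etree.ElementTree as etree


def gp_indicators_scrape(source):
    """Parse XML from a filename or file-like object and return the root element."""
    indicator_xml = etree.parse(source)  # returns a tree object
    root = indicator_xml.getroot()       # getroot() is a method of that tree
    return root


# Hypothetical sample document standing in for the real XML feed.
sample = io.BytesIO(
    b"<indicators><indicator name='example'>1</indicator></indicators>"
)
root = gp_indicators_scrape(sample)
print(root.tag)
```

The same variable name (indicator_xml) is used on both lines inside the function, which avoids the capitalisation mismatch in the original snippet.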

NB: I haven't looked at code in the provided link...

solution1
1 ACCEPTED 2012-07-24 08:27:38