
AttributeError: 'list' object has no attribute 'extract'?

I just want to extract info from this URL ( http://www.tuniu.com/g3300/whole-nj-0/list-l1602-h0-i-j0_0/ ) via XPath. When I run the following code, it raises AttributeError: 'list' object has no attribute 'extract'. Is my module import wrong, or are the modules mismatched?

# -*- coding: utf-8 -*-

import urllib2
import sys
import lxml.html as HTML
reload(sys)
sys.setdefaultencoding("utf-8")


class spider(object):
    def __init__(self):
        print u'开始爬取内容'  # "Start crawling content"

    def getSource(self, url):
        html = urllib2.Request(url)
        pageContent = urllib2.urlopen(html,timeout=60).read()
        return pageContent

    def getUrl(self, pageContent):
        htmlSource = HTML.fromstring(pageContent)
        urlInfo = htmlSource.xpath('//dd[@class="tqs"]/span/a/@href').extract()[0]
        return urlInfo


if __name__ == "__main__":
    url = "http://www.tuniu.com/g3300/whole-nj-0/list-l1602-h0-i-j0_0/"
    tuniu = spider()
    tuniu.getUrl(url)

Here is the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\anzhuang\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)
  File "D:\anzhuang\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "D:/python/tuniu2/tuniu.py", line 34, in <module>
    tuniu.getUrl(url)
  File "D:/python/tuniu2/tuniu.py", line 27, in getUrl
    urlInfo = htmlSource.xpath('//dd[@class="tqs"]/span/a/@href').extract()[0]
AttributeError: 'list' object has no attribute 'extract'

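First, getUrl is called with a URL, but it never fetches the content at that URL; it hands the URL string straight to the HTML parser. Modify it to download the page content first.

Second, extract is not needed. lxml's xpath() returns a plain Python list, and for an @href query its items are already strings, so to get the href just take an item from the returned list: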

def getUrl(self, url):
    pageContent = self.getSource(url)  # <---
    htmlSource = HTML.fromstring(pageContent)
    urlInfo = htmlSource.xpath('//dd[@class="tqs"]/span/a/@href')[0]
    return urlInfo
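For readers on Python 3 (where urllib2 was split into urllib.request), the corrected spider might look roughly like the sketch below. It is a hedged port, not the answerer's code: the snake_case method names and the first_href helper are my own, and the sample HTML is a made-up stand-in for the tuniu page, whose real markup may differ or have changed. Parsing is kept separate from fetching so it can be tried offline.

```python
import urllib.request

import lxml.html


class Spider:
    """Minimal Python 3 sketch of the spider from the question."""

    # XPath from the question; assumes the page still uses <dd class="tqs">.
    HREF_XPATH = '//dd[@class="tqs"]/span/a/@href'

    def get_source(self, url, timeout=60):
        # Download the raw page bytes (requires network access).
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()

    def first_href(self, page_content):
        # xpath() returns a plain Python list; for an @href query its items
        # are already strings, so no .extract() call is needed.
        hrefs = lxml.html.fromstring(page_content).xpath(self.HREF_XPATH)
        # Return None instead of raising IndexError when nothing matches.
        return hrefs[0] if hrefs else None

    def get_url(self, url):
        # Fetch first, then parse: the step missing from the question's code.
        return self.first_href(self.get_source(url))


if __name__ == "__main__":
    # Hypothetical sample markup standing in for the real page.
    sample = ('<html><body><dl><dd class="tqs"><span>'
              '<a href="/tour/1">Tour</a></span></dd></dl></body></html>')
    print(Spider().first_href(sample))  # prints /tour/1
```

Against the live site you would call Spider().get_url(url) instead; returning None on no match is a design choice of this sketch, since the original [0] would raise IndexError.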

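xpath() returns a list of the matching results, so you are calling extract on the list itself rather than on any item inside it. If you only want the first result, index the list with [0] first. Note that extract() is a method of Scrapy's selector objects; with a Scrapy Selector the call would look like this:

urlInfo = htmlSource.xpath('//dd[@class="tqs"]/span/a/@href')[0].extract()

With lxml, as in the question, the items are already plain strings with no extract() method, so drop .extract() and keep only the [0].

It's unclear which info you want, but if it's not in this first result, you might want to iterate over the whole list returned by xpath() with for tag in ... and process each item (calling tag.extract() only when using Scrapy selectors).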
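To see why .extract() fails with lxml, and to collect every link rather than only the first, the sketch below runs the question's XPath over a small HTML snippet. The snippet and its two tour links are made up for illustration, and all_hrefs is a hypothetical helper name. With lxml, each item in the returned list is already a str subclass, so the loop needs no .extract().

```python
import lxml.html

# Made-up markup with two matching links; the real page may differ.
SAMPLE_HTML = """
<html><body>
  <dl>
    <dd class="tqs"><span><a href="/tour/1">Tour 1</a></span></dd>
    <dd class="tqs"><span><a href="/tour/2">Tour 2</a></span></dd>
  </dl>
</body></html>
"""


def all_hrefs(page_content):
    """Return every href matched by the question's XPath, as plain strings."""
    results = lxml.html.fromstring(page_content).xpath('//dd[@class="tqs"]/span/a/@href')
    hrefs = []
    for href in results:  # iterate over the list itself, not a single item
        # Each item is an lxml "smart string" (a str subclass), not a
        # selector, so str() is enough to get an ordinary string.
        hrefs.append(str(href))
    return hrefs


print(all_hrefs(SAMPLE_HTML))  # ['/tour/1', '/tour/2']
```

An empty match simply yields an empty list, so no IndexError handling is needed when iterating.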
