Multi-page scraping with Scrapy version 0.22.1 - what does the "cannot import name CrawlSpider" error mean?

I am trying to write a spider that crawls across multiple pages, starting from the following URL: http://bookshop.lawsociety.org.uk/ecom_lawsoc/public/saleproducts.jsf?catId=EBOOK I am using Scrapy version 0.22.1 to do this. However, I am getting a "cannot import name CrawlSpider" message. I have pasted the code for the spider below. Can someone tell me where I have gone wrong?

from scrapy.spider import CrawlSpider, Rule
from scrapy.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from scrapy.item import BookpagesItem 

class BookpagesSpider(CrawlSpider):
    name = "book_sample"
    allowed_domains = ["bookshop.lawsociety.org.uk"]
    start_urls = ["http://bookshop.lawsociety.org.uk/ecom_lawsoc/public/saleproducts.jsf?catId=EBOOK",
                  ]
    rules = (
        Rule(SgmlLinkExtractor(allow=('//*[@id="productList:scrollernext"]', )), callback='parse_item', follow=True),
        Rule(SgmlLinkExtractor(allow=('//p/a[contains(@id, "productList")]', )), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        sel = Selector(response)
        sites = sel.xpath('//div[@class="dataListDiv"]')
        items = []
        for site in sites:
            item = BooksItem()
            item['title'] = site.xpath('//div/a/h3[@class="saleProductsTitle"]/text()').extract()
            item['link'] = site.xpath('//p/a[contains(@id, "productList")]').extract()
            item['price'] = site.xpath('//*[@class="saleProductsPrice"]/text()').extract()
            item['category'] = site.xpath('//span[contains(@id, "category")]/text()').extract()
            item['authors'] = site.xpath('//span[contains(@id, "author")]/text()').extract()
            item['date'] = site.xpath('//span[contains(@id, "publicationDate")]/text()').extract()
            item['publisher'] = site.xpath('//span[contains(@id, "publisher")]/text()').extract()
            item['isbn'] = site.xpath('//span[contains(@id, "isbn")]/text()').extract()
            items.append(item)
        return items

The items.py code is:

from scrapy.item import Item, Field

class BookpagesItem(Item):
    # define the fields for your item here like:
    # name = Field()
    title = Field()
    link = Field()
    price = Field()
    category = Field()
    authors = Field()
    date = Field()
    publisher = Field()
    isbn = Field()

It means that the statement from scrapy.spider import CrawlSpider, Rule isn't correct: the scrapy.spider module can be imported, but it does not define the name CrawlSpider.
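To illustrate the difference between a missing name and a missing module, here is a minimal sketch that uses only the standard library. The `collections` module stands in for `scrapy.spider`, and `no_such_module_xyz` and the helper `import_error_message` are made-up names used purely for illustration. The exact wording of the messages varies between Python versions, so the sketch only prints them.

```python
# Minimal sketch: two different import failures, standard library only.
# `collections` stands in for `scrapy.spider`; `no_such_module_xyz` is a
# made-up module name used purely for illustration.

def import_error_message(statement):
    """Run an import statement and return the ImportError text, or None."""
    try:
        exec(statement, {})
    except ImportError as exc:
        return str(exc)
    return None

# Case 1: the module exists, but it does not define the requested name.
missing_name = import_error_message("from collections import CrawlSpider")

# Case 2: the module itself cannot be found.
missing_module = import_error_message("import no_such_module_xyz")

print(missing_name)    # a "cannot import name ..." message
print(missing_module)  # a "No module named ..." message
```

The first case is the one in the question: the package is installed and the module loads, so the fix is to change where the name is imported from, not to reinstall anything.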

Looking at the Scrapy documentation, it should probably be from scrapy.contrib.spiders import CrawlSpider, Rule

Any time you get an ImportError: cannot import name foo error, you're looking at an incorrect import, so you can narrow the problem down to just your import statements. You can look in the library's documentation for the correct location, or in the source code itself if that's available.
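Besides reading the documentation, one way to narrow this down is to ask the installed library directly. The following is a rough sketch, not part of Scrapy: `find_module_exporting` is a hypothetical helper that tries a list of candidate module paths and reports the first one that actually defines the name. The demonstration uses standard-library modules so that it runs without Scrapy installed; the Scrapy candidates in the trailing comment are assumptions about likely locations, and the result depends on the installed version.

```python
import importlib

def find_module_exporting(name, candidate_modules):
    """Return the first candidate module path that defines `name`, else None."""
    for module_path in candidate_modules:
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            # The module does not exist in this installation; try the next one.
            continue
        if hasattr(module, name):
            return module_path
    return None

# Standard-library demonstration: the made-up module is skipped, `math`
# imports but has no OrderedDict, and `collections` defines it.
print(find_module_exporting(
    "OrderedDict", ["no_such_module_xyz", "math", "collections"]))

# Hypothetical use for the question above (requires Scrapy to be installed):
# candidates = ["scrapy.spider", "scrapy.contrib.spiders", "scrapy.spiders"]
# print(find_module_exporting("CrawlSpider", candidates))
```

Once the helper names a module, the import statement in the spider can be updated to use that path.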

I searched the Scrapy documentation and found this: http://doc.scrapy.org/en/0.24/topics/spiders.html#crawlspider
