Empty variable within instance of a class, despite specifically setting it
When I run the following code:
import scrapy
from scrapy.crawler import CrawlerProcess

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    search_url = ''

    def start_requests(self):
        print('self.search_url is currently: ' + self.search_url)
        yield scrapy.Request(url=self.search_url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

test_spider = QuotesSpider()
test_spider.search_url = 'http://quotes.toscrape.com/page/1/'
process.crawl(test_spider)
process.start()  # the script will block here until the crawling is finished
I get the following error:
self.search_url is currently:
...
ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url:
...
It seems that inside start_requests, self.search_url is an empty string, even though I explicitly set it to a value before starting the crawl. I can't figure out why.
The cleanest fix is to pass the value in through the constructor __init__(), but an even simpler (and probably quicker) fix is to move the definition of search_url into the class itself. For example:
import scrapy
from scrapy.crawler import CrawlerProcess

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    search_url = 'http://quotes.toscrape.com/page/1/'

    def start_requests(self):
        print('search_url is currently: ' + self.search_url)
        yield scrapy.Request(url=self.search_url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

test_spider = QuotesSpider()
process.crawl(test_spider)
process.start()
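The underlying reason the instance attribute disappeared: when you hand a spider to CrawlerProcess.crawl(), Scrapy builds a fresh spider instance from the spider's class, so an attribute set on your own test_spider object never reaches the instance that actually runs; only class-level attributes (or constructor arguments) survive. A minimal sketch of that mechanism, using a plain stand-in class (FakeCrawlerProcess is hypothetical, for illustration only, not Scrapy's API):

```python
class Spider:
    search_url = ''  # class-level default

    def __init__(self, **kwargs):
        # like scrapy.Spider, accept keyword args and store them as attributes
        self.__dict__.update(kwargs)

class FakeCrawlerProcess:
    """Hypothetical stand-in that mimics what Scrapy does internally:
    it instantiates a *new* spider from the spider's class."""
    def crawl(self, spider_or_cls, **kwargs):
        cls = spider_or_cls if isinstance(spider_or_cls, type) else type(spider_or_cls)
        return cls(**kwargs)  # fresh instance; the old instance's attributes are gone

mine = Spider()
mine.search_url = 'http://quotes.toscrape.com/page/1/'

fresh = FakeCrawlerProcess().crawl(mine)
print(repr(fresh.search_url))  # '' -- the value set on `mine` never carried over

# Passing the value as a constructor keyword argument instead does survive:
fresh2 = FakeCrawlerProcess().crawl(Spider, search_url='http://quotes.toscrape.com/page/1/')
print(repr(fresh2.search_url))  # 'http://quotes.toscrape.com/page/1/'
```

This is also why the real API supports process.crawl(QuotesSpider, search_url='http://quotes.toscrape.com/page/1/'): extra keyword arguments to crawl() are forwarded to the spider's constructor, and scrapy.Spider's default __init__ stores them as instance attributes.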