AttributeError: 'Spider' object has no attribute 'crawler' in scrapy spider

In order to access settings from __init__ I had to add a from_crawler @classmethod. Now it appears that some functionality of the Scrapy framework was lost: I get AttributeError: 'Code1Spider' object has no attribute 'crawler' when a URL fails and the spider tries to retry the request. Scrapy version is 2.0.1. The spider is running on Zyte cloud.

What did I do wrong and how do I fix it?

Here is my spider code:

import logging
from datetime import datetime

import scrapy

# load_gsheet and load_from_collection are the spider author's own helper functions (not shown)

class Code1Spider(scrapy.Spider):
    name = 'cointelegraph_pr'
    allowed_domains = ['cointelegraph.com']
    start_urls = ['https://cointelegraph.com/press-releases']
    
    
    def __init__(self, settings):
        #Returns settings values as dict
        settings=settings.copy_to_dict()
        
        self.id = int(str(datetime.now().timestamp()).split('.')[0])          
        self.gs_id = settings.get('GS_ID')
        self.endpoint_url = settings.get('ENDPOINT_URL')
        self.zyte_api_key = settings.get('ZYTE_API_KEY')        
        self.zyte_project_id = settings.get('ZYTE_PROJECT_ID')        
        self.zyte_collection_name = self.name
        
        #Loads a list of stop words from predefined google sheet
        self.denied = load_gsheet(self.gs_id)
        #Loads all scraped urls from previous runs from zyte collections
        self.scraped_urls = load_from_collection(self.zyte_project_id, self.zyte_collection_name, self.zyte_api_key)
        logging.info("###############################")
        logging.info("Number of previously scraped URLs = {}.".format(len(self.scraped_urls)))
        logging.info("")
        
        
    # We need this to pass settings into init. Otherwise settings will be accessible only after init function.
    # As per https://docs.scrapy.org/en/1.8/topics/settings.html#how-to-access-settings
    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        return cls(settings)

Here is the error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python3.8/site-packages/scrapy/core/downloader/middleware.py", line 42, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
  File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1362, in returnValue
    raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <504 https://cointelegraph.com/press-releases/the-launch-of-santa-browser-to-bring-in-the-next-200m-users-onto-web30>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python3.8/site-packages/scrapy/core/downloader/middleware.py", line 51, in process_response
    response = yield deferred_from_coro(method(request=request, response=response, spider=spider))
  File "/usr/local/lib/python3.8/site-packages/scrapy/downloadermiddlewares/retry.py", line 53, in process_response
    return self._retry(request, reason, spider) or response
  File "/usr/local/lib/python3.8/site-packages/scrapy/downloadermiddlewares/retry.py", line 69, in _retry
    stats = spider.crawler.stats
AttributeError: 'Code1Spider' object has no attribute 'crawler'

Everything else is the default Scrapy spider; no modifications to settings or middleware. What did I do wrong and how do I fix it?

That is because you are overriding the from_crawler method without assigning the crawler to the spider.

Change your from_crawler method to the following:

@classmethod
def from_crawler(cls, crawler):
    settings = crawler.settings
    spider = cls(settings)
    spider._set_crawler(crawler)
    return spider
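
An equivalent sketch, assuming your __init__ keeps its single settings argument, is to delegate to the base Spider.from_crawler, which builds the spider and attaches the crawler for you:

@classmethod
def from_crawler(cls, crawler, *args, **kwargs):
    # The base implementation calls cls() with the extra arguments and then
    # runs spider._set_crawler(crawler), so self.crawler and self.settings
    # are set on the spider it returns.
    return super().from_crawler(crawler, crawler.settings, *args, **kwargs)

Either way, the retry middleware can then reach spider.crawler.stats when a request such as the 504 above is retried.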
