
How to stop a spider from a pipeline?

@Sjaak Trekhaak has a 'hack' here: How do I stop all spiders and the engine immediately after a condition in a pipeline is met? It can potentially stop the spiders by setting a flag in the pipeline and then raising CloseSpider in the parse method. However, I have the following code in my pipeline (where pdate and lastseen are well-defined datetime objects):

class StopSpiderPipeline(object):
    def process_item(self, item, spider):
        # pdate and lastseen are datetime objects defined elsewhere
        if pdate < lastseen:
            spider.close_down = True
        return item  # pipelines must return the item (or raise DropItem)

and in the spider:

from scrapy.exceptions import CloseSpider  # needed for the raise below

def parse_item(self, response):
    if self.close_down:
        raise CloseSpider(reason='Already scraped')

I get the error exceptions.AttributeError: 'SyncSpider' object has no attribute 'close_down'. Where did I go wrong? The question was actually asked by @anicake but was never answered. Thanks.

Is your spider's close_down attribute ever created? Because it looks like it isn't.

Try changing your check to if "close_down" in self.__dict__: or adding self.close_down = False in your spider's __init__() method.
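A minimal sketch of that second suggestion, assuming the spider is a plain scrapy.Spider named SyncSpider (the class name taken from the error message; a CrawlSpider works the same way). Initializing the flag in __init__ means the check in parse_item never raises AttributeError:

from scrapy.exceptions import CloseSpider
from scrapy.spiders import Spider

class SyncSpider(Spider):
    name = 'sync'

    def __init__(self, *args, **kwargs):
        super(SyncSpider, self).__init__(*args, **kwargs)
        # create the flag up front; the pipeline flips it to True later
        self.close_down = False

    def parse_item(self, response):
        if self.close_down:
            raise CloseSpider(reason='Already scraped')
        # ... normal parsing continues here ...

Note that CloseSpider only stops new requests from being scheduled; responses already in flight are still processed before the spider actually closes.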
