
How to stop a spider from a pipeline?

@Sjaak Trekhaak has a 'hack' here: How do I stop all spiders and the engine immediately after a condition in a pipeline is met? It can potentially stop the spiders by setting a flag in the pipeline and then raising CloseSpider in the parse method. However, I have the following code in my pipeline (where pdate and lastseen are well-defined datetime objects):

class StopSpiderPipeline(object):
    def process_item(self, item, spider):
        # pdate and lastseen are datetime objects defined elsewhere
        if pdate < lastseen:
            spider.close_down = True
        return item  # pipelines must return the item (or raise DropItem)

and in the spider:

from scrapy.exceptions import CloseSpider  # needed for the raise below

def parse_item(self, response):
    if self.close_down:
        raise CloseSpider(reason='Already scraped')

I get the error exceptions.AttributeError: 'SyncSpider' object has no attribute 'close_down'. Where did I go wrong? The question was actually asked by @anicake but was never answered. Thanks.

Is your spider's close_down attribute ever created? Because it looks like it isn't.

Try changing your check to if "close_down" in self.__dict__: or adding self.close_down = False in your spider's __init__() method.
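A minimal sketch of that second suggestion, assuming the spider is a plain scrapy.Spider named SyncSpider (the class name taken from the error message; a CrawlSpider works the same way). Initializing the flag in __init__ means the check in parse_item never raises AttributeError:

from scrapy.exceptions import CloseSpider
from scrapy.spiders import Spider

class SyncSpider(Spider):
    name = 'sync'

    def __init__(self, *args, **kwargs):
        super(SyncSpider, self).__init__(*args, **kwargs)
        # create the flag up front; the pipeline flips it to True later
        self.close_down = False

    def parse_item(self, response):
        if self.close_down:
            raise CloseSpider(reason='Already scraped')
        # ... normal parsing continues here ...

Note that CloseSpider only stops new requests from being scheduled; responses already in flight are still processed before the spider actually closes.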
