How to stop a spider from a pipeline?
@Sjaak Trekhaak has a 'hack' in "How do I stop all spiders and the engine immediately after a condition in a pipeline is met?" that can potentially stop the spiders by setting a flag in the pipeline and then raising CloseSpider in the parser method.
However, I have the following code in my pipeline (where pdate and lastseen are well-defined datetime objects):
    class StopSpiderPipeline(object):
        def process_item(self, item, spider):
            if pdate < lastseen:
                spider.close_down = True
and in the spider:
    def parse_item(self, response):
        if self.close_down:
            raise CloseSpider(reason='Already scraped')
I get this error:
    exceptions.AttributeError: 'SyncSpider' object has no attribute 'close_down'
Where did I go wrong? This question was actually asked by @anicake in that thread but did not receive a response.
Thanks,
Has your spider's close_down attribute been created? It looks like it hasn't: the pipeline only sets it once pdate < lastseen is true, so parse_item can read it before it exists.
Try changing your check to if "close_down" in self.__dict__: (the pipeline only ever sets the flag to True, so its presence means the spider should stop), or add self.close_down = False in your spider's __init__() method.
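To illustrate the second suggestion, here is a minimal sketch rather than a drop-in implementation. It assumes pdate is stored on the item and that lastseen is handed to the pipeline's constructor; both are hypothetical, since the question does not show where they come from. SyncSpider is a plain stand-in class so the snippet runs on its own; a real spider would subclass scrapy.Spider and call the parent __init__, and a real pipeline would more likely build itself in from_crawler.

```python
from datetime import datetime

try:
    from scrapy.exceptions import CloseSpider
except ImportError:
    # Stand-in with the same shape, only so the sketch runs without Scrapy.
    class CloseSpider(Exception):
        def __init__(self, reason="cancelled"):
            super().__init__()
            self.reason = reason


class SyncSpider:
    """Stand-in for the question's spider (would subclass scrapy.Spider)."""

    name = "sync"

    def __init__(self):
        # Create the flag up front so parse_item can always read it.
        self.close_down = False

    def parse_item(self, response):
        if self.close_down:
            raise CloseSpider(reason="Already scraped")
        return {"source": response}  # placeholder item for illustration


class StopSpiderPipeline:
    def __init__(self, lastseen):
        # Hypothetical: the question does not show how lastseen is obtained.
        self.lastseen = lastseen

    def process_item(self, item, spider):
        # Assumption: the publication date travels on the item as "pdate".
        if item["pdate"] < self.lastseen:
            spider.close_down = True
        return item  # pass the item on to any later pipeline


if __name__ == "__main__":
    spider = SyncSpider()
    pipeline = StopSpiderPipeline(lastseen=datetime(2020, 1, 1))
    pipeline.process_item({"pdate": datetime(2019, 6, 1)}, spider)
    try:
        spider.parse_item("next response")
    except CloseSpider as exc:
        print("spider asked to close:", exc.reason)
```

If you would rather not touch __init__, reading the flag with getattr(self, "close_down", False) in parse_item gives the same protection against the AttributeError.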