
Scrapy infinite loop with CrawlerProcess

I am currently running Scrapy v2.5 and I want to run an infinite loop. My code:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

class main():

    def bucle(self, array_spider, process):
        mongo = mongodb(setting)  # user's own MongoDB wrapper, defined elsewhere
        for spider_name in array_spider:
            process.crawl(spider_name, params={"mongo": mongo, "spider_name": spider_name})
        process.start()
        process.stop()
        mongo.close_mongo()

if __name__ == "__main__":
    setting = get_project_settings()
    while True:
        process = CrawlerProcess(setting)
        array_spider = process.spider_loader.list()
        class_main = main()
        class_main.bucle(array_spider, process)

But this results in the following error message:

Traceback (most recent call last):
  File "run_scrapy.py", line 92, in <module>
    process.start()
  File "/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py", line 327, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 1422, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 1404, in startRunning
    ReactorBase.startRunning(cast(ReactorBase, self))
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 843, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable

Can anyone help me?

AFAIK there is no easy way to restart a spider: process.start() runs the Twisted reactor, which can only be started once per process, so creating a new CrawlerProcess on each iteration of the while loop fails. But there is an alternative - a spider that never closes. For that you can take advantage of the spider_idle signal.

According to the documentation:

Sent when a spider has gone idle, which means the spider has no further:  
* requests waiting to be downloaded
* requests scheduled
* items being processed in the item pipeline

You can also find an example of using Signals in the official documentation.
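
Below is a minimal sketch of that approach, assuming Scrapy 2.5 as in the question; the spider name "forever", the start URL, and the parse logic are placeholders. The spider connects a handler to the spider_idle signal in from_crawler, re-schedules its start URLs whenever it goes idle, and raises DontCloseSpider so the crawl never finishes:

import scrapy
from scrapy import signals
from scrapy.exceptions import DontCloseSpider

class ForeverSpider(scrapy.Spider):
    name = "forever"  # placeholder name
    start_urls = ["https://example.com"]  # placeholder URL

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # Connect a handler to the spider_idle signal.
        crawler.signals.connect(spider.handle_idle, signal=signals.spider_idle)
        return spider

    def parse(self, response):
        self.logger.info("Crawled %s", response.url)

    def handle_idle(self, spider):
        # Re-schedule the start URLs; dont_filter=True bypasses the
        # duplicate filter so they are crawled again.
        # Note: engine.crawl(request, spider) is the Scrapy 2.5 signature;
        # newer versions take only the request.
        for url in self.start_urls:
            self.crawler.engine.crawl(scrapy.Request(url, dont_filter=True), spider)
        # Keep the spider open instead of letting it close.
        raise DontCloseSpider

With a spider built this way, a single process.start() call keeps running indefinitely, so the reactor is started exactly once and ReactorNotRestartable never occurs.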
