How to restart a Scrapy spider
What I need:
I tried this:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from time import sleep
while True:
    process = CrawlerProcess(get_project_settings())
    process.crawl('spider_name')
    process.start()
    sleep(60)
But I get the error:
twisted.internet.error.ReactorNotRestartable
Please help me get this right.
Python 3.6
Scrapy 1.3.2
Linux
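The error is raised because Twisted's reactor is a one-shot event loop: `process.start()` runs the reactor, and once the reactor has stopped it cannot be started again in the same process, so the second pass through the `while` loop fails. Python's own asyncio event loop behaves the same way after it is closed; this stdlib analogy (it is asyncio, not Twisted itself) illustrates the one-shot behavior:

```python
import asyncio

# A closed event loop, like a stopped Twisted reactor, cannot be reused.
loop = asyncio.new_event_loop()
loop.run_until_complete(asyncio.sleep(0))  # first run works

loop.close()
try:
    loop.run_until_complete(asyncio.sleep(0))  # second run fails
except RuntimeError as exc:
    print(exc)  # the loop refuses to run again
```

The fixes below follow the same logic: either keep one reactor running and schedule crawls inside it, or give every crawl a fresh process (and thus a fresh reactor).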
I think I found a solution:
from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor
from twisted.internet import task
timeout = 60
def run_spider():
    # Pause the loop while the crawl is running, then restart it with
    # now=False once the crawl's Deferred fires, so the next run starts
    # `timeout` seconds after the previous one finishes. The reactor
    # itself keeps running the whole time, so it is never restarted.
    l.stop()
    runner = CrawlerRunner(get_project_settings())
    d = runner.crawl('spider_name')
    d.addBoth(lambda _: l.start(timeout, False))

l = task.LoopingCall(run_spider)
l.start(timeout)  # calls run_spider immediately, then schedules repeats
reactor.run()
To avoid the ReactorNotRestartable error, you can try using subprocesses: create a main.py file that invokes the crawler from the shell repeatedly. Because each `scrapy crawl` command runs in its own process with its own reactor, the reactor is never restarted.
This main.py file could look like this:
from time import sleep
import subprocess
timeout = 60
while True:
    command = 'scrapy crawl yourSpiderName'
    subprocess.run(command, shell=True)
    sleep(timeout)
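The loop above can also be wrapped in a function that records whether each run succeeded. This is a sketch under stated assumptions: the `crawl_loop` name, the bounded `max_runs` parameter, and the commented-out `scrapy crawl yourSpiderName` command are illustrative, not part of the original answer.

```python
import subprocess
from time import sleep

def crawl_loop(command, interval, max_runs):
    """Run `command` in a fresh subprocess every `interval` seconds.

    Each subprocess gets a brand-new Twisted reactor, so restarting the
    spider is just restarting the process; ReactorNotRestartable cannot
    occur. Returns the exit codes so the caller can spot failed runs.
    A real deployment would loop forever instead of `max_runs` times.
    """
    codes = []
    for _ in range(max_runs):
        result = subprocess.run(command, shell=True)
        codes.append(result.returncode)
        sleep(interval)
    return codes

# Hypothetical usage with a real Scrapy project:
# crawl_loop('scrapy crawl yourSpiderName', interval=60, max_runs=10)
```

A nonzero exit code from `scrapy crawl` signals a crashed run, so checking the returned codes lets the wrapper log or alert on failures instead of silently looping.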