
Schedule scrapy spiders to run every N minutes

I need help scheduling my spiders to run every N minutes. Earlier, most people seem to have used reactor.callLater and reactor.run for this purpose, but it seems these can no longer be used this way. How can I schedule the runs programmatically?

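from scrapy.crawler import CrawlerRunner
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor

# SpiderA and SpiderB are the project's own spider classes; import them here.

def run_crawl():
    """
    Run the spiders within Twisted. Once they complete,
    wait 5 seconds and run them again.
    """
    runner = CrawlerRunner(get_project_settings())
    runner.crawl(SpiderA)
    runner.crawl(SpiderB)
    deferred = runner.join()
    # addCallback passes the Deferred's result as the first argument,
    # so wrap callLater in a lambda that discards it.
    deferred.addCallback(lambda _: reactor.callLater(5, run_crawl))
    return deferred

run_crawl()
reactor.run()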

At the moment my crawler is scheduled by the Windows Task Scheduler, but I want to schedule it programmatically.

You could try an external module named schedule, which is available on GitHub.

Tell me if it doesn't fit your needs.
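A minimal sketch of how that could look. It assumes the schedule package is installed and that the script is run from the Scrapy project directory. The spider names spider_a and spider_b, the 5-minute interval, and the helper functions are all hypothetical placeholders. Starting each spider through the scrapy crawl command in a separate process is my own assumption here; it sidesteps the fact that a Twisted reactor cannot be restarted within one process.

```python
import subprocess
import time

INTERVAL_MINUTES = 5  # assumed value for N
SPIDER_NAMES = ("spider_a", "spider_b")  # hypothetical spider names


def build_command(spider_name):
    """Return the Scrapy CLI command that runs one spider."""
    return ["scrapy", "crawl", spider_name]


def run_spiders(spider_names=SPIDER_NAMES, runner=subprocess.run):
    """Run each spider in its own process, one after another."""
    return [runner(build_command(name)) for name in spider_names]


def main():
    # Imported here so the helpers above work without the package installed.
    import schedule  # third-party: pip install schedule

    schedule.every(INTERVAL_MINUTES).minutes.do(run_spiders)
    while True:
        schedule.run_pending()  # runs the job whenever it is due
        time.sleep(1)
```

Calling main() starts the loop, which blocks until the process is stopped. Because subprocess.run waits for each crawl to finish, a crawl that takes longer than the interval delays the next run rather than overlapping with it.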
