
How to exit from a Scrapy Python script after spiders stopped crawling or encountered an exception?

I am trying to run my Scrapy python script from a bat file via the Windows Task Scheduler every minute.

However, the python script somehow does not exit, and it blocks all future task startups from the Task Scheduler.

So, my questions are:

  1. How can I exit my Scrapy script elegantly after the spiders have completed running?

  2. How can I exit the Scrapy script when encountering an exception, especially a ReactorNotRunning error?

Thank you all in advance.

This is my bat file to run the python script:

@echo off
python "C:\Scripts\start.py"
pause

This is my python script:

from cineplex.spiders import showtimes_spider as st
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
import sys
import time
from twisted.internet import reactor, defer


def crawl_all_showtimes():
    # Create a CrawlerRunner instance to manage multiple spiders
    runner = CrawlerRunner()

    # Check folder for today
    directory_for_today = utils.create_dir_for_today(PARENT_DIR)

    # Get all cinema id and names first
    cinema_dict = utils.get_all_cinemas()

    # Prepare for crawling
    crawl_showtimes_helper(directory_for_today, cinema_dict, runner)

    # Start crawling for showtimes; reactor.run() blocks until reactor.stop() is called
    reactor.run()


# Helps to run multiple ShowTimesSpiders sequentially
@defer.inlineCallbacks
def crawl_showtimes_helper(output_dir, cinema_dict, runner):
    # Iterate through all cinema to get show timings
    for cinema_id, cinema_name in cinema_dict.items():
        yield runner.crawl(st.ShowTimesSpider, cinema_id=cinema_id,
                           cinema_name=cinema_name, output_dir=output_dir)
    reactor.stop()

if __name__ == "__main__":

    # Turns on Scrapy Logging
    configure_logging()

    # Collect all showtimes
    crawl_all_showtimes()

The main thread of the program blocks waiting on some Scrapy threads. So at the end of your main program, use this:

import sys
sys.exit()
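
Putting the two pieces together, here is a minimal sketch of how the script could be restructured so that it always reaches sys.exit(); it assumes the utils helpers, PARENT_DIR, and the showtimes_spider module from the question. twisted.internet.error.ReactorNotRunning is the exception Twisted raises when reactor.stop() is called on a reactor that is not running, so guarding the stop call addresses the second question:

import sys

from twisted.internet import defer, reactor
from twisted.internet.error import ReactorNotRunning

from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

from cineplex.spiders import showtimes_spider as st  # module from the question


@defer.inlineCallbacks
def crawl_showtimes_helper(output_dir, cinema_dict, runner):
    # Each yield waits for the previous crawl to finish, so the
    # spiders run one after another rather than in parallel
    for cinema_id, cinema_name in cinema_dict.items():
        yield runner.crawl(st.ShowTimesSpider, cinema_id=cinema_id,
                           cinema_name=cinema_name, output_dir=output_dir)
    try:
        reactor.stop()
    except ReactorNotRunning:
        # The reactor was already stopped elsewhere; swallow the error
        # so the script can still fall through to sys.exit()
        pass


if __name__ == "__main__":
    configure_logging()
    runner = CrawlerRunner()

    # utils and PARENT_DIR come from the question's project code
    directory_for_today = utils.create_dir_for_today(PARENT_DIR)
    cinema_dict = utils.get_all_cinemas()

    crawl_showtimes_helper(directory_for_today, cinema_dict, runner)
    reactor.run()  # returns once reactor.stop() has been called
    sys.exit()     # explicit exit, as suggested above

With this shape, reactor.run() returns as soon as the last spider finishes, and the trailing sys.exit() suggested in the answer lets the process terminate so the Task Scheduler can start the next run.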
