
Scrapy: not able to schedule

I want to run a spider every couple of minutes. I put the following script in my project, which I want to call for this purpose.

import time

import schedule
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def job():
    process = CrawlerProcess(get_project_settings())
    process.crawl('amazon_spider')
    process.start()  # error: twisted.internet.error.ReactorNotRestartable
    # process.start(stop_after_crawl=False)  # process gets stuck

schedule.every(2).minutes.do(job)  # register the job once, outside the loop

while True:
    schedule.run_pending()
    time.sleep(1)

With this approach the process gets the following error:

twisted.internet.error.ReactorNotRestartable, or it gets stuck if I use process.start(stop_after_crawl=False).

From a previous Stack Overflow post I also tried this:

from twisted.internet import reactor
from amazon.spiders.amazon_spider import AmazonSpider
from scrapy.crawler import CrawlerRunner

def run_crawl():
    runner = CrawlerRunner({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    })
    deferred = runner.crawl(AmazonSpider)
    # reschedule the crawl 10 seconds after the previous one finishes
    deferred.addCallback(lambda _: reactor.callLater(10, run_crawl))
    return deferred

run_crawl()
reactor.run()

The process gets stuck again in the middle of the parse function. I really don't know what to try next. If you have an idea, please let me know. Thank you in advance. (By the way, this is not a duplicate, since the posts on the same subject didn't solve my problem.)

I use apscheduler

pip install apscheduler

then

# -*- coding: utf-8 -*-
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from apscheduler.schedulers.twisted import TwistedScheduler

from Demo.spiders.baidu import YourSpider

process = CrawlerProcess(get_project_settings())
scheduler = TwistedScheduler()
scheduler.add_job(process.crawl, 'interval', args=[YourSpider], seconds=10)
scheduler.start()
process.start(False)  # stop_after_crawl=False: keep the reactor running between scheduled crawls
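If the single-reactor approach above still causes trouble, another common workaround (a sketch, not part of the original answer) is to launch each crawl in a fresh child process, so every run builds its own reactor and ReactorNotRestartable never triggers. 'amazon_spider' is the spider name from the question; run_every and its parameters are illustrative names:

```python
import time
from multiprocessing import Process

def crawl_once():
    # Scrapy imports stay inside the child process, so each run builds
    # its own CrawlerProcess (and therefore its own fresh reactor).
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    process.crawl('amazon_spider')
    process.start()

def run_every(interval_seconds, job=crawl_once, max_runs=None):
    """Run `job` in a fresh process, pausing interval_seconds between runs.

    max_runs exists only to make the loop testable; pass None to run forever.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        worker = Process(target=job)
        worker.start()
        worker.join()  # wait for the crawl to finish before scheduling the next
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(interval_seconds)
```

Calling run_every(120) would then start a crawl every two minutes, as in the question; the trade-off is the overhead of spawning a new Python process per run.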
