
Running more than one spider, one by one

I am using the Scrapy framework to crawl some web pages. Basically, what I want is to scrape the pages and save them to a database. I have one spider per web page. The trouble is running those spiders so that one spider starts crawling exactly after the previous one finishes. How can that be achieved? Is scrapyd the solution?

scrapyd is indeed a good way to do this. You can use the max_proc and max_proc_per_cpu settings to limit the number of spiders running in parallel, and then schedule your spiders through the scrapyd REST API, for example:

$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider
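If you want the spiders to run strictly one after another, a minimal sketch (assuming the default scrapyd.conf location and section name) is to cap the number of concurrent Scrapy processes at one, so every job you schedule through the API is queued and only started after the previous one finishes:

    # scrapyd.conf (e.g. /etc/scrapyd/scrapyd.conf or next to your project)
    [scrapyd]
    # With a non-zero max_proc, scrapyd starts at most this many Scrapy
    # processes at the same time; additional scheduled jobs wait in the queue.
    max_proc = 1
    # Only used when max_proc is 0 (the limit then becomes max_proc_per_cpu * CPUs).
    max_proc_per_cpu = 1

You can then schedule each spider with a separate curl call like the one above; scrapyd will run them sequentially in the order they were scheduled.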
