
Running more than one spider, one by one

I am using the Scrapy framework to crawl some web pages. Basically, what I want is to scrape the pages and save them to a database. I have one spider per web page. The trouble is running those spiders so that one spider starts crawling exactly after the previous one finishes. How can that be achieved? Is scrapyd the solution?

scrapyd is indeed a good way to do this. You can use the max_proc and max_proc_per_cpu settings to limit the number of spiders running in parallel, and then schedule your spiders through the scrapyd REST API, for example:

$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider
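If you want the spiders to run strictly one after another, a minimal sketch (assuming the default scrapyd.conf location and section name) is to cap the number of concurrent Scrapy processes at one, so every job you schedule through the API is queued and only started after the previous one finishes:

    # scrapyd.conf (e.g. /etc/scrapyd/scrapyd.conf or next to your project)
    [scrapyd]
    # With a non-zero max_proc, scrapyd starts at most this many Scrapy
    # processes at the same time; additional scheduled jobs wait in the queue.
    max_proc = 1
    # Only used when max_proc is 0 (the limit then becomes max_proc_per_cpu * CPUs).
    max_proc_per_cpu = 1

You can then schedule each spider with a separate curl call like the one above; scrapyd will run them sequentially in the order they were scheduled.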
