
Managing multiple spiders with scrapy

I am creating an aggregator and I started with Scrapy as my initial tool set. At first I only had a few spiders, but as the project grows it seems like I may have hundreds or even a thousand different spiders as I scrape more and more sites. What is the best way to manage these spiders, given that some websites only need to be crawled once while others need to be crawled on a more regular basis? Is Scrapy still a good tool when dealing with so many sites, or would you recommend some other technology?
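To illustrate the scale problem, here is a minimal sketch of driving several spiders from a single script with Scrapy's CrawlerProcess (SiteASpider and SiteBSpider are hypothetical placeholders for my real spiders):

    import scrapy
    from scrapy.crawler import CrawlerProcess

    class SiteASpider(scrapy.Spider):
        name = "site_a"
        start_urls = ["https://example.com/a"]

        def parse(self, response):
            # Extract the page title as a stand-in for real item logic.
            yield {"title": response.css("title::text").get()}

    class SiteBSpider(SiteASpider):
        name = "site_b"
        start_urls = ["https://example.com/b"]

    process = CrawlerProcess(settings={"LOG_LEVEL": "INFO"})
    process.crawl(SiteASpider)
    process.crawl(SiteBSpider)
    process.start()  # blocks until all queued crawls finish

Whether to keep extending a script like this, or move the per-site once-vs-recurring schedules out to some external scheduler, is part of what I am asking.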

You can check out the project scrapely, which is from the creators of scrapy. But, as far as I know, it is not suitable for parsing sites containing JavaScript (more precisely, sites where the data to be parsed is generated by JavaScript).
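For reference, a minimal sketch of scrapely's train/scrape workflow, based on its documented API (the URLs and field values are illustrative placeholders): you train it on one example page with the data it should extract, and it can then scrape similarly structured pages without explicit parsing code.

    from scrapely import Scraper

    s = Scraper()
    # Train on one example page together with the data it should yield.
    s.train("http://example.com/products/1",
            {"name": "Example product", "price": "9.99"})
    # Scrape another page with the same structure; no parsing code needed.
    print(s.scrape("http://example.com/products/2"))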
