
How can I write a never-ending job in Rails (web scraping)?

Goal: I want to build a web scraper in a Rails app that runs indefinitely and can be scaled.

Current stack the app is running on: Ruby on Rails / Heroku / Redis / Postgres

Idea: I was thinking of running a Sidekiq job every n minutes that checks whether any proxies are available to scrape with (these will be stored in a table with a status of sleeping/scraping).

Assuming a proxy is available, it will then check (using the Sidekiq API) whether there are any free workers, and if so start another job to scrape with the available proxy.

This means I could scale the scraper by increasing the number of workers and the number of available proxies. If a job fails for any reason, the job that looks for available proxies will simply start it again.
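The sleeping/scraping status column described above is essentially a claim/release state machine. Here is a minimal in-memory sketch of that logic (in the real app this would be an ActiveRecord model backed by the proxies table; `ProxyPool` and its method names are assumptions for illustration, not anything from the question's codebase):

```ruby
# In-memory sketch of the proxies table: a proxy is :sleeping when idle
# and :scraping while a job holds it. A Mutex stands in for the row-level
# locking (e.g. SELECT ... FOR UPDATE) you would use in Postgres so two
# dispatcher runs cannot claim the same proxy.
class ProxyPool
  def initialize(addresses)
    @mutex = Mutex.new
    @proxies = addresses.to_h { |addr| [addr, :sleeping] }
  end

  # Atomically claim a sleeping proxy; returns nil if none are free.
  def claim
    @mutex.synchronize do
      addr, _status = @proxies.find { |_, status| status == :sleeping }
      @proxies[addr] = :scraping if addr
      addr
    end
  end

  # Return a proxy to the pool when its job finishes (or fails).
  def release(addr)
    @mutex.synchronize { @proxies[addr] = :sleeping }
  end

  def available_count
    @mutex.synchronize { @proxies.count { |_, status| status == :sleeping } }
  end
end
```

Releasing the proxy in the job's failure path (e.g. a Sidekiq retry/death handler) is what lets the dispatcher restart failed work, as described above.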

Questions: Is this the best solution for my goal? Is using long-running Sidekiq jobs a good idea, or could this blow up?

Sidekiq is designed to run individual jobs which are "units of work" to your organization.

You can build your own loop and, inside that loop, create jobs for each page to scrape, but the loop itself should not be a job.
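The "loop outside, jobs inside" shape can be sketched as follows. `FakeQueue` stands in for Sidekiq's queue so the sketch runs standalone; in the real app the worker class would `include Sidekiq::Job` and you would call `ScrapePageJob.perform_async(url, proxy)` instead. All names here are assumptions for illustration:

```ruby
# Stand-in for Sidekiq's Redis-backed queue.
class FakeQueue
  def initialize = @jobs = []
  def push(args)  = @jobs << args
  def size        = @jobs.size
end

QUEUE = FakeQueue.new

# Each page is one small, independently retryable unit of work --
# this is what Sidekiq means by a "job".
def enqueue_scrape(url, proxy)
  QUEUE.push({ url: url, proxy: proxy })
end

# The dispatcher (the "loop") runs briefly, fans the pages out across
# the available proxies, and exits. It is not itself a long-lived job.
def dispatch(urls, proxies)
  urls.zip(proxies.cycle).each { |url, proxy| enqueue_scrape(url, proxy) }
end
```

Because each page is its own job, a single failure only retries that one page, and scaling is just a matter of adding Sidekiq worker processes.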

If you want a job to run every n minutes, you could schedule it.

And since you're using Heroku, there is an add-on for that: Heroku Scheduler (https://devcenter.heroku.com/articles/scheduler).
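Heroku Scheduler runs a command on an interval, so the usual pattern is to point it at a small rake task that kicks off the dispatcher. A sketch, where the task and job class names are assumptions:

```ruby
# lib/tasks/scraper.rake
# Configure Heroku Scheduler to run: rake scraper:dispatch
namespace :scraper do
  desc "Enqueue scrape jobs for every available proxy"
  task dispatch: :environment do
    ScrapeDispatcherJob.perform_async
  end
end
```

Note that Heroku Scheduler's shortest interval is every 10 minutes; for a finer-grained n you would need one of the cron-style options below.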

Another solution would be to set up cron jobs and schedule them with the whenever gem.
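With whenever, the schedule lives in `config/schedule.rb` and the gem writes the corresponding crontab entries for you. A sketch using whenever's DSL (the rake task name is an assumption):

```ruby
# config/schedule.rb
every 5.minutes do
  rake "scraper:dispatch"
end
```

Running `whenever --update-crontab` then installs the cron entry on the server. Note that plain cron does not run on Heroku dynos, so this option fits a VPS or similar host rather than the Heroku setup described in the question.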
