
Scale a web scraping site with Node.js

I'm developing a web scraping website that finds available delivery restaurants. The website searches the most popular delivery portals and shows the results aggregated on a single page.

The site is hosted on Heroku with 4 dynos.

http://deliveria.net/#05409-002

When a user makes a request on the website, it performs around 30 HTTP requests to retrieve the results.

The problem is performance: the requests aren't fast, and each search issues around 30 of them, locking the app while a single user's search is being performed.
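For reference, each search currently fans out roughly like this (a simplified sketch; the portal URLs and the 10-second timeout are placeholders):

    const request = require('request');

    // ~30 delivery portal search URLs (placeholders)
    const portalUrls = ['https://portal-1.example/search' /* , ... */];

    function search(query, callback) {
      const results = [];
      let pending = portalUrls.length;

      portalUrls.forEach((url) => {
        request({ url: url, qs: { q: query }, timeout: 10000 }, (err, res, body) => {
          if (!err && res.statusCode === 200) results.push(body);
          if (--pending === 0) callback(results); // fire once every portal has answered
        });
      });
    }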

I tried increasing the number of Heroku dynos:

 heroku scale web=10

That didn't produce any perceptible gain.

What is the best approach to scale this kind of application?

(I can't use caching, as the searches need to happen in real time.)

Current stack:

  • Heroku
  • Node.js
  • express
  • request module
  • EJS
  • Pusher
  • Redis

The important thing here is to use worker processes, because you must avoid blocking the event loop in your main app.

Try to distribute the 30 HTTP requests among the available workers. Kue can help with this: you push new jobs onto a queue, and the workers pick them up and execute them one by one. So, for example, if you have 10 dynos on Heroku, use one as the web dyno and nine as workers that perform those 30 HTTP searches (see the sketch below).
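A minimal sketch of that split, assuming a Heroku Redis add-on sets REDIS_URL and your Procfile defines separate web and worker process types; the 'portal-search' job name and the 15-second TTL are illustrative:

    // web.js -- the web dyno enqueues one job per portal instead of fetching inline
    const express = require('express');
    const kue = require('kue');

    const app = express();
    const queue = kue.createQueue({ redis: process.env.REDIS_URL });
    const portalUrls = [/* ~30 portal search URLs */];

    app.get('/search', (req, res) => {
      portalUrls.forEach((url) => {
        queue.create('portal-search', { url: url, query: req.query.q, searchId: req.query.searchId })
          .ttl(15000) // give up on portals that take too long
          .save();
      });
      res.sendStatus(202); // results are pushed to the client later
    });

    // worker.js -- each worker dyno pulls jobs off the same Redis-backed queue
    const request = require('request');
    const workerQueue = kue.createQueue({ redis: process.env.REDIS_URL });

    // handle up to 5 portal searches concurrently per worker
    workerQueue.process('portal-search', 5, (job, done) => {
      request({ url: job.data.url, qs: { q: job.data.query } }, (err, response, body) => {
        if (err) return done(err);
        done(null, body); // or publish the partial result, as sketched below
      });
    });

With a Procfile along the lines of web: node web.js and worker: node worker.js, running heroku scale web=1 worker=9 gives you the one-web, nine-worker split described above.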

From the user's point of view, it's important that the application reacts quickly to the search (and doesn't appear to freeze), so you may want to push updates as soon as preliminary results are available (for example, when 10 of the 30 pages have been searched). You could do that via WebSockets (Socket.IO) and even show a nice graphical progress bar or something similar.
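One way to wire that up (a sketch, not the only option): have the workers publish each finished portal search on a Redis pub/sub channel, and have the web dyno forward those messages to the browser over Socket.IO. The 'search-progress' channel name is made up, and this uses the older callback-style node_redis API to match the rest of this stack:

    // worker side: announce every finished portal search
    const redis = require('redis');
    const pub = redis.createClient(process.env.REDIS_URL);

    function reportProgress(searchId, completed, total, partialResult) {
      pub.publish('search-progress', JSON.stringify({
        searchId: searchId, completed: completed, total: total, partialResult: partialResult
      }));
    }

    // web side: relay Redis messages to the right browser via Socket.IO
    const io = require('socket.io')(server); // `server` is your existing HTTP server
    const sub = redis.createClient(process.env.REDIS_URL);

    io.on('connection', (socket) => {
      // the page joins a room named after its search, so updates reach only that user
      socket.on('watch', (searchId) => socket.join(searchId));
    });

    sub.subscribe('search-progress');
    sub.on('message', (channel, message) => {
      const update = JSON.parse(message);
      io.to(update.searchId).emit('progress', update); // drive the progress bar client-side
    });

Since Pusher is already in your stack, you could alternatively trigger Pusher events from the workers instead of running your own Socket.IO layer.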
