
Scale a web scraping site with Node.js

I'm developing a web scraping website that finds available delivery restaurants. The website searches the most popular delivery portals and shows the results aggregated on a single page.

The site is hosted on Heroku with 4 dynos.

http://deliveria.net/#05409-002

When a user makes a request on the website, it performs around 30 HTTP requests to retrieve the results.

The problem is performance: the requests aren't fast, and each search issues around 30 of them, locking the app while a single user's search is being performed.
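For reference, each search currently fans out roughly like this (a simplified sketch; the portal URLs and the 10-second timeout are placeholders):

    const request = require('request');

    // ~30 delivery portal search URLs (placeholders)
    const portalUrls = ['https://portal-1.example/search' /* , ... */];

    function search(query, callback) {
      const results = [];
      let pending = portalUrls.length;

      portalUrls.forEach((url) => {
        request({ url: url, qs: { q: query }, timeout: 10000 }, (err, res, body) => {
          if (!err && res.statusCode === 200) results.push(body);
          if (--pending === 0) callback(results); // fire once every portal has answered
        });
      });
    }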

I tried increasing the number of Heroku dynos:

 heroku scale web=10

That didn't produce any perceptible gain.

What is the best approach to scale this kind of application?

(I can't use caching, as the searches need to happen in real time.)

Current stack:

  • Heroku
  • Node.js
  • express
  • request module
  • EJS
  • Pusher
  • Redis

The important thing here is to use worker processes, because you must avoid blocking the event loop in your main app.

Try to distribute the 30 HTTP requests among the available workers. Kue can help with this: you push new jobs onto a queue, and the workers pick them up and execute them one by one. So, for example, if you have 10 dynos on Heroku, use one as the web dyno and nine as workers that perform those 30 HTTP searches (see the sketch below).
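A minimal sketch of that split, assuming a Heroku Redis add-on sets REDIS_URL and your Procfile defines separate web and worker process types; the 'portal-search' job name and the 15-second TTL are illustrative:

    // web.js -- the web dyno enqueues one job per portal instead of fetching inline
    const express = require('express');
    const kue = require('kue');

    const app = express();
    const queue = kue.createQueue({ redis: process.env.REDIS_URL });
    const portalUrls = [/* ~30 portal search URLs */];

    app.get('/search', (req, res) => {
      portalUrls.forEach((url) => {
        queue.create('portal-search', { url: url, query: req.query.q, searchId: req.query.searchId })
          .ttl(15000) // give up on portals that take too long
          .save();
      });
      res.sendStatus(202); // results are pushed to the client later
    });

    // worker.js -- each worker dyno pulls jobs off the same Redis-backed queue
    const request = require('request');
    const workerQueue = kue.createQueue({ redis: process.env.REDIS_URL });

    // handle up to 5 portal searches concurrently per worker
    workerQueue.process('portal-search', 5, (job, done) => {
      request({ url: job.data.url, qs: { q: job.data.query } }, (err, response, body) => {
        if (err) return done(err);
        done(null, body); // or publish the partial result, as sketched below
      });
    });

With a Procfile along the lines of web: node web.js and worker: node worker.js, running heroku scale web=1 worker=9 gives you the one-web, nine-worker split described above.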

From the user's point of view, it's important that the application reacts quickly to the search (and doesn't appear to freeze), so you may want to push updates as soon as preliminary results are available (for example, when 10 of the 30 pages have been searched). You could do that via WebSockets (Socket.IO) and even show a nice graphical progress bar or something similar.
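One way to wire that up (a sketch, not the only option): have the workers publish each finished portal search on a Redis pub/sub channel, and have the web dyno forward those messages to the browser over Socket.IO. The 'search-progress' channel name is made up, and this uses the older callback-style node_redis API to match the rest of this stack:

    // worker side: announce every finished portal search
    const redis = require('redis');
    const pub = redis.createClient(process.env.REDIS_URL);

    function reportProgress(searchId, completed, total, partialResult) {
      pub.publish('search-progress', JSON.stringify({
        searchId: searchId, completed: completed, total: total, partialResult: partialResult
      }));
    }

    // web side: relay Redis messages to the right browser via Socket.IO
    const io = require('socket.io')(server); // `server` is your existing HTTP server
    const sub = redis.createClient(process.env.REDIS_URL);

    io.on('connection', (socket) => {
      // the page joins a room named after its search, so updates reach only that user
      socket.on('watch', (searchId) => socket.join(searchId));
    });

    sub.subscribe('search-progress');
    sub.on('message', (channel, message) => {
      const update = JSON.parse(message);
      io.to(update.searchId).emit('progress', update); // drive the progress bar client-side
    });

Since Pusher is already in your stack, you could alternatively trigger Pusher events from the workers instead of running your own Socket.IO layer.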
