
Scaling up a Ruby, ActiveRecord, MySQL app

I have an app...

The app does a market comparison for a financial product: for a given quote request, it contacts several other sites for their quotes, then gives the user the results, i.e. several quotes for their details.

To manage these requests, they get saved to MySQL. My app then kicks in, picks up the pending quotes and farms them out to threads (all on the same Linux box), which process each site lookup.

I am using JRuby because I had thread/DB-related issues, with Java thread pools controlling the number of threads. On the current hardware/VPS it can handle around 200 threads. Much of the limitation seems to come from each thread grabbing its own MySQL connection to fetch the quote details and save back the results. We want to handle more concurrent threads, so we are looking for ways to scale up.
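If the per-thread MySQL connection is the bottleneck, one common mitigation is to let many worker threads share a small, fixed pool of connections and hold one only for the short read or write, not during the slow remote lookup. The sketch below assumes plain Ruby threads rather than the Java thread pools mentioned above; FakeConnection, ConnectionPool and the numbers (20 workers, 5 connections) are hypothetical and only illustrate the idea.

```ruby
# Sketch: 20 worker threads share 5 "connections" instead of one each.
# FakeConnection is a hypothetical stand-in for a real MySQL connection.

class FakeConnection
  attr_reader :id

  def initialize(id)
    @id = id
  end

  # Pretend to write a result row; returns what would have been saved.
  def save_result(quote_id, price)
    [@id, quote_id, price]
  end
end

class ConnectionPool
  def initialize(size)
    @available = Queue.new
    size.times { |i| @available << FakeConnection.new(i) }
  end

  # Blocks until a connection is free, yields it, and always hands it back.
  def with_connection
    conn = @available.pop
    begin
      yield conn
    ensure
      @available << conn
    end
  end
end

WORKERS = 20
pool = ConnectionPool.new(5)
lookups = Queue.new
saved = Queue.new

100.times { |i| lookups << i }
WORKERS.times { lookups << nil } # one stop marker per worker

threads = Array.new(WORKERS) do
  Thread.new do
    while (quote_id = lookups.pop)
      # The slow remote site lookup would happen here, WITHOUT a connection.
      price = 100 + quote_id # fake quote
      # A connection is borrowed only for the short write.
      pool.with_connection { |conn| saved << conn.save_result(quote_id, price) }
    end
  end
end
threads.each(&:join)
puts saved.size # => 100
```

ActiveRecord has the same idea built in: it keeps its own connection pool (sized by the pool setting in database.yml), and ActiveRecord::Base.connection_pool.with_connection { ... } borrows a connection for just the block. Checking connections out only around the short queries, rather than for a thread's whole lifetime, may let the same MySQL connection limit support more lookup threads.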

I'm wondering which way to go:

  1. Bigger hardware...
  2. More machines, with some kind of queueing mechanism (with priorities) to share the load across them. The threads wouldn't touch the DB; all the details/responses would go via the queue, so the DB hit is smaller. But then maybe I am just pushing the problem into the queue. I'm thinking of using something like MongoDB for the queue, but I'm open to suggestions, ideally something easy to use with Ruby :)
  3. Some kind of remote/RPC mechanism, e.g. DRb. Theoretically this seems like a good option, but I haven't done anything with it yet, so I don't know how much complexity it would add.
  4. Something else...?
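To make option 2 more concrete, here is a minimal in-process sketch, assuming a single Ruby process with plain threads: jobs carry a priority, the worker never touches the database, and results travel back through a second queue to one writer that would be the only code talking to MySQL. PriorityJobQueue and the job hashes are hypothetical; spreading this across several machines would need a networked queue service in place of these in-memory queues, which the sketch does not cover.

```ruby
# Sketch of option 2 inside one process. PriorityJobQueue and the job
# hashes are hypothetical; a multi-machine setup needs a networked queue.

class PriorityJobQueue
  def initialize
    @items = []
    @seq = 0 # tie-breaker: FIFO order within the same priority
    @mutex = Mutex.new
    @cond = ConditionVariable.new
  end

  # Lower number = higher priority.
  def push(priority, job)
    @mutex.synchronize do
      @items << [priority, @seq, job]
      @seq += 1
      @items.sort_by! { |pri, seq, _| [pri, seq] }
      @cond.signal
    end
  end

  # Blocks until a job is available, then returns the most urgent one.
  def pop
    @mutex.synchronize do
      @cond.wait(@mutex) while @items.empty?
      @items.shift.last
    end
  end
end

jobs = PriorityJobQueue.new
results = Queue.new # worker -> writer

jobs.push(5, { quote: 1, site: "siteA" })
jobs.push(1, { quote: 2, site: "siteB" }) # urgent, so served first
jobs.push(5, { quote: 3, site: "siteC" })

worker = Thread.new do
  3.times do
    job = jobs.pop
    # The remote site lookup would happen here; the worker never touches MySQL.
    results << job.merge(price: 100 + job[:quote]) # fake price
  end
  results << :done
end

# The single writer: the only code that would talk to the database.
writer_log = []
while (row = results.pop) != :done
  writer_log << row # a real writer could batch-insert these rows
end
worker.join
p writer_log.map { |r| r[:quote] } # => [2, 1, 3]
```

With one worker and all jobs queued up front, the order is deterministic. With many workers, jobs are still handed out most urgent first, but they may finish in any order.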

From this link, "Reasons for NOT scaling-up vs. -out?", it would seem this problem is suited to scaling out onto more machines.

So, any thoughts on which way to go?

Cheers, Chris

My usual approach to problems like this is to pay very close attention to the database queries you're making and tune them aggressively. Retrieve only what you need, skipping columns that aren't explicitly used, and be very careful about eager loading things you don't need in their entirety.

You'll often find you can get significant speed gains by adding indexes, or strategically de-normalizing certain attributes in your database to avoid ugly, time-consuming JOIN operations.

Further, think about caching: the fastest database call is the one that's never made. It's not hard to bring in something like Memcached to save the results of a moderately time-consuming record retrieval, and, if done carefully, it's even easy to invalidate and expire those entries, provided you channel your updates through a few methods.
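A minimal cache-aside sketch of that advice, assuming an in-memory Hash as a stand-in for Memcached; TTLCache, QuoteStore and the fake rows are all hypothetical. Reads go through fetch, which only hits the "database" on a miss or after the TTL has passed; every write goes through update_quote, the single place that deletes the stale entry.

```ruby
# Cache-aside sketch. A Hash stands in for Memcached; QuoteStore and its
# fake "database" are hypothetical.

class TTLCache
  def initialize(ttl_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @store = {}
  end

  # Return the cached value, or compute it with the block and store it.
  def fetch(key)
    entry = @store[key]
    return entry[:value] if entry && entry[:expires_at] > @clock.call

    value = yield
    @store[key] = { value: value, expires_at: @clock.call + @ttl }
    value
  end

  def delete(key)
    @store.delete(key)
  end
end

class QuoteStore
  attr_reader :db_reads

  def initialize(cache)
    @cache = cache
    @db = { 1 => { id: 1, premium: 250 } } # stand-in for MySQL rows
    @db_reads = 0
  end

  def find_quote(id)
    @cache.fetch("quote:#{id}") do
      @db_reads += 1 # only reached on a cache miss
      @db.fetch(id).dup
    end
  end

  # Every update funnels through here, so invalidation lives in one place.
  def update_quote(id, attrs)
    @db[id] = @db.fetch(id).merge(attrs)
    @cache.delete("quote:#{id}")
  end
end

store = QuoteStore.new(TTLCache.new(60))
store.find_quote(1)
store.find_quote(1) # served from the cache
store.update_quote(1, premium: 240)
p store.find_quote(1)[:premium] # => 240
p store.db_reads                # => 2
```

Funnelling writes through one method is what keeps invalidation easy: any code path that updated the row directly would leave a stale cached copy until the TTL ran out.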

For scheduling workers, a simple first-in, first-out queue can be implemented in Redis to off-load a lot of the processing overhead from MySQL itself. This is usually very simple to add if you follow an example.
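A sketch of the Redis FIFO pattern, assuming an in-memory stand-in so it runs without a server: FakeRedisList is hypothetical and only mimics the semantics of the Redis LPUSH and BRPOP commands (push onto the head, blocking pop from the tail with a timeout), which together make a list behave as a first-in, first-out queue.

```ruby
# In-memory stand-in for a Redis list used as a FIFO work queue.
# FakeRedisList is hypothetical; it mimics LPUSH / BRPOP semantics only.

class FakeRedisList
  def initialize
    @list = []
    @mutex = Mutex.new
    @cond = ConditionVariable.new
  end

  # Like LPUSH: add to the head of the list.
  def lpush(value)
    @mutex.synchronize do
      @list.unshift(value)
      @cond.signal
    end
  end

  # Like BRPOP: blocking pop from the tail; nil if nothing arrives in time.
  def brpop(timeout)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @mutex.synchronize do
      while @list.empty?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        return nil if remaining <= 0

        @cond.wait(@mutex, remaining)
      end
      @list.pop
    end
  end
end

queue = FakeRedisList.new
%w[quote-1 quote-2 quote-3].each { |job| queue.lpush(job) }

processed = []
worker = Thread.new do
  # Stop once the queue has stayed empty for 0.1 s.
  while (job = queue.brpop(0.1))
    processed << job
  end
end
worker.join
p processed # => ["quote-1", "quote-2", "quote-3"]
```

Against a real Redis server, workers on several machines can block on the same list, and each pushed job is popped by only one of them.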

A cache like Memcached can handle an extremely high volume of traffic, so whenever possible, cache against it to avoid hitting your database for every last thing.

If you've exhausted these options, it's time for more front-end servers and even more database capacity, but only then.

Queuing is the easiest thing for you to implement. Use something like this: http://beanstalkd.github.com/beaneater/

Basically, you can prefix your method calls with "async.", which puts them onto the queue to be executed by workers. The queue and the workers can run on the same server or on a different one.
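A stdlib-only sketch of the "async." calling style, assuming an in-process Queue in place of beanstalkd. AsyncProxy, Job, JOB_QUEUE and QuoteFetcher are hypothetical names that only imitate the idea; they are not the beaneater API. The proxy records each call as a job, and a worker thread later pops the job and performs the real call. Only positional arguments are handled.

```ruby
# Sketch only: an in-process Queue stands in for a beanstalkd tube.
# AsyncProxy, Job, JOB_QUEUE and QuoteFetcher are hypothetical names.

Job = Struct.new(:receiver, :method_name, :args)
JOB_QUEUE = Queue.new

# Records method calls as jobs instead of running them.
class AsyncProxy
  def initialize(target, queue)
    @target = target
    @queue = queue
  end

  def method_missing(name, *args)
    return super unless @target.respond_to?(name)

    @queue << Job.new(@target, name, args)
    nil # the caller gets control back immediately
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name) || super
  end
end

# Mixing this in gives an object an "async" prefix for its methods.
module Async
  def async
    AsyncProxy.new(self, JOB_QUEUE)
  end
end

class QuoteFetcher
  include Async

  def initialize(log)
    @log = log
  end

  def fetch_quote(site, quote_id)
    @log << "fetched quote #{quote_id} from #{site}" # real lookup would go here
  end
end

log = []
fetcher = QuoteFetcher.new(log)
fetcher.async.fetch_quote("siteA", 1) # queued, not run yet
fetcher.async.fetch_quote("siteB", 1)
JOB_QUEUE << :stop

# With a real queue server, this loop could run in another process or machine.
worker = Thread.new do
  while (job = JOB_QUEUE.pop) != :stop
    job.receiver.public_send(job.method_name, *job.args)
  end
end
worker.join
p log # => ["fetched quote 1 from siteA", "fetched quote 1 from siteB"]
```

With beanstalkd, a job would typically have to be serialized (for example as a class name, a record id and the arguments) rather than holding a live Ruby object, so that a worker on another server can rebuild it.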
