
Scaling up a Ruby, ActiveRecord, MySQL app

Original post: 2013-01-08 20:59:42. Tags: mysql, ruby-on-rails, ruby, linux, rails-activerecord

I have an app...

The app does a market comparison for a financial product: for a given quote request, it contacts several other sites for their quotes. It then gives the user the results - several quotes for their details.

To manage these requests, they get saved to MySQL. My app then kicks in, picks up the pending quotes and farms them out to threads (all on the same Linux box) to process each site lookup.

I am using JRuby, as I had thread/DB-related issues, with Java thread pools to control the number of threads. With the current hardware/VPS, it can handle around 200 threads. A lot of the limitations seem to relate to each thread grabbing its own MySQL connection to fetch the quote details and save back the results. We want to handle more concurrent threads, so we are looking for ways to scale up.

Wondering which way to go...

Bigger hardware...
More machines, with some kind of queueing mechanism (with priorities) to share the load across them. The threads then don't touch the DB; all the details/responses go via the queue, so the DB is hit less, but then maybe I am just pushing the problem into the queue. I am thinking of using something like MongoDB for the queue, but I'm open to suggestions - something easy to use with Ruby :)
Some kind of remote/RPC mechanism, e.g. dRb. Theoretically this seems like a good option, but I have not done anything with it yet, so I don't know how complex it will make things.
Something else...?

From this link, "Reasons for NOT scaling-up vs. -out?", it would seem this problem is suited to running more machines to solve it.

So, any thoughts on which way to go...

Cheers, Chris

2 answers

My usual approach to problems like this is to pay very close attention to the database queries you're making and tune them aggressively. Retrieve only what you need, skipping columns that aren't explicitly used, and be very careful about eager loading things you don't need in their entirety.

You'll often find you can get significant speed gains by adding indexes, or by strategically de-normalizing certain attributes in your database to avoid ugly, time-consuming JOIN operations.

Further, think about caching: the fastest database call is the one that's never made. It's not hard to use something like Memcached to save the results of a moderately time-consuming record retrieval, and if done carefully it's even easy to invalidate and expire these entries, provided you channel your updates through a few methods.
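As a rough illustration of "channel your updates through a few methods", here is a minimal read-through cache sketch. Everything in it is an assumption for illustration: `FakeCache` is an in-memory stand-in for a Memcached client (a real client such as Dalli offers similar `get`/`set`/`delete` calls), the `db` Hash stands in for the MySQL table, and `QuoteStore` and the `quote:<id>` key format are made-up names.

```ruby
# In-memory stand-in for a Memcached client (hypothetical, for illustration).
class FakeCache
  def initialize
    @store = {}
  end

  def get(key)
    @store[key]
  end

  def set(key, value)
    @store[key] = value
  end

  def delete(key)
    @store.delete(key)
  end
end

# Hypothetical repository: every read and write of a quote goes through
# these two methods, so the cache is never bypassed by accident.
class QuoteStore
  attr_reader :db_reads

  def initialize(cache, db)
    @cache = cache
    @db = db        # a Hash standing in for the MySQL table
    @db_reads = 0   # counts how often the "database" is actually hit
  end

  # Read-through: serve from the cache, fall back to the database on a miss.
  def find(id)
    key = "quote:#{id}"
    cached = @cache.get(key)
    return cached unless cached.nil?

    @db_reads += 1
    row = @db[id]
    @cache.set(key, row) unless row.nil?
    row
  end

  # All updates are channeled here, which is what makes invalidation easy.
  def update(id, attrs)
    @db[id] = (@db[id] || {}).merge(attrs)
    @cache.delete("quote:#{id}")
    @db[id]
  end
end

store = QuoteStore.new(FakeCache.new, { 1 => { status: "pending" } })
store.find(1)                          # miss: reads the database
store.find(1)                          # hit: served from the cache
store.update(1, { status: "done" })    # write invalidates the cached entry
latest = store.find(1)                 # miss again, sees the new status
puts store.db_reads
```

The point of the design is that invalidation lives in exactly one place (`update`); if other code wrote to the table directly, the cache could serve stale quotes.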

For scheduling workers, a simple first-in, first-out queue can be implemented in Redis to off-load a lot of the processing overhead from MySQL itself. This is usually very simple to add if you follow an example.
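One possible shape for such a queue, sketched under assumptions: producers push onto the head of a Redis list with LPUSH and workers pop from the tail with BRPOP, which gives first-in, first-out order. `FakeRedis` below is an in-memory stand-in that mimics only those two commands so the sketch runs without a server; with the redis gem you would pass a real client instead (assuming its `lpush(key, value)` and `brpop(key, timeout:)` calls). `QuoteQueue` and the `quotes:pending` key are hypothetical names.

```ruby
require "json"

# In-memory stand-in for a Redis client (hypothetical, for illustration).
class FakeRedis
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  # LPUSH: add to the head of the list and return the new length.
  def lpush(key, value)
    @lists[key].unshift(value)
    @lists[key].length
  end

  # BRPOP: take from the tail. Real Redis blocks up to the timeout when the
  # list is empty; this stand-in returns nil immediately instead.
  def brpop(key, timeout: 0)
    value = @lists[key].pop
    value && [key, value]
  end
end

class QuoteQueue
  KEY = "quotes:pending" # hypothetical key name

  def initialize(redis)
    @redis = redis
  end

  # Producer side: serialize the job and push it onto the list.
  def enqueue(quote_id, site)
    @redis.lpush(KEY, JSON.generate({ "quote_id" => quote_id, "site" => site }))
  end

  # Worker side: pop the oldest job, or return nil when nothing is waiting.
  def dequeue
    _key, payload = @redis.brpop(KEY, timeout: 1)
    payload && JSON.parse(payload)
  end
end

queue = QuoteQueue.new(FakeRedis.new)
queue.enqueue(1, "site-a")
queue.enqueue(2, "site-b")
first = queue.dequeue
second = queue.dequeue
puts first["quote_id"]
```

Because the job payload carries the quote details, a worker on another machine can process it without first asking MySQL which quotes are pending.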

A cache like Memcached can handle an extremely high volume of traffic, so whenever possible, cache against this to avoid hitting your database for every last thing.

If you've exhausted these options, it's time for more front-end servers and even more database capacity, but only then.

Queueing is the easiest thing for you to implement. Use something like this: http://beanstalkd.github.com/beaneater/

Basically, you can prepend your method calls with async., which will put them into the queue and execute them. The queue and workers can be on the same server or on different ones.
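To make the `async.` idea concrete, here is a plain-Ruby sketch of how such a prefix could work. It is not beaneater's API: `Asyncable`, `AsyncProxy`, `QuoteFetcher` and `JOB_QUEUE` are hypothetical names, and the stdlib `Queue` stands in for a beanstalkd tube. The proxy records each call instead of running it, and a worker runs the recorded calls later.

```ruby
JOB_QUEUE = Queue.new # stdlib thread-safe FIFO, standing in for a job server

# Records method calls on the queue instead of executing them.
class AsyncProxy
  def initialize(target, queue)
    @target = target
    @queue = queue
  end

  def method_missing(name, *args)
    return super unless @target.respond_to?(name)

    @queue << [@target, name, args]
    nil
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

module Asyncable
  # obj.async.some_method(...) enqueues the call rather than running it.
  def async
    AsyncProxy.new(self, JOB_QUEUE)
  end
end

class QuoteFetcher
  include Asyncable

  attr_reader :fetched

  def initialize
    @fetched = []
  end

  # Placeholder for the real site lookup.
  def fetch(site)
    @fetched << site
  end
end

fetcher = QuoteFetcher.new
fetcher.async.fetch("site-a") # enqueued, not executed yet
fetcher.async.fetch("site-b")
pending_before = fetcher.fetched.dup

# A single worker drains the queue and performs the recorded calls.
worker = Thread.new do
  until JOB_QUEUE.empty?
    target, name, args = JOB_QUEUE.pop
    target.public_send(name, *args)
  end
end
worker.join
puts fetcher.fetched.inspect
```

This in-process version passes the object itself through the queue. A queue shared between servers would instead need to serialize something like a class name and arguments, so the details depend on the library you choose.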