
delayed_job running slow when many workers are running

Original post: 2012-01-16 03:11:02 · Tags: ruby-on-rails, multithreading, virtual-machine, delayed-job, cpu-usage

Our app has a search task that takes <30 sec to run. We moved the task to the background using delayed_job, and it works great. To handle more search requests, we started 60 delayed_job workers; the problem appears when many workers are busy at the same time.

If I send one request to the server, the job takes ~30 sec to finish. If I send 10 requests, each job takes >3 min to finish. And if I send 30 requests at the same time, each job takes 26 min to finish... my god...

Our search task can be split into 2 parts. First, we send 10-20 API requests to a 3rd-party server using threads (http://www.tutorialspoint.com/ruby/ruby_multithreading.htm) and wait for the responses; this takes around 15 sec. Second, we process the response data: we search the local MySQL DB, do some looping and calculation, and at the end save the result to the file system (the file location is a shared space on NFS); this takes around 10 sec.

Using the Linux 'top' command, I found that when 1 job is running, it sometimes uses 100% CPU. When I run 30 jobs at the same time, each job gets <10% CPU. I guess this is why each job takes 26 min...

Currently I have no idea how to improve the speed so that the app supports more users while each search still takes ~30 sec...

We are using Rails 3.0.x and Ruby 1.9.2p290 (real threading?) on a server running 4 VMs (DB, Nginx, Ruby/Unicorn, Ruby/delayed_job).

What I have in mind now:
- real threading (how can we test whether we have it?)
- JRuby (would it help in this case?)
- Network IO (server admin said not likely)
- File system/NFS IO (server admin said not likely)
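One rough way to check the "real threading" item is to time the same work in 1 thread and in 4 threads. On MRI 1.9 the threads are native OS threads, but a global VM lock lets only one of them run Ruby code at a time, so CPU-bound work should take roughly 4 times as long with 4 threads, while waiting (sleep, blocking I/O) overlaps. The sketch below is only an illustration: `burn_cpu`, `time_in_threads`, the iteration count, and the thread count are made-up names and values, not part of the app described here.

```ruby
require "benchmark"

# CPU-bound busy work: sums integers in a plain loop (no I/O, no sleep).
def burn_cpu(iterations)
  total = 0
  iterations.times { |i| total += i }
  total
end

# Wall-clock seconds needed to run `work` once in each of `count` threads.
def time_in_threads(count, &work)
  Benchmark.realtime do
    Array.new(count) { Thread.new(&work) }.each(&:join)
  end
end

ITERATIONS = 1_000_000 # arbitrary; raise it if the timings are too noisy

single = time_in_threads(1) { burn_cpu(ITERATIONS) }
multi  = time_in_threads(4) { burn_cpu(ITERATIONS) }

io_single = time_in_threads(1) { sleep 0.2 }
io_multi  = time_in_threads(4) { sleep 0.2 }

puts format("CPU-bound: 1 thread %.2fs, 4 threads %.2fs (ratio %.1f)",
            single, multi, multi / single)
puts format("Waiting:   1 thread %.2fs, 4 threads %.2fs (ratio %.1f)",
            io_single, io_multi, io_multi / io_single)
```

A CPU-bound ratio near 4 suggests the threads are taking turns; a ratio near 1 (given at least 4 free cores) suggests they run in parallel, which is what JRuby is designed to allow. Note that separate delayed_job workers are separate processes, so this lock does not serialize one worker against another.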

Can anyone with similar experience give me some ideas so I can dig into the problem? Many thanks!

1 Answer

New Relic can give you a sense of where your jobs are spending their time. You can set it up to monitor your jobs and record a detailed trace of each one. There's a 14-day free trial that includes the detailed trace feature ("Transaction Traces").
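If you want a quick first look before setting up a monitoring service, a rough sketch with Ruby's standard `Benchmark` module can time each phase of a job. Everything named here is an assumption for illustration: `time_phases` is a hypothetical helper, and the three lambdas are stand-ins for the real API, DB/calculation, and file-writing steps.

```ruby
require "benchmark"

# Runs each named step in order and returns { name => elapsed_seconds }.
def time_phases(phases)
  phases.each_with_object({}) do |(name, step), timings|
    timings[name] = Benchmark.realtime { step.call }
  end
end

# Stand-ins for the real phases of the search job (hypothetical).
phases = {
  "api_requests" => -> { sleep 0.05 },
  "db_and_calc"  => -> { 10_000.times { |i| Math.sqrt(i) } },
  "write_result" => -> { sleep 0.01 },
}

timings = time_phases(phases)
timings.each { |name, secs| puts format("%-14s %.3fs", name, secs) }
```

Logging these numbers once with a single job and once with 30 jobs running should show which phase grows the most under load.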

The bottleneck could be in any of the areas you mention. If the DB is your bottleneck, you can tune your queries, possibly by adding indexes. If your web requests are not really executing in parallel (I'm not sure what your code looks like), you could use something like typhoeus to handle all the parallel business for you.
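To see whether the requests overlap at all, here is a minimal sketch that uses only plain Ruby threads rather than typhoeus. It is not your code: `fetch_all` is a hypothetical helper, `fake_fetch` stands in for the real HTTP call by sleeping, and the example.com URLs are placeholders.

```ruby
# Calls `fetcher` once per URL, each in its own thread, and returns the
# results in the same order as `urls`. Thread#value waits for the thread
# and re-raises any exception that was raised inside it.
def fetch_all(urls, fetcher)
  urls.map { |url| Thread.new(url) { |u| fetcher.call(u) } }.map(&:value)
end

# Stand-in for a real HTTP request: waits 0.2s, then returns a string.
fake_fetch = ->(url) { sleep 0.2; "response for #{url}" }
urls = (1..5).map { |i| "http://api.example.com/search/#{i}" }

started   = Time.now
responses = fetch_all(urls, fake_fetch)
elapsed   = Time.now - started

puts format("%d responses in %.2fs", responses.size, elapsed)
```

If five 0.2 sec calls finish in about 0.2 sec rather than about 1 sec, the waiting overlaps. Swapping `fake_fetch` for a lambda around your real request (for example `Net::HTTP.get(URI(url))`) would show whether the same holds for the actual API calls.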

Savon is processing XML from the SOAP requests, so make sure you're using a faster XML library like libxml or nokogiri.