How to tune a Ruby on Rails application running on Heroku which uses production level Heroku Postgres?

The company I work for decided to move their entire stack to Heroku. The main motivation was its ease of use: no sysadmin, no cry. But I still have some questions about it...

I'm running some load and stress tests on both the application platform and the Postgres service. I'm using blitz as a Heroku add-on. I hit the site with between 1 and 250 users. I got some very interesting results and I need help evaluating them.

The Test Stack:

Application specifications

There is nothing special about it at all.

  • Rails 4.0.4
  • Unicorn
  • database.yml set up to connect to Heroku Postgres.
  • Not using cache.

Database 数据库

It's a Standard Tengu (Heroku's naming conventions will kill me one day :) properly connected to the application.

Heroku configs

I applied everything in unicorn.rb as described in the "Deploying Rails Applications With Unicorn" article. I have 2 regular web dynos.

WEB_CONCURRENCY  : 2
DB_POOL          : 5
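
WEB_CONCURRENCY sets the number of Unicorn workers per dyno, and in my setup DB_POOL is read by database.yml as the per-process connection pool size. For reference, my unicorn.rb follows that article closely; it looks roughly like this (a sketch reproduced from the article's recommendations, so double-check against the original):

# config/unicorn.rb -- essentially what the Heroku Unicorn article recommends
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)
timeout 15
preload_app true

before_fork do |server, worker|
  # When Heroku sends TERM, ask the Unicorn master to quit gracefully
  Signal.trap("TERM") { Process.kill("QUIT", Process.pid) }

  # Drop the preloaded ActiveRecord connection so forked workers don't share it
  defined?(ActiveRecord::Base) and ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  # Workers ignore TERM and wait for the master's QUIT
  Signal.trap("TERM") {}

  # Each worker opens its own database connection after forking
  defined?(ActiveRecord::Base) and ActiveRecord::Base.establish_connection
end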

Data

  • episodes table has ~100,000 records
  • episode_urls table has ~300,000 records
  • episode_images table has ~75,000 records

Code

episodes_controller.rb

  def index
    @episodes = Episode.joins(:program).where(programs: {channel_id: 1}).limit(100).includes(:episode_image, :episode_urls)
  end

episodes/index.html.erb

<% @episodes.each do |t| %>
<% if !t.episode_image.blank? %>
<li><%= image_tag(t.episode_image.image(:thumb)) %></li>
<% end %>
<li><%= t.episode_urls.first.mas_path if !t.episode_urls.first.blank? %></li>
<li><%= t.title %></li>
<% end %>

Scenario #1:

Web dynos   : 2
Duration    : 30 seconds
Timeout     : 8000 ms
Start users : 10
End users   : 10

Result:

HITS 100.00% (484)
ERRORS 0.00% (0)
TIMEOUTS 0.00% (0)

This rush generated 218 successful hits in 30.00 seconds and we transferred 6.04 MB of data in and out of your app. The average hit rate of 7.27/second translates to about 627,840 hits/day.

Scenario #2:

Web dynos   : 2
Duration    : 30 seconds
Timeout     : 8000 ms
Start users : 20
End users   : 20

Result:

HITS 100.00% (484)
ERRORS 0.00% (0)
TIMEOUTS 0.00% (0)

This rush generated 365 successful hits in 30.00 seconds and we transferred 10.12 MB of data in and out of your app. The average hit rate of 12.17/second translates to about 1,051,200 hits/day. The average response time was 622 ms.

Scenario #3:

Web dynos   : 2
Duration    : 30 seconds
Timeout     : 8000 ms
Start users : 50
End users   : 50

Result:

HITS 100.00% (484)
ERRORS 0.00% (0)
TIMEOUTS 0.00% (0)

This rush generated 371 successful hits in 30.00 seconds and we transferred 10.29 MB of data in and out of your app. The average hit rate of 12.37/second translates to about 1,068,480 hits/day. The average response time was 2,631 ms.

Scenario #4:

Web dynos   : 4
Duration    : 30 seconds
Timeout     : 8000 ms
Start users : 50
End users   : 50

Result:

HITS 100.00% (484)
ERRORS 0.00% (0)
TIMEOUTS 0.00% (0)

This rush generated 484 successful hits in 30.00 seconds and we transferred 13.43 MB of data in and out of your app. The average hit rate of 16.13/second translates to about 1,393,920 hits/day. The average response time was 1,856 ms.

Scenario #5:

Web dynos   : 4
Duration    : 30 seconds
Timeout     : 8000 ms
Start users : 150
End users   : 150

Result:

HITS 71.22% (386)
ERRORS 0.00% (0)
TIMEOUTS 28.78% (156)

This rush generated 386 successful hits in 30.00 seconds and we transferred 10.76 MB of data in and out of your app. The average hit rate of 12.87/second translates to about 1,111,680 hits/day. The average response time was 5,446 ms.

Scenario #6:

Web dynos   : 10
Duration    : 30 seconds
Timeout     : 8000 ms
Start users : 150
End users   : 150

Result:

HITS 73.79% (428)
ERRORS 0.17% (1)
TIMEOUTS 26.03% (151)

This rush generated 428 successful hits in 30.00 seconds and we transferred 11.92 MB of data in and out of your app. The average hit rate of 14.27/second translates to about 1,232,640 hits/day. The average response time was 4,793 ms. You've got bigger problems, though: 26.21% of the users during this rush experienced timeouts or errors!

General Summary:

  • The "Hit Rate" never goes beyond about 15, even though 150 users are sending requests to the application.
  • Increasing the number of web dynos does not help with handling requests.

Questions:

  1. When I use caching and memcached (the Memcachier add-on from Heroku), even 2 web dynos can handle >180 hits per second. I'm just trying to understand what the dynos and the Postgres service can do without a cache, so that I can learn how to tune them. How do I do that?

  2. Standard Tengu is said to support 200 concurrent connections. So why does it never reach that number?

  3. If having a production-level db and increasing web dynos won't help scale my app, what's the point of using Heroku?

  4. Probably the most important question: What am I doing wrong? :)

Thank you for even reading this crazy question!

I eventually figured out the issue.

First, recall my code in the view:

<% @episodes.each do |t| %>
<% if !t.episode_image.blank? %>
<li><%= image_tag(t.episode_image.image(:thumb)) %></li>
<% end %>
<li><%= t.episode_urls.first.mas_path if !t.episode_urls.first.blank? %></li>
<li><%= t.title %></li>
<% end %>

Here I'm fetching each episode's episode_image inside my iteration. Even though I've been using includes in my controller, there was a big mistake in my table schema: I did not have an index on episode_id in my episode_images table! This was causing extremely high query times. I found it using New Relic's database reports. All the other query times were 0.5 ms or 2-3 ms, but episode.episode_image was taking almost 6,500 ms!

I don't know much about the relationship between query time and application execution, but as soon as I added the index to my episode_images table I could clearly see the difference. If your database schema is designed properly, you probably won't face any problems scaling via Heroku. But no number of dynos can help you with a badly designed database.
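
In case it helps anyone fixing the same thing, the migration amounts to roughly this (a sketch; the class name is just illustrative, and the table and column names match my schema):

class AddEpisodeIdIndexToEpisodeImages < ActiveRecord::Migration
  def change
    # Without this index, looking up an episode's image scans the whole
    # episode_images table for every episode rendered in the loop.
    add_index :episode_images, :episode_id
  end
end

Run it locally with rake db:migrate and on Heroku with heroku run rake db:migrate.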

For people who might run into the same problem, I would like to share some of my findings about the relationship between Heroku web dynos, Unicorn workers and PostgreSQL active connections:

Basically, Heroku provides you with a dyno, which is a kind of small virtual machine with 1 core and 512 MB of RAM. Your Unicorn server runs inside that little virtual machine. Unicorn has a master process and worker processes. Each of your Unicorn workers holds its own permanent connection to your existing PostgreSQL server (don't forget to check out this). It basically means that when you have one Heroku dyno with 3 Unicorn workers running on it, you have at least 4 active connections. If you have 2 web dynos, you have at least 8 active connections.
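
As a back-of-the-envelope sketch of that arithmetic (assuming, as I observed, one connection per worker plus one for the master on each dyno):

# Rough count of the Postgres connections the app keeps open
web_dynos       = 2
unicorn_workers = 3                          # WEB_CONCURRENCY per dyno
connections_per_dyno = unicorn_workers + 1   # workers + master
total_connections    = web_dynos * connections_per_dyno
puts total_connections                       # => 8, far below Standard Tengu's 200-connection limit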

Let's say you have a Standard Tengu Postgres with a 200 concurrent connection limit. If you have problematic queries caused by bad db design, neither the db nor more dynos can save you without a cache... If you have long-running queries, I think you have no choice other than caching.
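
To illustrate the kind of caching I mean, here is a rough sketch (not my exact production code; the cache key and expiry are just examples, and it assumes the Memcachier add-on is configured as the Rails cache store):

def index
  # Cache the materialized records (.to_a) so the expensive query is paid
  # once per expiry window instead of on every request.
  @episodes = Rails.cache.fetch("episodes/channel-1/index", expires_in: 10.minutes) do
    Episode.joins(:program)
           .where(programs: { channel_id: 1 })
           .includes(:episode_image, :episode_urls)
           .limit(100)
           .to_a
  end
end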

All of the above are my own findings; if anything is wrong, please let me know in the comments.
