
Heroku configuration for Ruby on Rails application

I've set a client up with Heroku for their Ruby on Rails application and have had a great deal of trouble over the years with the application not running well, regardless of how much money we spend on additional resources. I find Heroku's documentation highly confusing and have never been able to get my head around their specific terminology. We are constantly getting "H12" errors, "R14" errors, etc., and memory usage and dyno load are constantly spiking, yet this is a small to medium-sized business without a massive amount of traffic. I'm wondering if anybody out there who does understand the ins and outs of Heroku can look this configuration over and tell me whether it makes sense:

DB_POOL: 10
MALLOC_ARENA_MAX: 2
RAILS_MAX_THREADS: 5
WEB_CONCURRENCY: 4
Ruby 2.7
Rails 6.0
Puma
8 2x web dynos
5 1x worker dynos
$50 Heroku Postgres Standard-0 database
$15 MemCachier
$10 RedisCloud
...etc addons

Your WEB_CONCURRENCY is too high for your Standard-2x dynos. The recommended default is 2: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#recommended-default-puma-process-and-thread-configuration

This is likely contributing to your R14 errors, since higher web concurrency means more memory usage. So you either need to lower your web concurrency (which may mean you also need to increase the # of dynos to compensate) or move to bigger dynos.
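For reference, here is a minimal sketch of a config/puma.rb along the lines of that Heroku article, driven by the same env vars you already set (adjust the defaults to your own measurements):

```ruby
# config/puma.rb — minimal sketch; values come from the Heroku config vars
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))        # 2 is the recommended starting point on Standard-2X
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count

preload_app!

port        ENV.fetch("PORT", 3000)
environment ENV.fetch("RACK_ENV", "development")

on_worker_boot do
  # Re-establish the ActiveRecord connection in each forked worker
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end
```

With preload_app! and copy-on-write, dropping WEB_CONCURRENCY from 4 to 2 should substantially reduce per-dyno memory use, which is often what resolves the R14s on its own.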

You already have MALLOC_ARENA_MAX=2, but I'm not sure whether you are also using jemalloc. You might want to try that too.
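If you do try jemalloc, one common route on Heroku (an assumption on my part - I don't know your buildpack setup, so check the buildpack's README for current instructions) is a community jemalloc buildpack:

```
heroku buildpacks:add --index 1 https://github.com/gaffneyc/heroku-buildpack-jemalloc.git
heroku config:set JEMALLOC_ENABLED=true
```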

Of course, you may also have other memory issues in your app - check out some tips here. I also recommend adding a monitoring tool like AppSignal, as it's capable of tracking memory allocations per transaction.

For mitigating H12s:

  1. Ensure you have installed something like the rack-timeout gem, which ensures that a long-running request is dropped at the dyno level and thus avoids the H12 error (you get a Rack::Timeout error raised in your app instead). Set the timeout to 15s so that it is well under Heroku's 30s router timeout that produces the H12 (see the sketch after this list).
  2. Investigate your slow transactions. A monitoring tool is key here, e.g. New Relic (start with the lowest-priced paid plan - the free plan does not allow transaction tracing). Here is their blog post on how to trace transactions.
  3. When you've identified the problem - fix it!
  • if the bottleneck is external:
    • check for external API limits and throttling
    • add timeouts to those calls and make the app resilient to slow external responses (see the Net::HTTP sketch after this list)
  • if the bottleneck is due to the database:
    • optimize slow queries
    • check cache hit rates
    • check the # of waiting connections and db locks -> if the number of waiting connections is consistently above 0 for X minutes, that indicates you have long-held locks that you'll need to investigate. Waiting connections are easiest to track over time with Librato (the free plan should do fine)
  • if the bottleneck is other app code: profile it - your monitoring tool's transaction traces will point at the slow code - and optimize from there
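For point 1, the setup is small. A minimal sketch, assuming a recent rack-timeout version (which reads its settings from the environment - check the gem's README for the option names your version supports):

```ruby
# Gemfile — in a Rails app, rack-timeout inserts its middleware automatically
gem "rack-timeout"
```

Then set the 15s service timeout via a config var rather than code, e.g. heroku config:set RACK_TIMEOUT_SERVICE_TIMEOUT=15.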
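And for the external-bottleneck case, this is the kind of client-side timeout I mean - a sketch using plain Net::HTTP (the endpoint is hypothetical; the same idea applies to Faraday, HTTParty, etc.):

```ruby
require "net/http"
require "uri"

uri = URI("https://api.example.com/slow-endpoint")  # hypothetical external API
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.open_timeout = 2   # seconds allowed to establish the connection
http.read_timeout = 5   # seconds allowed to wait for response data

begin
  response = http.request(Net::HTTP::Get.new(uri))
rescue Net::OpenTimeout, Net::ReadTimeout => e
  # Degrade gracefully instead of letting the request crawl toward the 30s router limit
  Rails.logger.warn("External API timed out: #{e.class}")
  response = nil
end
```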

I want to stress the importance of monitoring tools to help diagnose issues and determine optimal resource usage. Figuring out the correct concurrency configs and the correct size and # of dynos to run is virtually impossible without proper monitoring. Hopefully some of the add-ons you didn't list already cover this, but if not, I'll summarize my recommendations and mention a couple of other tips:

  • To get more metrics info, ensure you have enabled log-runtime-metrics
  • Also enable Ruby language metrics (a sketch of both is below, after this list)
  • Add a monitoring tool that can track Ruby memory allocations, like AppSignal. Scout APM can do this too, but I think the plans capable of it are more expensive (it requires the Scout Insights feature)
  • Add the lowest-paid tier of New Relic. This is my go-to tool for transaction tracing. AppSignal can do this too if you don't want to pay for another tool, but I find it easier with New Relic.
  • Add Librato. It offers some great charts out of the box, including a set of Postgres charts in its own dashboard.
  • Set alerts in your monitoring apps to warn you about things like response times so you can look into them!
  • And of course, make all your changes in staging first AND load test them to see the impacts of your changes before attempting in production!
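On the first two bullets, the setup is roughly this (the labs commands are from memory, so double-check the Dev Center articles - feature names can change):

```ruby
# Gemfile — Heroku's Ruby language metrics are reported via the barnes gem
gem "barnes"
```

Then, from your shell: heroku labs:enable log-runtime-metrics -a <your-app> and heroku labs:enable runtime-heroku-metrics -a <your-app>. See the barnes README in case your Puma setup needs an explicit Barnes.start hook.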

Update: I also just noticed that you said you are using Standard-0 Postgres, which means it has a 120-connection limit. So if you end up lowering your WEB_CONCURRENCY and increasing the # of dynos, watch your total connections to that database. Beyond the hard limit itself, more connections also mean more overhead for your db, so if you are close to the connection limit, you are more likely to see db performance suffer. You may want to upgrade to a plan with a higher connection limit, or use pgbouncer as a connection pooler to stay under the limit.
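To put numbers on that with your current config: 8 web dynos × WEB_CONCURRENCY of 4 = 32 Puma worker processes, each holding an ActiveRecord pool of DB_POOL = 10, so the web tier alone could open up to 320 connections. Even counting only what the threads can actually use at once (32 processes × 5 threads = 160), you are already past the 120 limit before the 5 worker dynos check out a single connection - hence the pooler or plan upgrade.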
