Memoization & caching in Ruby & Ruby on Rails
Given an application is looping through many items: why is the application making multiple SQL calls even if I memoize the object?
or
Given an application is looping through many items: how do I prevent the application from doing an expensive calculation on every item?
Note: we use Policy View Objects as described in http://www.eq8.eu/blogs/41-policy-objects-in-ruby-on-rails
class WorksController < ApplicationController
  def index
    @works = Work.all
  end
end

<% @works.each do |work| %>
  <%= link_to("Delete work", work, method: :delete) if work.policy.able_to_delete?(current_user: current_user) %>
<% end %>

class Work < ActiveRecord::Base
  has_many :comments

  def policy
    # pass the work in: WorkPolicy#initialize requires it
    @policy ||= WorkPolicy.new(self)
  end
end

class Comment < ActiveRecord::Base
  belongs_to :work
end

class WorkPolicy
  attr_reader :work

  def initialize(work)
    @work = work
  end

  def able_to_delete?(current_user: nil)
    work_has_no_comments || (current_user && current_user.admin?)
  end

  private

  def work_has_no_comments
    work.comments.count < 1
  end
end
Now let's say we have 100 Works in the DB.
This would result in multiple SQL calls:
SELECT "works".* FROM "works"
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 1]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 2]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 3]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 4]]
Note: I recently explained this example to a colleague, and I think it's worth documenting for more developers.
First, let's answer the question:
Why is the application making multiple SQL calls even if I memoize the object?
Yes, we are memoizing the policy object with @policy ||= WorkPolicy.new(self).
But we are not memoizing what that object is calling. That means we also need to memoize the result of the underlying method call.
So if we did: 所以如果我们这样做:
@work = Work.last
@work.policy.able_to_delete?
#=> SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 100]] # sql call
@work.policy.able_to_delete?
#=> SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 100]] # sql call
@work.policy.able_to_delete?
#=> SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 100]] # sql call
...we would call comments.count multiple times.
But we can introduce another layer of memoization. So let's change this:
class WorkPolicy
  # ...
  def work_has_no_comments
    work.comments.count < 1
  end
end
To this:
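class WorkPolicy
  # ...
  def work_has_no_comments
    # `||=` would not cache a `false` result, so check defined? instead
    return @work_has_no_comments if defined?(@work_has_no_comments)

    @work_has_no_comments = work.comments.count < 1
  end
end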
@work = Work.last
@work.policy.able_to_delete?
#=> SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 100]] # sql call
@work.policy.able_to_delete? # no sql call
@work.policy.able_to_delete? # no sql call
As you can see, the SQL count call is made only the first time; after that the result is returned from the object's memoized state.
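One subtlety with ||=: it only skips the calculation when the stored value is truthy, so a false result (a work that does have comments) is calculated again on every call. A plain-Ruby sketch, with a hypothetical fake_has_no_comments_query standing in for the real comments.count SQL call, that compares ||= with a defined?-based check:

```ruby
# Hypothetical stand-in for the real `work.comments.count < 1` SQL call.
# It records each "query" and pretends the work already has comments.
QUERY_LOG = []

def fake_has_no_comments_query
  QUERY_LOG << 'SELECT COUNT(*) FROM "comments" ...'
  false
end

class OrMemoPolicy
  def work_has_no_comments
    # `false || expr` evaluates expr again, so a false result is never cached
    @work_has_no_comments ||= fake_has_no_comments_query
  end
end

class DefinedMemoPolicy
  def work_has_no_comments
    # defined? is true once the ivar is assigned, even if its value is false
    return @work_has_no_comments if defined?(@work_has_no_comments)

    @work_has_no_comments = fake_has_no_comments_query
  end
end

or_policy = OrMemoPolicy.new
3.times { or_policy.work_has_no_comments }
or_calls = QUERY_LOG.size # queried on every call

QUERY_LOG.clear
defined_policy = DefinedMemoPolicy.new
3.times { defined_policy.work_has_no_comments }
defined_calls = QUERY_LOG.size # queried only once

puts "||= : #{or_calls} queries, defined? : #{defined_calls} query"
```

The defined? form costs one extra line, but it memoizes any value, including false and nil.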
But in our case of looping through multiple works this would not help, because we initialize 100 Work objects with 100 WorkPolicy objects, and each of them starts with its own empty memoized state.
The best way to understand it is to run this code in your irb:
class Foo
  def x
    @x ||= calculate
  end

  private

  def calculate
    sleep 2 # slow query
    123
  end
end

class Bar
  def y
    @y ||= Foo.new
  end
end

puts "10 times calling same memoized object"
bar = Bar.new
10.times do
  puts bar.y.x # only the first call sleeps
end

puts "10 times initializing new object"
10.times do
  bar = Bar.new
  puts bar.y.x # every call sleeps: each new Bar builds a new Foo
end
One way to deal with this is to use the Rails cache:
class WorkPolicy
  # ...
  def work_has_no_comments
    Rails.cache.fetch([WorkPolicy, 'work_has_no_comments', @work]) do
      work.comments.count < 1
    end
  end
end
class Comment < ActiveRecord::Base
  # `touch: true` updates Work#updated_at each time a comment is added or changed,
  # so that we drop the cache
  belongs_to :work, touch: true
end
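Putting @work into the cache key together with touch: true is key-based expiration: the record's updated_at becomes part of the cache key (or, with cache versioning in newer Rails, the cache version), so touching the work makes the old entry unreachable. A minimal plain-Ruby sketch of the idea, assuming a hypothetical FakeCache backed by a Hash and a FakeWork struct instead of Rails.cache and ActiveRecord:

```ruby
# Hypothetical stand-ins: FakeWork replaces the ActiveRecord model,
# FakeCache replaces Rails.cache (no Rails involved).
FakeWork = Struct.new(:id, :updated_at, :comments)

class FakeCache
  attr_reader :misses

  def initialize
    @store = {}
    @misses = 0
  end

  # Like Rails.cache.fetch: return the stored value, or compute and store it
  def fetch(key)
    return @store[key] if @store.key?(key)

    @misses += 1
    @store[key] = yield
  end
end

CACHE = FakeCache.new

def work_has_no_comments(work)
  # updated_at is part of the key, so touching the work changes the key
  key = ['WorkPolicy', 'work_has_no_comments', work.id, work.updated_at]
  CACHE.fetch(key) { work.comments.size < 1 }
end

work = FakeWork.new(3, 1, [])
first_call = work_has_no_comments(work)  # miss: computed

work.comments << 'Nice!'
stale = work_has_no_comments(work)       # hit: same key, old answer

work.updated_at += 1                     # what `touch: true` does for us
fresh = work_has_no_comments(work)       # miss: new key, recomputed

puts [first_call, stale, fresh].inspect
```

The stale call shows why the touch matters: without it, the cached answer outlives the data it was computed from.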
Now, this is just a stupid example. I know this should probably be cached by introducing a Work#comments_count method and caching the count of comments there. I just want to demonstrate the options.
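The Work#comments_count idea is usually called a counter cache. You store the number of comments on the work and update it whenever a comment is added or removed, so reading the count needs no COUNT(*). In Rails, that is what belongs_to :work, counter_cache: true does together with a comments_count column on works. A plain-Ruby sketch of the mechanism, with a hypothetical CountedWork class in place of the model:

```ruby
# Hypothetical plain-Ruby model of a counter cache (no Rails involved).
class CountedWork
  attr_reader :comments, :comments_count

  def initialize
    @comments = []
    @comments_count = 0 # the cached counter (a column in Rails)
  end

  def add_comment(body)
    @comments << body
    @comments_count += 1
  end

  # Removes a single matching comment and keeps the counter in sync
  def remove_comment(body)
    index = @comments.index(body)
    return nil unless index

    @comments_count -= 1
    @comments.delete_at(index)
  end

  # Reads the stored counter instead of counting the comments
  def no_comments?
    comments_count < 1
  end
end

work = CountedWork.new
work.add_comment('First!')
work.add_comment('First!')
work.remove_comment('First!')
puts work.comments_count
```

The trade-off is the same as with any cache: every write path has to keep the counter correct.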
With the Rails.cache version of work_has_no_comments in place, the first time we run WorksController#index we would get multiple SQL calls:
SELECT "works".* FROM "works"
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 1]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 2]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 3]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 4]]
# ...
...but the second, third, ... call would look like:
SELECT "works".* FROM "works"
# no count call
And if you add a new comment to the Work with id 3:
SELECT "works".* FROM "works"
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 3]]
Now we are still not satisfied. We want that first run to be fast! The problem is how we are calling our associations (comments). We are lazy loading them:
Work.limit(3).each { |w| w.comments.to_a }
# => SELECT "works".* FROM "works" ORDER BY "works"."id" DESC LIMIT 3
# => SELECT "comments".* FROM "comments" WHERE "comments"."work_id" = $1 ORDER BY comments.created_at ASC [["work_id", 97]]
# => SELECT "comments".* FROM "comments" WHERE "comments"."work_id" = $1 ORDER BY comments.created_at ASC [["work_id", 98]]
# => SELECT "comments".* FROM "comments" WHERE "comments"."work_id" = $1 ORDER BY comments.created_at ASC [["work_id", 99]]
But if we eager load them:
Work.limit(3).includes(:comments).map(&:comments)
SELECT "works".* FROM "works" WHERE "works"."deleted_at" IS NULL LIMIT 3
SELECT "comments".* FROM "comments" WHERE "comments"."status" = 'approved' AND "comments"."work_id" IN (97, 98, 99) ORDER BY comments.created_at ASC
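To see why eager loading helps, here is a plain-Ruby sketch with a hypothetical FakeDB that only records the queries it would run. Lazy loading issues one comments query per work (the classic N+1 problem). Eager loading issues a single WHERE work_id IN (...) query and groups the rows in memory, which is roughly what includes does in the log above.

```ruby
# Hypothetical in-memory "database" that logs every query it answers.
class FakeDB
  attr_reader :log

  def initialize(comments_by_work_id)
    @comments_by_work_id = comments_by_work_id
    @log = []
  end

  # Lazy loading: called once per work
  def comments_for(work_id)
    @log << "SELECT comments WHERE work_id = #{work_id}"
    @comments_by_work_id.fetch(work_id, [])
  end

  # Eager loading: one query for all works, grouped by work id
  def comments_for_many(work_ids)
    @log << "SELECT comments WHERE work_id IN (#{work_ids.join(', ')})"
    work_ids.map { |id| [id, @comments_by_work_id.fetch(id, [])] }.to_h
  end
end

db = FakeDB.new({ 97 => ['a'], 98 => [], 99 => %w[b c] })
work_ids = [97, 98, 99]

lazy_sizes = work_ids.map { |id| db.comments_for(id).size }
lazy_queries = db.log.size # one query per work

db.log.clear
preloaded = db.comments_for_many(work_ids)
eager_sizes = work_ids.map { |id| preloaded[id].size }
eager_queries = db.log.size # a single query

puts "lazy: #{lazy_queries} queries, eager: #{eager_queries} query"
```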
Read more about includes and joins in http://blog.scoutapp.com/articles/2017/01/24/activerecord-includes-vs-joins-vs-preload-vs-eager_load-when-and-where
So our code could be:
class WorksController < ApplicationController
  def index
    @works = Work.all.includes(:comments)
  end
end

class WorkPolicy
  # ...
  def work_has_no_comments
    work.comments.size < 1 # we changed `count` to `size`
  end
end
Q: Now wait a minute, aren't comments.count and comments.size the same?
Not really.
10.times do
  work.comments.size
end
# SELECT "comments".* FROM "comments" WHERE "comments"."work_id" = $1 ORDER BY comments.created_at ASC [["work_id", 1]]
...loads all the comments into (something like) an Array and calculates the size on that array (as if [].size).
10.times do
  work.comments.count
end
# SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 1]]
# SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 1]]
# SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1 [["work_id", 1]]
# ...
...executes SELECT COUNT, which is much faster than loading all the comments to calculate the size. But when you need to do this 10 times, you are explicitly making 10 calls.
Now, I'm exaggerating with work.comments.size. Rails is more clever at determining whether you just want the size. If the comments are already loaded (for example with includes), size returns the length of the loaded array without any SQL call. If they are not loaded, it just executes SELECT COUNT(*) instead of loading all the comments into an array and doing [].size. If you really want to load all the records and count them in memory, that is what length does.
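Here is a simplified model of the size / count / length behaviour described in the Rails API documentation for association collections. It uses a hypothetical FakeCommentsProxy and ignores details such as counter caches and unsaved records:

```ruby
# Hypothetical, simplified association proxy that logs the SQL it would run.
class FakeCommentsProxy
  attr_reader :queries

  def initialize(rows)
    @rows = rows   # pretend these rows live in the database
    @records = nil # nil until the association is loaded
    @queries = []
  end

  def loaded?
    !@records.nil?
  end

  # SELECT "comments".* once, then reuse the loaded records
  def load
    return @records if loaded?

    @queries << 'SELECT "comments".*'
    @records = @rows.dup
  end

  # Always asks the database
  def count
    @queries << 'SELECT COUNT(*)'
    @rows.size
  end

  # Counts loaded records in memory, otherwise falls back to COUNT(*)
  def size
    loaded? ? @records.size : count
  end

  # Loads everything (if needed), then counts in memory
  def length
    load.size
  end
end

not_loaded = FakeCommentsProxy.new(%w[a b])
10.times { not_loaded.size }   # 10 x COUNT(*)

preloaded = FakeCommentsProxy.new(%w[a b])
preloaded.load                 # roughly what includes(:comments) does up front
10.times { preloaded.size }    # no further queries

counted = FakeCommentsProxy.new(%w[a b])
counted.load
10.times { counted.count }     # count ignores the loaded records

via_length = FakeCommentsProxy.new(%w[a b])
10.times { via_length.length } # one load, then in-memory counting

puts [not_loaded, preloaded, counted, via_length].map { |p| p.queries.size }.inspect
```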
It's similar to .pluck vs .map:
scope = Work.limit(10)
scope.pluck(:title)
# SELECT "works"."title" FROM "works" LIMIT 10
# => ['foo', 'bar', ...]
scope.pluck(:title)
# SELECT "works"."title" FROM "works" LIMIT 10
# => ['foo', 'bar', ...]
scope.map(&:title)
# SELECT "works".* FROM "works" LIMIT 10
# => ['foo', 'bar', ...]
scope.map(&:title)
# => ['foo', 'bar', ...]
pluck is faster as it only selects the title into an array, but it executes an SQL call every time.
map will cause Rails to evaluate SELECT * in order to populate title into the array, but then you can work with the loaded objects (the second map call makes no SQL call).
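The same trade-off can be modelled with a hypothetical FakeRelation that only logs the queries it would run. Pluck-style access queries one column every time. Map-style access loads full rows once and then reuses them.

```ruby
# Hypothetical row type and relation; no ActiveRecord involved.
Row = Struct.new(:id, :title, :body)

class FakeRelation
  attr_reader :log

  def initialize(rows)
    @rows = rows
    @loaded = nil
    @log = []
  end

  # Selects only one column, but hits the "database" on every call
  def pluck(column)
    @log << %(SELECT "works"."#{column}" FROM "works")
    @rows.map { |row| row[column] }
  end

  # Loads whole rows on the first call, then maps over the loaded objects
  def map(&block)
    unless @loaded
      @log << 'SELECT "works".* FROM "works"'
      @loaded = @rows.map(&:dup)
    end
    @loaded.map(&block)
  end
end

scope = FakeRelation.new([Row.new(1, 'foo', '...'), Row.new(2, 'bar', '...')])
scope.pluck(:title)
scope.pluck(:title)         # queries again
scope.map(&:title)          # one SELECT *
titles = scope.map(&:title) # reuses the loaded rows

puts scope.log
```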
There is no silver bullet. It always depends on what you want to achieve.
One may argue that the "optimize SQL" solution works best, but that's not true. You need to implement a similar SQL optimization in every place where you are calling work.policy.able_to_delete?, which may be 10 or 100 places. Also, includes may not always be a good idea in terms of performance.
Caching can get super tangled in terms of which event should drop which part of the cache. If you don't do it properly, your website may display out-of-date information! In the case of policy objects, that is super dangerous.
Memoization is not always flexible enough, as you may need to redesign a large part of the code base to achieve it and introduce too many layers of unnecessary abstraction.
Not to mention that memoization is a big no-no in environments with truly parallel threads, like Rubinius, unless you synchronize your threads correctly. Don't worry: if you use MRI, you are fine with memoization (in 95% of cases). Rails and Puma are thread safe, but that's a different kind of "thread safe". You really need to do something stupid for that to be an issue. This article is way too long to go into that topic. Google it!
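For runtimes where threads really run in parallel, one common way to make memoization safe is to guard it with a Mutex so that only one thread performs the calculation. A minimal sketch follows. It assumes a hypothetical ExpensiveValue class, with a sleep standing in for a slow query. It takes the lock on every call, which is the simple form rather than the fastest one.

```ruby
# Hypothetical class: memoization guarded by a Mutex.
class ExpensiveValue
  attr_reader :calculations

  def initialize
    @mutex = Mutex.new
    @calculations = 0
  end

  def value
    # Only one thread at a time runs this block; synchronize releases
    # the lock even when we return early from inside it
    @mutex.synchronize do
      return @value if defined?(@value)

      @calculations += 1
      sleep 0.01 # pretend this is a slow query
      @value = 123
    end
  end
end

memo = ExpensiveValue.new
threads = 10.times.map { Thread.new { memo.value } }
results = threads.map(&:value) # Thread#value joins and returns the result

puts "#{results.uniq.inspect} after #{memo.calculations} calculation(s)"
```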
It really depends on what your application (or part of the application) aims for. My only recommendation is: profile/benchmark your app! Don't prematurely optimize. Use tools like New Relic to discover which parts of your app are slow.
Optimize gradually. Don't build a slow application and then decide in one sprint "Right, let's optimize", because you may find out that you made poor design choices and 50% of your app needs to be rewritten to be faster.
Counter Cache
Database indexes
It may sound off topic, but a lot of performance issues happen because your app has no DB indexes (or too many premature indexes).