
Memoization & caching in Ruby & Ruby on Rails

Given an application is looping through many items: why is the application making multiple SQL calls even if I memoize the object?

or

Given an application is looping through many items: how do I prevent the application from doing an expensive calculation on every item?

Example Rails code:

  • A work has many comments
  • A work can be deleted only if it has no comments OR if the user is an admin
  • Our view will display "Delete work" only if the work can be deleted

Note: we use Policy View Objects as described in http://www.eq8.eu/blogs/41-policy-objects-in-ruby-on-rails

class WorksController < ApplicationController
  def index
    @works = Work.all
  end
end

<% @works.each do |work| %>
  <%= link_to("Delete work", work, method: :delete) if work.policy.able_to_delete?(current_user: current_user) %>
<% end %>

class Work < ActiveRecord::Base
  has_many :comments

  def policy
    @policy ||= WorkPolicy.new(self)
  end
end

class Comment < ActiveRecord::Base
  belongs_to :work
end

class WorkPolicy
  attr_reader :work

  def initialize(work)
    @work = work
  end

  def able_to_delete?(current_user: nil)
    work_has_no_comments || (current_user && current_user.admin?)
  end

  private

  def work_has_no_comments
    work.comments.count < 1
  end
end

Now let's say we have 100 Works in the DB.

This would result in multiple SQL calls, one COUNT per work:

SELECT "works".* FROM "works"
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 1]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 2]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 3]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 4]]
# ...
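To see the same N+1 shape without a database, here is a minimal plain-Ruby sketch. `FakeDB` and the Struct-based `Work` are hypothetical stand-ins, not ActiveRecord. `FakeDB` only records each simulated COUNT query so the query count can be checked.

```ruby
# Hypothetical stand-in for the database: it records every "query" it runs.
class FakeDB
  attr_reader :queries

  def initialize(comments_by_work_id)
    @comments_by_work_id = comments_by_work_id
    @queries = []
  end

  # Simulates: SELECT COUNT(*) FROM comments WHERE work_id = ?
  def count_comments(work_id)
    @queries << "SELECT COUNT(*) FROM comments WHERE work_id = #{work_id}"
    @comments_by_work_id.fetch(work_id, []).size
  end
end

# Hypothetical Work: every call to comments_count goes to the "database".
Work = Struct.new(:id, :db) do
  def comments_count
    db.count_comments(id)
  end
end

db = FakeDB.new({ 1 => ["nice"], 3 => ["meh", "ok"] })
works = (1..100).map { |id| Work.new(id, db) }

# The view loop: one COUNT query per work, i.e. the N+1 pattern.
deletable = works.select { |work| work.comments_count < 1 }

puts deletable.size  # => 98
puts db.queries.size # => 100
```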

Note: I was recently explaining this example to a colleague, and I think it's worth documenting for more developers.

Memoization

First let's answer the question:

Why is the application making multiple SQL calls even if I memoize the object?

Yes, we are memoizing the Policy object with @policy ||= WorkPolicy.new

But we are not memoizing what that object is calling. That means we need to memoize the result of the underlying method call.

So if we did: 所以如果我们这样做:

@work = Work.last
@work.policy.able_to_delete?
#=> SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 100]] # sql call
@work.policy.able_to_delete?
#=> SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 100]] # sql call
@work.policy.able_to_delete?
#=> SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 100]] # sql call

...we would call comments.count multiple times.

But we can introduce another layer of memoization. Let's change this:

class WorkPolicy
  # ...

  def work_has_no_comments
    work.comments.count < 1
  end
end

To this:

class WorkPolicy
  # ...

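  def work_has_no_comments
    # `||=` would not cache a `false` result, so check `defined?` instead
    return @work_has_no_comments if defined?(@work_has_no_comments)

    @work_has_no_comments = work.comments.count < 1
  end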
end


@work = Work.last
@work.policy.able_to_delete?
#=> SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 100]] # sql call
@work.policy.able_to_delete?
@work.policy.able_to_delete?

As you can see, the SQL COUNT call is made only the first time. After that, the result is returned from the object's state in memory.
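The common `@x ||= value` idiom has one catch. It re-runs the computation whenever the stored value is `nil` or `false`, so a "no" answer such as "this work has comments" is never cached. The minimal sketch below compares the two styles. `Policy` is a hypothetical class, and `expensive_lookup` stands in for the COUNT query.

```ruby
class Policy
  attr_reader :lookups

  def initialize(comment_count)
    @comment_count = comment_count
    @lookups = 0
  end

  # Memoized with ||= : a false result is NOT cached, because
  # `@x ||= value` re-runs whenever @x is nil or false.
  def no_comments_with_or_equals?
    @no_comments_or ||= expensive_lookup
  end

  # Memoized with defined? : any result, including false, is cached.
  def no_comments_with_defined?
    return @no_comments_def if defined?(@no_comments_def)

    @no_comments_def = expensive_lookup
  end

  private

  # Stands in for the COUNT query and counts how often it runs.
  def expensive_lookup
    @lookups += 1
    @comment_count < 1
  end
end

policy = Policy.new(5) # a work WITH comments, so the answer is false
3.times { policy.no_comments_with_or_equals? }
puts policy.lookups # => 3 (false was never cached)

policy = Policy.new(5)
3.times { policy.no_comments_with_defined? }
puts policy.lookups # => 1
```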

Caching

But in our case of "looping through multiple works" this would not help, because we are initializing 100 Work objects with 100 WorkPolicy objects, and each new object starts with an empty memoization state.

The best way to understand it is to run this code in your irb:

class Foo
  def x
    @x ||= calculate
  end

  private

  def calculate
    sleep 2 # slow query
    123
  end
end

class Bar
  def y
    @y ||= Foo.new
  end
end

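puts "10 times calling the same memoized object"
bar = Bar.new
10.times do
  puts bar.y.x # only the first call sleeps, ~2 seconds in total
end

puts "10 times initializing a new object"

10.times do
  bar = Bar.new
  puts bar.y.x # every new Bar/Foo sleeps again, ~20 seconds in total
end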
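If you'd rather not wait for `sleep`, here is a variant of the same experiment that counts calls instead. `CountingFoo` and `CountingBar` are renamed, hypothetical copies of `Foo` and `Bar`. A class-level counter records how often `calculate` actually runs.

```ruby
class CountingFoo
  class << self
    attr_accessor :calculations
  end
  self.calculations = 0

  def x
    @x ||= calculate
  end

  private

  # Counts work instead of sleeping.
  def calculate
    self.class.calculations += 1
    123
  end
end

class CountingBar
  def y
    @y ||= CountingFoo.new
  end
end

bar = CountingBar.new
10.times { bar.y.x }
puts CountingFoo.calculations # => 1

CountingFoo.calculations = 0
10.times { CountingBar.new.y.x }
puts CountingFoo.calculations # => 10
```

Memoization lives inside one object, so each new object pays for the calculation again.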

One way to deal with this is to use the Rails cache:

class WorkPolicy
  # ...

  def work_has_no_comments
    Rails.cache.fetch [WorkPolicy, 'work_has_no_comments', @work] do
      work.comments.count < 1
    end
  end
end

class Comment < ActiveRecord::Base
  belongs_to :work, touch: true    # `touch: true` updates Work#updated_at each time a comment is added/changed, so the cached value for that work is invalidated
end
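`Rails.cache` needs a Rails app, so here is a plain-Ruby sketch of the same idea. `TinyCache` is a hypothetical stand-in for `Rails.cache`, and the Struct `Work` has only `id`, `updated_at` and `comments`. Bumping `updated_at` by hand imitates what `touch: true` does. Because `updated_at` is part of the key, a touched work simply gets a new cache entry.

```ruby
# Hypothetical stand-in for Rails.cache: one store shared by all instances.
class TinyCache
  def initialize
    @store = {}
  end

  # Return the stored value, or run the block and store its result.
  def fetch(key)
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end

CACHE = TinyCache.new

Work = Struct.new(:id, :updated_at, :comments)

class WorkPolicy
  class << self
    attr_accessor :count_queries
  end
  self.count_queries = 0

  def initialize(work)
    @work = work
  end

  def work_has_no_comments
    # updated_at is part of the key, so "touching" the work changes the key
    # and the stale entry is never read again.
    key = [self.class.name, "work_has_no_comments", @work.id, @work.updated_at]
    CACHE.fetch(key) do
      self.class.count_queries += 1 # stands in for the COUNT query
      @work.comments.size < 1
    end
  end
end

works = (1..3).map { |id| Work.new(id, 1, []) }
2.times { works.each { |w| WorkPolicy.new(w).work_has_no_comments } }
puts WorkPolicy.count_queries # => 3 (the second pass is served from the cache)

# Adding a comment "touches" work 3, as belongs_to touch: true would.
works[2].comments << "first!"
works[2].updated_at = 2
works.each { |w| WorkPolicy.new(w).work_has_no_comments }
puts WorkPolicy.count_queries # => 4
```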

Now, this is just a stupid example. I know this should probably be cached by introducing a Work#comments_count method and caching the count of comments there. I just want to demonstrate the options.

With caching like this in place, the first time we run WorksController#index we would get multiple SQL calls:

SELECT "works".* FROM "works"
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 1]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 2]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 3]]
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 4]]
# ...

...but the second, third, ... calls would look like:

SELECT "works".* FROM "works"
# no count call

And if you add a new comment to the Work with id 3:

SELECT "works".* FROM "works"
SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 3]]

Proper SQL

Now, we are still not satisfied. We want the first run to be fast! The problem is the way we are calling our associations (comments). We are lazy loading them:

Work.limit(3).each { |w| w.comments.to_a }   # `.to_a` forces each association to load

# => SELECT  "works".* FROM "works" ORDER BY "works"."id" DESC LIMIT 3
# => SELECT "comments".* FROM "comments" WHERE "comments"."work_id" = $1  ORDER BY comments.created_at ASC  [["work_id", 97]]
# => SELECT "comments".* FROM "comments" WHERE "comments"."work_id" = $1  ORDER BY comments.created_at ASC  [["work_id", 98]]
# => SELECT "comments".* FROM "comments" WHERE "comments"."work_id" = $1  ORDER BY comments.created_at ASC  [["work_id", 99]]

But if we eager load them:

  Work.limit(3).includes(:comments).map(&:comments)

  SELECT  "works".* FROM "works" WHERE "works"."deleted_at" IS NULL LIMIT 3
  SELECT "comments".* FROM "comments" WHERE "comments"."status" = 'approved' AND "comments"."work_id" IN (97, 98, 99)  ORDER BY comments.created_at ASC
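Roughly speaking, eager loading trades N small queries for one `IN (...)` query and then assigns the rows to their works by `work_id`. Here is a plain-Ruby sketch of that difference. `CommentsTable` and its rows are hypothetical, and `queries` only counts simulated SELECTs.

```ruby
# Hypothetical in-memory "comments table": [work_id, body] rows.
COMMENTS = [[97, "a"], [99, "b"], [99, "c"]].freeze

class CommentsTable
  attr_reader :queries

  def initialize(rows)
    @rows = rows
    @queries = 0
  end

  # Lazy loading: SELECT * FROM comments WHERE work_id = ?
  def where_work_id(id)
    @queries += 1
    @rows.select { |work_id, _| work_id == id }
  end

  # Eager loading: SELECT * FROM comments WHERE work_id IN (...)
  def where_work_id_in(ids)
    @queries += 1
    @rows.select { |work_id, _| ids.include?(work_id) }
  end
end

work_ids = [97, 98, 99]

lazy = CommentsTable.new(COMMENTS)
lazy_result = work_ids.to_h { |id| [id, lazy.where_work_id(id)] }
puts lazy.queries # => 3

eager = CommentsTable.new(COMMENTS)
grouped = eager.where_work_id_in(work_ids).group_by(&:first)
eager_result = work_ids.to_h { |id| [id, grouped.fetch(id, [])] }
puts eager.queries # => 1

puts lazy_result == eager_result # => true
```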

Read more about includes and joins in http://blog.scoutapp.com/articles/2017/01/24/activerecord-includes-vs-joins-vs-preload-vs-eager_load-when-and-where

So our code could be: 所以我们的代码可能是:

class WorksController < ApplicationController
  def index
    @works = Work.all.includes(:comments)
  end
end

class WorkPolicy
  # ...

  def work_has_no_comments
    work.comments.size < 1        # we changed `count` to `size`
  end
end

Q: Now wait a minute, aren't comments.count and comments.size the same?

Not really.

10.times do
  work.comments.size
end  
# SELECT "comments".* FROM "comments" WHERE "comments"."work_id" = $1    ORDER BY comments.created_at ASC  [["work_id", 1]]

...loads all the comments into (something like) an Array and calculates the size in Ruby, as with [].size

10.times do
  work.comments.count
end
# SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 1]]
# SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 1]]
# SELECT COUNT(*) FROM "comments" WHERE "comments"."work_id" = $1  [["work_id", 1]]
# ...

...executes SELECT COUNT(*), which is much faster than loading all comments to calculate the size. But when you need the value 10 times, you are explicitly making 10 SQL calls.

Now, I'm exaggerating with work.comments.size, because Rails is more clever in determining whether you just want the size. If the association is already loaded (for example via includes), size is calculated from the in-memory records without any SQL. If it is not loaded, size executes SELECT COUNT(*) instead of loading all comments into an array. It is comments.length that always loads the records and then does [].size.
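Here is a simplified plain-Ruby sketch of how `size` picks a strategy. `CommentsProxy` is hypothetical, and real Rails also accounts for unsaved records and counter caches. The sketch only models the loaded/not-loaded decision, next to `count` (always SQL) and `length` (always loads).

```ruby
# Hypothetical association proxy that mimics how size picks a strategy.
class CommentsProxy
  attr_reader :queries

  def initialize(rows)
    @rows = rows  # pretend these live in the database
    @loaded = nil # nil until the association is loaded
    @queries = []
  end

  def load!
    @queries << "SELECT comments.*"
    @loaded = @rows.dup
    self
  end

  # count always asks the database.
  def count
    @queries << "SELECT COUNT(*)"
    @rows.size
  end

  # Loaded (e.g. via includes): use the in-memory array, no SQL.
  # Not loaded: run a COUNT query instead of loading everything.
  def size
    @loaded ? @loaded.size : count
  end

  # length loads the records first (if needed), then counts in Ruby.
  def length
    load! unless @loaded
    @loaded.size
  end
end

not_loaded = CommentsProxy.new(%w[a b c])
3.times { not_loaded.size }
p not_loaded.queries # => ["SELECT COUNT(*)", "SELECT COUNT(*)", "SELECT COUNT(*)"]

preloaded = CommentsProxy.new(%w[a b c]).load!
3.times { preloaded.size }
p preloaded.queries # => ["SELECT comments.*"]
```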

It's similar to .pluck vs .map:

scope = Work.limit(10)
scope.pluck(:title)
# SELECT  "works"."title" FROM "works" LIMIT 10
# => ['foo', 'bar', ...]
scope.pluck(:title)
# SELECT  "works"."title" FROM "works" LIMIT 10
# => ['foo', 'bar', ...]

scope.map(&:title)
# SELECT  "works".* FROM "works" LIMIT 10
# => ['foo', 'bar', ...]
scope.map(&:title)
# => ['foo', 'bar', ...]
  • pluck is faster as it only selects title into an array, but it executes an SQL call every time
  • map will cause Rails to execute SELECT * in order to load the records and collect title into an array, but then you can work with the loaded objects (the second map doesn't hit the DB)
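The same trade-off can be sketched in plain Ruby. `FakeRelation` is a hypothetical stand-in for an ActiveRecord relation. `pluck` "selects" only one column and never caches, while `map` loads full records once and reuses them.

```ruby
# Hypothetical relation that caches loaded records, like an ActiveRecord relation.
class FakeRelation
  Row = Struct.new(:id, :title, :body)

  attr_reader :queries

  def initialize(rows)
    @rows = rows
    @records = nil
    @queries = []
  end

  # pluck: selects only the column, never caches, so it queries every time.
  def pluck(column)
    @queries << "SELECT #{column}"
    @rows.map { |row| row.public_send(column) }
  end

  # map: loads full records once (SELECT *), then reuses them.
  def map(&block)
    records.map(&block)
  end

  private

  def records
    @records ||= begin
      @queries << "SELECT *"
      @rows.dup
    end
  end
end

rows = [FakeRelation::Row.new(1, "foo", "..."), FakeRelation::Row.new(2, "bar", "...")]
scope = FakeRelation.new(rows)
2.times { scope.pluck(:title) }
2.times { scope.map(&:title) }
p scope.queries # => ["SELECT title", "SELECT title", "SELECT *"]
```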

Conclusion

There is no silver bullet. It always depends on what you want to achieve.

One may argue that the "optimize SQL" solution works best, but that's not always true. You need to implement a similar SQL optimization in every place where you call work.policy.able_to_delete?, which may be 10 or 100 places. Also, includes is not always a good idea in terms of performance.

Caching can get super complicated in terms of which event should expire which part of the cache. If you don't do it properly, your website may be displaying out-of-date information! In the case of policy objects, that is super dangerous.

Memoization is not always flexible enough, as you may need to redesign a large part of the code base to achieve it and introduce too many layers of unnecessary abstraction.

Not to mention that memoization is a big no-no in environments with truly parallel threads, like Rubinius, unless you synchronize your threads correctly. Don't worry: if you use MRI, you are fine with memoization in 95% of cases. Rails & Puma are thread safe, but that's a different kind of "thread safe". You really need to do something stupid for it to become an issue. This article is way too long to go into that topic. Google it!
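When memoized state can be shared between truly parallel threads, one common approach is to wrap the check and the assignment in a single `Mutex#synchronize`. That way, two threads can't both see "not computed yet" and compute twice. Here is a minimal sketch. `SafePolicy` is hypothetical, and the `sleep` only widens the race window that an unsynchronized version would have.

```ruby
class SafePolicy
  attr_reader :lookups

  def initialize
    @lock = Mutex.new
    @lookups = 0
  end

  # Check-and-set inside one critical section; synchronize releases the
  # lock even when we return early from inside the block.
  def work_has_no_comments
    @lock.synchronize do
      return @work_has_no_comments if defined?(@work_has_no_comments)

      @work_has_no_comments = expensive_lookup
    end
  end

  private

  def expensive_lookup
    @lookups += 1
    sleep 0.01 # stands in for a slow query
    true
  end
end

policy = SafePolicy.new
threads = Array.new(8) { Thread.new { policy.work_has_no_comments } }
threads.each(&:join)
puts policy.lookups # => 1
```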

It really depends on what your application (or that part of it) aims for. My only recommendation is: profile/benchmark your app! Don't optimize prematurely. Use tools like New Relic to discover which parts of your app are slow.
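For a quick before/after check of a single code path, Ruby's standard-library `Benchmark.realtime` is often enough. Here is a minimal sketch. `slow_lookup` is a hypothetical stand-in for a slow query, and real numbers should come from your own app and profiling tools.

```ruby
require "benchmark"

# Hypothetical stand-in for a slow query.
def slow_lookup
  sleep 0.01
  42
end

# Without memoization every iteration pays for the lookup.
uncached = Benchmark.realtime do
  20.times { slow_lookup }
end

# With a memoized value only the first iteration pays.
memoized = Benchmark.realtime do
  value = nil
  20.times { value ||= slow_lookup }
end

puts format("uncached: %.3fs, memoized: %.3fs", uncached, memoized)
```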

Optimize gradually. Don't build a slow application and then decide in one sprint "Right, let's optimize", because you may find out that you made poor design choices and 50% of your app needs rewriting to be faster.

Other solutions not mentioned

Counter Cache
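In Rails, this means `belongs_to :work, counter_cache: true` on Comment plus an integer `comments_count` column on works. Rails keeps that column updated, and `comments.size` can read it instead of running a COUNT. The plain-Ruby sketch below only imitates the bookkeeping. The classes are hypothetical and there is no database.

```ruby
# Hypothetical plain-Ruby models imitating a counter cache column.
class Work
  attr_reader :comments_count

  def initialize
    @comments_count = 0 # the cached counter column
  end

  def increment_comments_count!
    @comments_count += 1
  end

  def decrement_comments_count!
    @comments_count -= 1
  end

  # Reading the counter needs no COUNT query at all.
  def work_has_no_comments?
    comments_count < 1
  end
end

class Comment
  attr_reader :work

  # The counter is kept in sync when a comment is created or destroyed.
  def initialize(work)
    @work = work
    work.increment_comments_count!
  end

  def destroy
    work.decrement_comments_count!
  end
end

work = Work.new
puts work.work_has_no_comments? # => true
comment = Comment.new(work)
puts work.work_has_no_comments? # => false
comment.destroy
puts work.work_has_no_comments? # => true
```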

Database indexes

This may sound off topic, but a lot of performance issues happen because your app has no DB indexes (or has too many premature ones).
