Equivalent of find_each for foo_ids?

Given this model:

class User < ActiveRecord::Base
  has_many :things
end

Then we can do this:

@user = User.find(123)
@user.things.find_each{ |t| print t.name }
@user.thing_ids.each{ |id| print id }

There are a large number of @user.things, and I want to iterate through only their ids in batches, like with find_each. Is there a handy way to do this?

The goal is to:

  • not load the entire thing_ids array into memory at once
  • still load only arrays of thing_ids, and not instantiate a Thing for each id

Rails 5 introduced the in_batches method, which yields a relation and uses pluck(primary_key) internally. We can use the relation's where_values_hash method to retrieve the already-plucked ids:

@user.things.in_batches { |batch_rel| p batch_rel.where_values_hash['id'] }

Note that in_batches has order and limit restrictions similar to those of find_each.

This approach is a bit hacky, since it depends on the internal implementation of in_batches and will fail if in_batches stops plucking ids in the future. A non-hacky alternative would be batch_rel.pluck(:id), but that runs the same pluck query twice.
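One way to avoid both the dependency on internals and the duplicate pluck is to page by primary key directly ("id greater than the last id seen"). The following is only a minimal sketch of that idea, not something taken from Rails: each_id_batch and fetch_page are hypothetical names, and an in-memory array stands in for the things table so the snippet runs without a database.

```ruby
# Sketch of keyset batching over ids only.
# each_id_batch and fetch_page are hypothetical names, not Rails APIs.
def each_id_batch(fetch_page, batch_size: 1000)
  last_id = nil
  loop do
    ids = fetch_page.call(last_id, batch_size)
    break if ids.empty?

    yield ids
    break if ids.size < batch_size # a short page means we reached the end

    last_id = ids.last
  end
end

# In a Rails app the callable might look like this (an assumption, not run here):
#   fetch_page = lambda do |last_id, size|
#     scope = @user.things.reorder(:id).limit(size)
#     scope = scope.where("things.id > ?", last_id) if last_id
#     scope.pluck(:id)
#   end

# In-memory stand-in for the table, so the sketch is runnable as-is.
TABLE_IDS = [3, 5, 8, 13, 21, 34, 55].freeze
fetch_page = lambda do |last_id, size|
  TABLE_IDS.select { |id| last_id.nil? || id > last_id }.first(size)
end

batches = []
each_id_batch(fetch_page, batch_size: 3) { |ids| batches << ids }
batches.each { |ids| p ids }
```

Each page costs one query that returns plain ids, so no Thing objects are instantiated and only one batch of ids is held at a time.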

You can try something like the following. each_slice takes 4 elements at a time, and you can then loop over those 4:

@user.thing_ids.each_slice(4) do |batch|
  batch.each do |id|
    puts id
  end
end
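To make the slicing behaviour concrete, here is a small standalone sketch. A plain array stands in for @user.thing_ids (that substitution is an assumption for illustration). Note that each_slice only chunks an array that is already fully in memory.

```ruby
# Stand-in for @user.thing_ids: the whole array exists before slicing starts.
ids = (1..10).to_a

# each_slice(4) groups the elements into arrays of at most 4.
slices = ids.each_slice(4).to_a

slices.each { |batch| puts batch.inspect }
```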

UPDATE Final EDIT:

I have updated my answer after reviewing your updated question (not sure why you would downvote after I backed up my answer with source code to prove it... but I don't hold grudges :)

Here is my solution, tested and working, so you can accept this as the answer if it pleases you.

Below, I have extended ActiveRecord::Relation, overriding the find_in_batches method to accept one additional option, :relation. When it is set to true, the method yields the ActiveRecord relation to your block instead of an array, so you can then call pluck to get only the ids of the target query.

# Put this file in your lib directory:
# active_record_extension.rb
module ARAExtension
  extend ActiveSupport::Concern

  def find_in_batches(options = {})
    options.assert_valid_keys(:start, :batch_size, :relation)

    relation = self
    start = options[:start]
    batch_size = options[:batch_size] || 1000

    unless block_given?
      return to_enum(:find_in_batches, options) do
        total = start ? where(table[primary_key].gteq(start)).size : size
        (total - 1).div(batch_size) + 1
      end
    end

    if logger && (arel.orders.present? || arel.taken.present?)
      logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
    end

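    # Force primary-key order and the batch size, ignoring any scoped order or limit
    relation = relation.reorder(batch_order).limit(batch_size)
    records = start ? relation.where(table[primary_key].gteq(start)) : relation

    # With :relation => true, keep the unloaded relation instead of an array
    records = records.to_a unless options[:relation]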

    while records.any?
      records_size = records.size
      primary_key_offset = records.last.id
      raise "Primary key not included in the custom select clause" unless primary_key_offset

      yield records

      break if records_size < batch_size

      records = relation.where(table[primary_key].gt(primary_key_offset))
      records = records.to_a unless options[:relation]
    end
  end

end

ActiveRecord::Relation.send(:include, ARAExtension)

Here is the initializer:

# Put this file in the config/initializers directory:
# extensions.rb
require "active_record_extension"

Originally, this method forced a conversion of the relation to an array of ActiveRecord objects and returned that to you. Now, I optionally let you receive the query before the conversion to an array happens. Here is an example of how to use it:

@user.things.find_in_batches(:batch_size=>10, :relation=>true).each do |batch_query|
  # do any kind of further querying/filtering/mapping that you want

  # show that this is actually an activerecord relation, not an array of AR objects
  puts batch_query.to_sql
  # add more conditions to this query, this is just an example
  batch_query = batch_query.where(:color=>"blue")
  # pluck just the ids
  puts batch_query.pluck(:id)
end

Ultimately, if you don't like any of the answers given on an SO post, you can roll your own solution. Consider downvoting only when an answer is way off topic or not helpful in any way. We are all just trying to help. Downvoting an answer that has source code to back it up will only deter others from trying to help you.

Previous EDIT

In response to your comment (because my reply would not fit in a comment):

  1. Calling thing_ids internally uses pluck.
  2. pluck internally uses select_all...
  3. ...which instantiates an ActiveRecord::Result.

Previous 2nd EDIT:

This line of code within pluck returns an ActiveRecord::Result:

 ....
 result = klass.connection.select_all(relation.arel, nil, bound_attributes)
 ...

I just stepped through the source code for you. Using select_all will save you some memory, but in the end an ActiveRecord::Result is still created and mapped over, even when you use the pluck method.

Unfortunately, there is no one-liner or helper that will allow you to do this, so instead:

limit = 1000
offset = 0
loop do
  # order(:id) keeps pages stable; limit/offset without an order is not deterministic
  batch = @user.things.order(:id).limit(limit).offset(offset).pluck(:id)
  batch.each { |id| puts id }
  break if batch.size < limit
  offset += limit
end
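If you want the loop above to feel more like find_each, it could be wrapped in an Enumerator so that pages are fetched only as ids are consumed. This is a rough sketch under assumptions: paged_ids and fetch_page are hypothetical names, and an in-memory array replaces the real limit/offset query so that the snippet runs on its own.

```ruby
# paged_ids is a hypothetical helper; fetch_page stands in for a query such as
#   @user.things.order(:id).limit(limit).offset(offset).pluck(:id)
def paged_ids(fetch_page, limit: 1000)
  Enumerator.new do |yielder|
    offset = 0
    loop do
      batch = fetch_page.call(limit, offset)
      batch.each { |id| yielder << id }
      break if batch.size < limit # short page: nothing left to fetch

      offset += limit
    end
  end
end

# In-memory stand-in for the table; counts how many pages were requested.
ALL_IDS = (1..10).to_a.freeze
calls = 0
fetch_page = lambda do |limit, offset|
  calls += 1
  ALL_IDS[offset, limit] || []
end

ids = paged_ids(fetch_page, limit: 4)
first_five = ids.first(5) # stops fetching once five ids have been produced
p first_five
p calls
```

Because the enumerator is lazy, asking for the first five ids only requests the pages needed to produce them.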

I would use something like this:

@user.things.find_each(batch_size: 1000).map(&:id)

This will give you an array of the ids.
