
How can I run updates in batches in Rails 3/4?

I need to mass-update many thousands of records, and I would like to process the updates in batches. First, I tried:

Foo.where(bar: 'bar').find_in_batches.update_all(bar: 'baz')

...which I was hoping would generate SQL such as:

"UPDATE foo SET bar = 'baz' where bar='bar' AND id > (whatever id is passed in by find_in_batches)"

That doesn't work because find_in_batches yields arrays of records, while update_all must be called on an ActiveRecord relation.

This is what I tried next:

Foo.where(bar: 'bar').select('id').find_in_batches do |foos|
  ids = foos.map(&:id)
  Foo.where(id: ids).update_all(bar: 'baz')
end

That works, but it obviously runs a select followed by the update, rather than a single update based on my 'where' conditions. Is there any way to clean this up, so that the select and update don't have to be separate queries?

In Rails 5, there's a handy new method, ActiveRecord::Relation#in_batches, that solves this problem:

Foo.where(bar: 'bar').in_batches.update_all(bar: 'baz')

Check the documentation for details.

I'm surprised, too, that there isn't an easier way to do this... but I did come up with this approach:

batch_size = 1000
0.step(Foo.count, batch_size).each do |offset|
  Foo.where(bar: 'bar').order(:id)
                       .offset(offset)
                       .limit(batch_size)
                       .update_all(bar: 'baz')
end

Basically this will:

  1. Create an array of offsets between 0 and Foo.count, stepping by batch_size each time. For example, if Foo.count == 10500 you'd get: [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
  2. Loop through these numbers and use them as an OFFSET in the SQL query, being sure to order by id and limit to batch_size.
  3. Update at most batch_size records whose "index" is greater than offset.
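
Step 1 can be checked in plain Ruby without touching the database. This is only a sketch: total_rows is a stand-in for Foo.count, not a value from a real table.

```ruby
# Offsets generated for a table of 10,500 rows with a batch size of 1,000.
total_rows = 10_500 # stands in for Foo.count (assumed value for illustration)
batch_size = 1_000

offsets = 0.step(total_rows, batch_size).to_a
puts offsets.inspect
```

The resulting array is the same list of offsets shown in step 1.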

This is basically the manual way to produce the SQL you said you were hoping for. Too bad a standard library method doesn't already do it this way... though I'm sure you could write one of your own.
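
One thing worth checking before relying on this: the update changes the very column the where clause filters on, so updated rows drop out of the filtered set between iterations. The toy model below uses plain Ruby hashes in place of rows (it is not ActiveRecord, and it assumes the database really applies OFFSET to the UPDATE). It compares the offset loop with a loop that just keeps taking the first batch that still matches.

```ruby
# Toy in-memory model, NOT ActiveRecord: each hash stands in for one row.
rows = (1..5_000).map { |id| { id: id, bar: 'bar' } }
batch_size = 1_000

# Offset-based loop: rows leave the filtered set as soon as they are updated,
# so every later offset skips over rows that were never touched.
offset_rows = rows.map(&:dup)
0.step(offset_rows.size, batch_size) do |offset|
  matching = offset_rows.select { |r| r[:bar] == 'bar' }.sort_by { |r| r[:id] }
  (matching[offset, batch_size] || []).each { |r| r[:bar] = 'baz' }
end
offset_left = offset_rows.count { |r| r[:bar] == 'bar' }

# Limit-only loop: always take the first batch that still matches.
limit_rows = rows.map(&:dup)
loop do
  batch = limit_rows.select { |r| r[:bar] == 'bar' }.first(batch_size)
  break if batch.empty?
  batch.each { |r| r[:bar] = 'baz' }
end
limit_left = limit_rows.count { |r| r[:bar] == 'bar' }

puts "offset loop left #{offset_left} rows unchanged"
puts "limit-only loop left #{limit_left} rows unchanged"
```

In this model the offset loop leaves some rows unchanged, while the limit-only loop reaches all of them. Whether a real database behaves the same way depends on the Rails version and the adapter, so treat this as a prompt to test rather than a verdict.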

This is 2 years late, but the answers here a) are very slow for large data sets and b) ignore the built-in Rails capabilities ( http://api.rubyonrails.org/classes/ActiveRecord/Batches.html ).

As the offset value increases, depending on your DB server, it will do a sequential scan until it reaches your block, and then fetch the data for processing. Once your offset gets into the millions, this will be extremely slow.
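
To get a feel for the difference, here is a toy cost model in plain Ruby. It is not a benchmark. It assumes the server has to read and discard every skipped row for OFFSET, and that an index on id lets a "WHERE id > last_id" query jump straight to the start of each batch, which is how find_each and find_in_batches page through a table. Both method names are made up for this sketch.

```ruby
# Toy cost model, NOT a benchmark: counts rows a server would walk through
# if OFFSET means "read and discard that many rows first".
def rows_walked_with_offset(total_rows, batch_size)
  0.step(total_rows - 1, batch_size).sum do |offset|
    offset + [batch_size, total_rows - offset].min
  end
end

# Keyset style ("WHERE id > last_id ORDER BY id LIMIT n"), assuming an index
# on id means only the rows in each batch are read.
def rows_walked_with_keyset(total_rows, _batch_size)
  total_rows
end

total = 1_000_000
batch = 1_000
puts "offset: #{rows_walked_with_offset(total, batch)} rows walked"
puts "keyset: #{rows_walked_with_keyset(total, batch)} rows walked"
```

Under these assumptions the offset version grows roughly with the square of the table size, while the keyset version grows linearly.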

Use the "find_each" iterator method:

Foo.where(a: b).find_each do |bar|
  bar.x = y
  bar.save
end

This has the added benefit of running the model callbacks with each save. If you don't care about the callbacks, then try:

Foo.where(a: b).find_in_batches do |array_of_foo|
  ids = array_of_foo.map(&:id)
  Foo.where(id: ids).update_all(x: y)
end

pdobb's answer is on the right track, but it didn't work for me in Rails 3.2.21 because of this issue with ActiveRecord not handling OFFSET in UPDATE calls:

https://github.com/rails/rails/issues/10849

I modified the code accordingly, and it worked fine for concurrently setting the default value on my Postgres table:

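batch_size = 1000
# Step up to the highest id rather than Foo.count, so that gaps in the id
# sequence (for example from deleted rows) don't leave rows unvisited.
0.step(Foo.maximum(:id).to_i, batch_size).each do |offset|
  # Select each batch by id range instead of OFFSET.
  Foo.where('id > ? AND id <= ?', offset, offset + batch_size).
      order(:id).
      update_all(foo: 'bar')
end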
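
The id windows such a loop visits can be listed in plain Ruby. This is a sketch only: id_windows is a hypothetical helper, not part of Rails, and the ids array is made-up data standing in for a table with gaps. It shows why the upper bound of the step matters: the windows reach every row only if the bound is at least the largest id.

```ruby
# Hypothetical helper (not part of Rails): lists the [lower, upper] id windows
# that a loop of the form "id > lower AND id <= upper" would visit.
def id_windows(upper_bound, batch_size)
  0.step(upper_bound, batch_size).map { |lower| [lower, lower + batch_size] }
end

ids = [1, 2, 3, 2_500, 2_501] # sparse ids, e.g. after deletions (made-up data)

by_count  = id_windows(ids.size, 1_000) # bound taken from the row count
by_max_id = id_windows(ids.max, 1_000)  # bound taken from the largest id

# Returns the ids that fall inside at least one window.
covered = lambda do |windows|
  ids.select { |id| windows.any? { |lo, hi| id > lo && id <= hi } }
end

puts "bounded by count:  #{covered.call(by_count).inspect}"
puts "bounded by max id: #{covered.call(by_max_id).inspect}"
```

With gapless ids starting at 1 the two bounds give the same coverage; with gaps, only the max-id bound reaches the high ids.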

I've written a small method to invoke update_all in batches:

https://gist.github.com/VarunNatraaj/420c638d544be59eef85

Hope it is useful! :)

Haven't had a chance to test this yet, but you might be able to use ARel to build the IN condition. find_in_batches yields plain arrays, which don't respond to to_arel, so the ids are passed in directly:

Foo.where(bar: 'bar').select('id').find_in_batches do |foos|
  Foo.where(Foo.arel_table[:id].in(foos.map(&:id))).update_all(bar: 'baz')
end

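Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.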

 