
How to write the query with batches in Rails?

I have a users table with 800,000 records. I added a new field called token to the users table. For all new users the token is populated automatically. To populate the token for existing users, I wrote a rake task with the following code. I don't think this will work for that many records in a production environment. How can I rewrite these queries with batches, or in some other way?

users = User.all
users.each do |user|
 user.token = SecureRandom.urlsafe_base64(nil, false)
 user.save
end

How you want to proceed depends on a few factors: are validations important to you when executing this? Is time an issue? If you don't care about validations, you can generate a raw SQL query for each user and then execute them all at once. Otherwise you have options like ActiveRecord transactions:

User.transaction do
  users = User.all
  users.each do |user|
    user.update(token: SecureRandom.urlsafe_base64(nil, false))
  end
end

This would be quicker than your rake task, but it would still take some time, depending on the number of users you want to update at once.
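For the raw SQL route mentioned above, here is a minimal sketch. Instead of one statement per user it builds a single UPDATE with a CASE expression for a batch of ids, which is a variation on the idea rather than exactly what was described. The helper name build_token_update_sql is hypothetical, and the table and column names (users, id, token) are taken from the question.

```ruby
require 'securerandom'

# Hypothetical helper (not part of Rails): builds one UPDATE statement that
# assigns a fresh token to every id in `ids`.
# Ids are coerced with Integer(), and urlsafe_base64 only emits
# A-Z, a-z, 0-9, "-" and "_", so no SQL escaping is needed here.
def build_token_update_sql(ids)
  ids = ids.map { |id| Integer(id) }
  return nil if ids.empty?

  cases = ids.map do |id|
    "WHEN #{id} THEN '#{SecureRandom.urlsafe_base64(nil, false)}'"
  end

  "UPDATE users SET token = CASE id #{cases.join(' ')} END " \
    "WHERE id IN (#{ids.join(', ')})"
end

puts build_token_update_sql([1, 2, 3])
```

In a Rails app this could probably be combined with batching, for example User.in_batches(of: 1000) { |batch| ActiveRecord::Base.connection.execute(build_token_update_sql(batch.pluck(:id))) }. Note that this skips validations and callbacks entirely.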

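batch_size  = 30_000
lower_limit = User.minimum(:id)
max_id      = User.maximum(:id)

# Walk the id range in fixed windows. An empty window (a gap in the ids)
# is simply skipped instead of ending the loop early.
while lower_limit && lower_limit <= max_id
  upper_limit = lower_limit + batch_size
  User.where('id >= ? AND id < ?', lower_limit, upper_limit).each do |user|
    user.update(token: SecureRandom.urlsafe_base64(nil, false))
  end
  lower_limit = upper_limit
end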

I think the best option for you is to use find_each or transactions.

Doc for find_each:

Looping through a collection of records from the database (using the ActiveRecord::Scoping::Named::ClassMethods#all method, for example) is very inefficient since it will try to instantiate all the objects at once.

In that case, batch processing methods allow you to work with the records in batches, thereby greatly reducing memory consumption.

The find_each method uses find_in_batches with a batch size of 1000 (or as specified by the :batch_size option).
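To make the batching idea concrete, here is a simplified plain-Ruby model of primary-key-based batching. It is only a sketch: ROWS, fetch_batch and each_in_batches are hypothetical names, the array stands in for the users table, and it assumes positive integer ids. The real find_in_batches has more options and stopping conditions than this.

```ruby
# In-memory stand-in for the users table: ids with gaps, as in a real table.
ROWS = [1, 2, 5, 8, 9, 14, 15].map { |id| { id: id } }.freeze

# Stand-in for a query like:
#   SELECT * FROM users WHERE id > last_id ORDER BY id LIMIT size
def fetch_batch(last_id, size)
  ROWS.select { |row| row[:id] > last_id }
      .sort_by { |row| row[:id] }
      .first(size)
end

# Yields every row while holding at most `batch_size` rows at a time.
def each_in_batches(batch_size)
  last_id = 0 # assumes ids are positive integers
  loop do
    batch = fetch_batch(last_id, batch_size)
    break if batch.empty?

    batch.each { |row| yield row }
    last_id = batch.last[:id] # the next batch starts after the last id seen
  end
end

seen = []
each_in_batches(3) { |row| seen << row[:id] }
puts seen.inspect # prints [1, 2, 5, 8, 9, 14, 15]
```

Because each batch is selected by "id greater than the last one seen", gaps in the id sequence do not matter and no batch is ever empty until the table is exhausted.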

Doc for transaction:

Transactions are protective blocks where SQL statements are only permanent if they can all succeed as one atomic action.

If you care about memory: User.all.each brings all 800k users into memory and instantiates 800k objects, which consumes a lot of memory. So my approach would be:

User.find_each(batch_size: 500) do |user|
  user.token = SecureRandom.urlsafe_base64(nil, false)
  user.save
end

In this case it only instantiates 500 users at a time, instead of the default batch_size of 1000.

If you still want to do it in a single database transaction, you can use @Francesco's answer.

A common mistake is instantiating model instances unnecessarily, and ActiveRecord instantiation is not cheap. You can try this naive code:

BATCH_SIZE = 1000
while true
  uids = User.where( token: nil ).limit( BATCH_SIZE ).pluck( :id )
  break if uids.empty?
  ApplicationRecord.transaction do
    uids.each do |uid|
      # def urlsafe_base64(n=nil, padding=false)
      User
        .where( id: uid )
        .update_all( token: SecureRandom.urlsafe_base64 )
    end
  end
end

The next option is to use your database's native analog of SecureRandom.urlsafe_base64 and run one query like:

UPDATE users SET token=db_specific_urlsafe_base64 WHERE token IS NULL

If you can't find an analog, you can prepopulate a temp table (for example with PostgreSQL's COPY command) from a precalculated CSV file (id, token=SecureRandom.urlsafe_base64) and run one query like:

UPDATE users SET token=temp_table.token
FROM temp_table
WHERE (users.token IS NULL) AND (users.id=temp_table.id)
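The precalculated CSV file could be produced with a short script along these lines. This is a sketch under assumptions: write_token_csv and the file name tokens.csv are made up for illustration, and the ids would have to come from your own database (for example from User.where(token: nil).pluck(:id)).

```ruby
require 'securerandom'
require 'stringio'

# Hypothetical helper: writes one "id,token" line per id to `io`.
# Plain string formatting is enough here because ids are integers and
# urlsafe_base64 output never contains commas, quotes or newlines.
def write_token_csv(ids, io)
  ids.each do |id|
    io.puts "#{Integer(id)},#{SecureRandom.urlsafe_base64(nil, false)}"
  end
  io
end

# Demo against an in-memory buffer. For a real file, something like:
#   File.open('tokens.csv', 'w') { |f| write_token_csv(ids, f) }
buffer = write_token_csv([1, 2, 3], StringIO.new)
puts buffer.string
```

With PostgreSQL, the file could then be loaded with something like COPY temp_table (id, token) FROM '/path/to/tokens.csv' WITH (FORMAT csv) before running the UPDATE above. The path is a placeholder.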

But in fact you don't need to fill in token for existing users, because of:

I am using "token" for token-based authentication in Rails – John

You only have to check whether the user's token is NULL (or expired) and, if so, redirect them to the login form. It's a common approach and it will save you time.
