Rails 3: What is the best way to update a column in a very large table

I want to update all of a column in a table with over 2.2 million rows where the attribute is set to null. There is a Users table and a Posts table. Even though there is a column for num_posts in User, only about 70,000 users have that number populated; otherwise I have to query the db like so:

@num_posts = @user.posts.count

I want to use a migration to update the attributes and I'm not sure whether or not it's the best way to do it. Here is my migration file:

class UpdateNilPostCountInUsers < ActiveRecord::Migration
  def up
    nil_count = User.select(:id).where("num_posts IS NULL")

    nil_count.each do |user|
      user.update_attribute :num_posts, user.posts.count
    end
  end

  def down
  end
end

In my console, I ran a query on the first 10 rows where num_posts was null, and then used puts for each user.posts.count. The total time was 85.3ms for 10 rows, for an average of 8.53ms per row. 8.53ms * 2.2 million rows is about 5.2 hours, and that's without updating any attributes. How do I know if my migration is running as expected? Is there a way to log percent complete to the console? I really don't want to wait 5+ hours to find out it didn't do anything. Much appreciated.

EDIT: Per Max's comment below, I abandoned the migration route and used find_each to solve the problem in batches. I solved the problem by writing the following code in the User model, which I successfully ran from the Rails console:

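def self.update_post_count
  # Load only the ids; find_each fetches them in batches (1,000 rows by default).
  nil_count = User.select(:id).where("num_posts IS NULL")
  nil_count.find_each do |user|
    # update_column writes directly, skipping validations and callbacks.
    # (The original `if user.posts` guard was always truthy, so it is dropped.)
    user.update_column(:num_posts, user.posts.count)
  end
end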

Thanks again for the help everyone!

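desc 'Update User post cache counter'
task :update_cache_counter => :environment do
  # One grouped query per batch replaces a separate COUNT query per user.
  # The GROUP BY is required for COUNT(posts.id) to be computed per user.
  users = User.joins('LEFT OUTER JOIN posts ON posts.user_id = users.id')
              .select('users.id, COUNT(posts.id) AS p_count')
              .where('users.num_posts IS NULL')
              .group('users.id')

  puts "Updating user post counts:"
  users.find_each do |user|
    print '.'
    # p_count may come back as a String depending on the adapter, hence to_i.
    user.update_attribute(:num_posts, user.p_count.to_i)
  end
end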

First off, don't use a migration for what is essentially a maintenance task. Migrations should mainly alter the schema of your database. That matters especially for a long-running job like this one, which may fail midway, leaving a botched migration and the database in an inconsistent state.

Then you need to address the fact that calling user.posts.count for every user causes an N+1 query problem: one query to load the users, plus one COUNT query per user. Instead, you should join the posts table and select a count.
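To make the N+1 cost concrete, here is a minimal sketch in plain Ruby that counts simulated round trips. FakeDb and its methods are hypothetical in-memory stand-ins, not ActiveRecord APIs; they only mirror the shape of the two query patterns.

```ruby
# In-memory stand-in for the database; FakeDb and its methods are hypothetical.
class FakeDb
  attr_reader :queries

  def initialize(posts)
    @posts = posts   # array of { :user_id => Integer } hashes
    @queries = 0     # number of simulated round trips
  end

  # Simulates: SELECT COUNT(*) FROM posts WHERE user_id = ?
  def count_posts_for(user_id)
    @queries += 1
    @posts.count { |post| post[:user_id] == user_id }
  end

  # Simulates: SELECT user_id, COUNT(*) FROM posts GROUP BY user_id
  def count_posts_grouped
    @queries += 1
    counts = Hash.new(0)
    @posts.each { |post| counts[post[:user_id]] += 1 }
    counts
  end
end

posts = [{ :user_id => 1 }, { :user_id => 1 }, { :user_id => 2 }]
user_ids = [1, 2, 3]

# N+1 style: one COUNT query per user.
n_plus_one_db = FakeDb.new(posts)
per_user = {}
user_ids.each { |id| per_user[id] = n_plus_one_db.count_posts_for(id) }

# Grouped style: a single query, then hash lookups (users without posts get 0).
grouped_db = FakeDb.new(posts)
counts = grouped_db.count_posts_grouped
grouped = {}
user_ids.each { |id| grouped[id] = counts[id] }
```

Both approaches produce the same counts, but the first issues one query per user while the second issues one in total. In ActiveRecord, the grouped form corresponds to something like Post.group(:user_id).count, which returns a hash of user_id to count.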

And without using batches, you are likely to exhaust the server's memory quickly, because every matching row is loaded at once.
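On the question of logging percent complete while working through batches, one possible approach is a small counter that prints a line every so many records. This is a sketch in plain Ruby; with_progress is a hypothetical helper (not part of Rails), and the demo iterates over a plain array rather than a real scope.

```ruby
require 'stringio'

# Hypothetical helper: counts processed records and prints percent complete.
# total - number of records expected; every - report interval; out - any IO.
def with_progress(total, every = 1000, out = $stdout)
  done = 0
  tick = lambda do
    done += 1
    if (done % every).zero? || done == total
      out.puts format('%d/%d (%.1f%%)', done, total, done * 100.0 / total)
    end
  end
  yield tick
  done
end

# Demo on a plain array, sliced into batches of 2, writing to a StringIO
# so the output can be inspected afterwards.
log = StringIO.new
ids = (1..5).to_a
processed = with_progress(ids.size, 2, log) do |tick|
  ids.each_slice(2) do |batch|
    batch.each { |_id| tick.call }
  end
end
```

In a Rails console the same helper could presumably wrap find_each: compute scope.count once up front, pass it as total, and call tick.call inside the find_each block after each update.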

You can use update_all with a subquery to do this.

sub_query = 'SELECT COUNT(*) FROM posts WHERE posts.user_id = users.id'
# The outer string must be double-quoted, otherwise #{sub_query} is not interpolated.
User.where('num_posts IS NULL').update_all("num_posts = (#{sub_query})")

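Because this runs as a single SQL statement inside the database, with no per-row round trips from Ruby, it should take seconds rather than hours. In that case you may not need to log progress at all.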
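If a single statement over the whole table turns out to hold locks for too long, one option is to apply the same update_all over id ranges instead. The sketch below shows only the range splitting, in plain Ruby; id_ranges is a hypothetical helper, and the chunk size is an assumption to tune.

```ruby
# Hypothetical helper: split the inclusive id span min_id..max_id into
# consecutive ranges of at most `size` ids each.
def id_ranges(min_id, max_id, size)
  ranges = []
  first = min_id
  while first <= max_id
    last = [first + size - 1, max_id].min
    ranges << (first..last)
    first = last + 1
  end
  ranges
end

ranges = id_ranges(1, 10, 4)
```

Each range could then be passed as an extra condition, for example User.where(:id => range).where('num_posts IS NULL').update_all("num_posts = (#{sub_query})"), so that every statement touches a bounded number of rows.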
