
How to efficiently get all the rows from a huge table in Rails?

I have two tables, users and notes, with more than 50k records in users and 90k records in notes.

class User < ActiveRecord::Base
  has_many :notes
end

Schema of the users table:

id  email  created_at

Schema of the notes table:

id  user_id  created_at category impact_score

I am trying to create a temporary table, temp_user_notes, that holds data from both tables and on which I can run ActiveRecord queries to fetch the data.

Schema of the temp_user_notes table:

id  user_id  notes_id  email user_created_at notes_created_at category impact_score

This is what I am doing:

def self.populate_temp_user_notes
  users = User.all
  users.each do |user|
    user.notes.each do |note|
      user_notes = TempUserNote.new
      user_notes.user_id = user.id
      user_notes.notes_id = note.id
      user_notes.auth_token_created_at = user.auth_token_created_at
      user_notes.notes_recorded_at = note.recorded_at
      user_notes.category = note.category
      user_notes.well_being_score = note.perception_score
      user_notes.save
    end
  end
end

Looping through all the users and their notes is a very slow, memory-hungry process. What other approach can I use?

EDIT:

My requirement is: I have a series of queries separated by AND and OR conditions that make use of the following tables: users, notes, transactions, subscription. Suppose my query to get the target users is (Query1 OR Query2) AND Query3 AND Query4; then the output of every query is the input of the next query.

eg:

total users in DB = 1000
1. user_list =  (Query1 or Query2) #=> 500 users
2. taking 500 users from user_list as input for next query
3. user_list = user_list AND Query3  #=> 300 users
4. taking 300 users from point 3 as input for query in point 4
5. user_list = user_list AND Query4 #=> 50 users

In the last user list I have my target users and their notes.
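
The narrowing steps above amount to set operations on user ids: OR is a union and each AND is an intersection. A minimal sketch in plain Ruby, assuming every query has already been reduced to a list of user ids (the method name target_user_ids and the sample ids are made up for illustration):

```ruby
require "set"

# (Query1 OR Query2) AND Query3 AND Query4, expressed on sets of user ids.
# Each argument is any enumerable of ids returned by one query.
def target_user_ids(query1_ids, query2_ids, query3_ids, query4_ids)
  user_list = Set.new(query1_ids) | Set.new(query2_ids) # step 1: OR is a union
  user_list &= Set.new(query3_ids)                      # step 3: AND is an intersection
  user_list &= Set.new(query4_ids)                      # step 5: AND again
  user_list.to_a.sort
end

# Hypothetical sample data standing in for real query results.
SAMPLE_TARGET_IDS = target_user_ids([1, 2, 3, 4], [4, 5, 6], [2, 3, 5, 6, 9], [3, 6, 7])
```

In Rails 5 and later the same shape can often stay in the database by chaining relations with or and merge, which avoids loading the intermediate id lists into Ruby at all.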

I would use a combination of find_each and includes to reduce both memory usage and the number of database queries:

def self.populate_temp_user_notes
  User.includes(:notes).find_each do |user|
    user.notes.each do |note|
      TempUserNote.create(
        user_id:               user.id,
        notes_id:              note.id,
        auth_token_created_at: user.auth_token_created_at,
        notes_recorded_at:     note.recorded_at,
        category:              note.category,
        well_being_score:      note.perception_score,
      )
    end
  end
end
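
The loop above still issues one INSERT per note. One possible refinement is to build plain attribute hashes and write them in slices. The sketch below uses Struct stand-ins for the models so that it runs without Rails; SketchUser, SketchNote, build_rows, row_batches and BATCH_SIZE are hypothetical names. In Rails 6 or later each slice could be handed to TempUserNote.insert_all, which skips validations and callbacks.

```ruby
# Stand-ins for the ActiveRecord models, so the sketch runs without Rails.
SketchUser = Struct.new(:id, :auth_token_created_at, :notes)
SketchNote = Struct.new(:id, :recorded_at, :category, :perception_score)

BATCH_SIZE = 2 # tiny for the demo; a value such as 1000 is more realistic

# Flatten users and their notes into one attribute hash per note.
def build_rows(users)
  users.flat_map do |user|
    user.notes.map do |note|
      {
        user_id:               user.id,
        notes_id:              note.id,
        auth_token_created_at: user.auth_token_created_at,
        notes_recorded_at:     note.recorded_at,
        category:              note.category,
        well_being_score:      note.perception_score
      }
    end
  end
end

# Split the rows into slices; each slice would become one multi-row INSERT,
# e.g. TempUserNote.insert_all(slice) in a Rails 6+ application.
def row_batches(users, batch_size: BATCH_SIZE)
  build_rows(users).each_slice(batch_size).to_a
end
```

Whether this beats the plain find_each loop depends on how much the per-row validations and callbacks matter for TempUserNote.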

Another very fast option might be to do this in plain SQL, like this:

INSERT INTO temp_user_notes
  (user_id, notes_id, auth_token_created_at, notes_recorded_at, category, well_being_score)
  SELECT users.id, notes.id, users.auth_token_created_at, notes.recorded_at, notes.category, notes.perception_score
    FROM users INNER JOIN notes ON users.id = notes.user_id;

With huge amounts of data, User.all is rarely a good idea. Iterating over it pulls all 50k rows at once and builds a brand new User object for every row. Yes, it's going to cause trouble, and Rails is aware of it.

You can use:

User.find_each do |user|
  # do your thing
end

find_each loads the records in batches of 1000 by default, so only one batch of User objects has to be held in memory at a time.

You can override the default by passing batch_size to find_each:

User.find_each(batch_size: 3000) do |user|
  # do your thing
end
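
To see why batching helps, here is a rough plain-Ruby imitation of the idea behind find_each: fetch rows ordered by id, a limited number at a time, and start each fetch after the last id seen. The names each_in_batches, FETCH_AFTER and ROWS are hypothetical and not Rails API; the real method runs SQL along the lines of WHERE id > ? ORDER BY id LIMIT ?.

```ruby
# Simulated table: an array of rows, deliberately not sorted by id.
ROWS = [5, 1, 4, 2, 3, 7, 6].map { |id| { id: id } }

# Stand-in for "SELECT * FROM users WHERE id > last_id ORDER BY id LIMIT n".
FETCH_AFTER = lambda do |last_id, limit|
  ROWS.select { |row| row[:id] > last_id }
      .sort_by { |row| row[:id] }
      .first(limit)
end

# Yield every row while holding at most batch_size rows at once.
# Assumes ids are positive integers, so 0 is a safe starting point.
def each_in_batches(batch_size:, fetch: FETCH_AFTER)
  last_id = 0
  loop do
    batch = fetch.call(last_id, batch_size)
    break if batch.empty?

    batch.each { |row| yield row }
    break if batch.size < batch_size # a short batch means no rows are left

    last_id = batch.last[:id]
  end
end
```

A call such as each_in_batches(batch_size: 3) { |row| puts row[:id] } visits the rows in id order using three small fetches instead of one large one.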

"I am trying to create a temporary table temp_user_notes which can have data of both the tables on which I can perform ActiveRecord queries to fetch the data."

The more standard way of doing this is simply to join the two tables. If you use the following clause:

User.joins(:notes)

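then this will give you a scope that can be used for further querying (for example, you can add where conditions on either table). Note that joins on its own selects only the users columns; to read notes columns from the result as well, add a select such as select("users.*, notes.category, notes.impact_score"). For example: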

User.joins(:notes).where("notes.impact_score > 10")

This returns one User record for every note with an impact_score above 10, so a user with several such notes appears several times (add distinct to get each user only once).
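
As a plain-Ruby illustration of what an inner join with that condition yields (the sample rows and the inner_join and high_impact_pairs helpers are invented for this sketch, not ActiveRecord API): every matching user and note pair is one result row, so a user with several qualifying notes shows up more than once.

```ruby
JOIN_USERS = [
  { id: 1, email: "a@example.com" },
  { id: 2, email: "b@example.com" },
  { id: 3, email: "c@example.com" } # has no notes, so the join drops this user
].freeze

JOIN_NOTES = [
  { id: 10, user_id: 1, impact_score: 12 },
  { id: 11, user_id: 1, impact_score: 30 },
  { id: 12, user_id: 2, impact_score: 4 }
].freeze

# One [user, note] pair per note whose user_id matches an existing user.
def inner_join(users, notes)
  users_by_id = users.map { |user| [user[:id], user] }.to_h
  notes.map do |note|
    user = users_by_id[note[:user_id]]
    [user, note] if user
  end.compact
end

# Keep only the pairs whose note scores strictly above min_score.
def high_impact_pairs(users, notes, min_score: 10)
  inner_join(users, notes).select { |_user, note| note[:impact_score] > min_score }
end
```

With the sample data, user 1 is returned twice (once per qualifying note) and users 2 and 3 are not returned at all.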

If you need to go through all the joined records, you should use the find_each method that returns the records in batches.

