
Running rake db:seed on AWS Elastic Beanstalk

I'm trying to deploy my first rails app using Elastic Beanstalk, and I've gotten to the point where I need to seed my database with approximately half a million records. My initial attempt was to create a .config file in my .ebextensions folder and then use git aws.push from the command line, but it kept giving me timeout errors.

So, I abandoned that and decided to just ssh directly into my EC2 instance and run it from there. However, that isn't working for me either. I cd'd into /var/app/current and ran rake db:seed RAILS_ENV=production. It seems to run for a minute or two, then prints 'Killed' and exits.

I also attempted to seed just one record, to see whether the size of the file had anything to do with it. If I do that, it throws an error telling me that my SQLite database is read-only. I'm pretty sure my db is set up to use MySQL in production: I changed database.yml to use the various ENV variables, and when I run eb status from the command line, it tells me MySQL is being used.

The weird thing is, I swear I did these exact same steps earlier yesterday, sshing in and seeding the database, and it worked. The only problem was I made a few changes, terminated the app and decided to start over, and now it doesn't work at all. Any ideas what I'm doing wrong? This is a Rails 4.1/Ruby 2.1 setup.

So, I got this working a little while ago, and figured I should answer this. My first problem was that I had screwed up database.yml: I left out the adapter: mysql2 line, so Rails was attempting to connect to an SQLite db, hence the read-only errors.
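For reference, the production block ended up along these lines (a sketch, not my exact file; the RDS_* variables are the ones Elastic Beanstalk exposes for an attached RDS instance, so adjust the names to whatever your environment actually sets):

```yaml
# config/database.yml — production section (sketch)
production:
  adapter: mysql2        # this is the line I had left out
  encoding: utf8
  database: <%= ENV['RDS_DB_NAME'] %>
  username: <%= ENV['RDS_USERNAME'] %>
  password: <%= ENV['RDS_PASSWORD'] %>
  host: <%= ENV['RDS_HOSTNAME'] %>
  port: <%= ENV['RDS_PORT'] %>
```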

Once I changed that, I could connect to my AWS RDS instance, and I could seed one record just fine. However, when I tried to seed all 500k records, the process was still getting killed. I'm on the AWS free tier, and a micro instance has very little memory, so I suspect the long-running seed process was being killed for consuming too many resources. To get around that, I created a rake task which splits my seeds.rb file into a bunch of smaller files, e.g. seeds-01.rb to seeds-1000.rb.

desc "Splits a file into smaller subfiles"
task :subfiles, [:filename, :num_files] => :environment do |task, args|
  lines = File.readlines(args[:filename])
  num_files = args[:num_files].to_i
  lines_per_file = lines.count / num_files
  extension = File.extname(args[:filename])
  # Strip the extension here, otherwise the subfiles come out as "seeds.rb-0.rb"
  basename = File.basename(args[:filename], extension)
  puts "#{lines.count} lines total, #{lines_per_file} per file"
  num_files.times do |num_file|
    subline_start = num_file * lines_per_file
    subline_end = (num_file + 1) * lines_per_file - 1
    # The last subfile absorbs any remainder left by the integer division
    subline_end = lines.count - 1 if num_file == num_files - 1
    File.open("#{basename}-#{num_file}#{extension}", "w") do |subfile|
      lines[subline_start..subline_end].each { |subline| subfile.puts subline }
    end
  end
end
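The slicing arithmetic above is easy to get off by one, so here is the same boundary calculation in isolation (plain Ruby, no Rails needed; chunk_bounds is a hypothetical helper name, not part of the task):

```ruby
# chunk_bounds: the same start/end index arithmetic the rake task uses.
# Divides `total` lines into `num_files` chunks; the last chunk absorbs
# any remainder left over by the integer division.
def chunk_bounds(total, num_files)
  per_file = total / num_files
  Array.new(num_files) do |i|
    first = i * per_file
    last  = i == num_files - 1 ? total - 1 : (i + 1) * per_file - 1
    [first, last]
  end
end
```

With 500,000 lines split 1000 ways, every chunk is exactly 500 lines; with a count that doesn't divide evenly, only the final chunk grows.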

Then, I generated a bash script that runs each file, like so:

rails runner seeds-01.rb
rails runner seeds-02.rb
...
rails runner seeds-1000.rb
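Typing a thousand of those lines by hand is no fun; a loop along these lines can generate the script (the file names and numbering here are illustrative, so adjust the padding to match whatever your split task actually produced):

```shell
# Emit one "rails runner" line per seed chunk into bash-script.txt
for i in $(seq 1 1000); do
  echo "rails runner seeds-$i.rb"
done > bash-script.txt
```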

It should be noted that I tried the following as well, and for whatever reason it was much slower than using rails runner:

sudo cp seeds-01.rb seeds.rb
rake db:seed
...
sudo cp seeds-1000.rb seeds.rb
rake db:seed

So don't do that. Then, after deploying my app with the Elastic Beanstalk command line tools, I sshed into my instance and ran my bash script:

cd /var/app/current/db
bash bash-script.txt

This kept each run small enough that the process no longer got killed.
