What is the proper way of handling data migrations in Ruby on Rails?

Question

I am having a problem where our migration files (which make data changes) depend on a database that hasn't been migrated yet.

We're using Rails 6, which supports multiple databases:

production:
  primary:
    database: my_db_name
  other:
    database: other_db_name

Here is an example of our models:

class Product < ApplicationRecord
  belongs_to :other
end

class Other < ApplicationRecord
  establish_connection :other
end

The problem exists when a migration for the primary database attempts to use data from the other database.

class AddInitialProducts < ActiveRecord::Migration[6.0]  
  def up
    obj = Other.find_by(name: "Special Object")
    Product.create(name: "Product", other: obj)
  end

  def down
    # Delete product created above.
  end
end

When I run rails db:migrate starting from an empty database, it attempts to migrate the primary database first, which fails because other_db_name doesn't exist yet; it hasn't been migrated.

I am aware of the rails db:schema:load command, but we need data to exist in our application. This is also a trivial example of the data that is necessary. It's possible there will be data migrations between every release of the application, so seeds don't seem like a great idea, as that file is meant for development/test.

What is the proper way to handle data migrations between releases?

Answer 1

This sounds like it's outside the scope of what migrations should actually be used for, which is transforming the database schema.

Instead, what you want is a rake task, possibly combined with a service object. You can generate rake tasks with:

rails g task products import

The generator will generate:

# lib/tasks/products.rake
namespace :products do
  desc "TODO"
  task import: :environment do
  end
end

This task can be invoked with bin/rake products:import at any point you choose.

Of course, there will be no output as the task does not do anything yet. The actual implementation goes in the block passed to task. Testing Rake tasks can be kind of challenging, so I find it best to do most of the actual implementation in service objects:

class ProductImporter
  def initialize(env, **kwargs)
    # set up
  end

  def call
    # do actual work 
    # ...
    teardown
  end

  def self.call(env, **kwargs)
    new(env, **kwargs).call
  end

  private 

  def teardown
    # clean up
  end
end

Service objects are easy to test since they are just plain old Ruby objects, and you don't have to worry about stuff like command line arguments.
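As a rough illustration of why such objects are easy to test, here is a minimal sketch that runs without Rails, Rake, or a database. The source: and repository: keyword arguments are hypothetical and not part of the code above; they stand in for Other.find_by and Product.create so the object can be exercised with plain lambdas.

```ruby
# Hypothetical variant of ProductImporter with injected collaborators.
# The source: and repository: keywords are assumptions made for this sketch.
class ProductImporter
  def initialize(env, source:, repository:)
    @env = env
    @source = source         # callable that looks up the related record
    @repository = repository # callable that persists the new product
  end

  def call
    other = @source.call("Special Object")
    return nil if other.nil? # nothing to import without the related record

    @repository.call({ name: "Product", other: other })
  end

  def self.call(env, **kwargs)
    new(env, **kwargs).call
  end
end

# Plain lambdas stand in for Other.find_by(name: ...) and Product.create(...).
created = []
source = ->(name) { { name: name } }
repository = ->(attrs) { created << attrs; attrs }

result = ProductImporter.call("test", source: source, repository: repository)
```

In a real application the collaborators would presumably default to the ActiveRecord models, with the lambdas only swapped in for tests.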

And then just call your services from the task:

# lib/tasks/products.rake
namespace :products do
  desc "Import a list of products from..."
  task import: :environment do
    ProductImporter.call(Rails.env)
  end
end

This way you only have to test that your rake task calls the service with the correct arguments.
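One way such a check might look is sketched below, assuming only the rake gem is available. The ProductImporter here is a hypothetical recording stand-in, not the real service, and the :environment prerequisite and Rails.env are left out (a literal "test" string is passed instead) so that the snippet runs outside a Rails app.

```ruby
require "rake"

# Hypothetical stand-in for the real service: it only records its arguments.
class ProductImporter
  class << self
    attr_reader :calls

    def call(*args, **kwargs)
      (@calls ||= []) << [args, kwargs]
    end
  end
end

# Same shape as the task above, minus the :environment prerequisite,
# which only exists inside a Rails application.
namespace :products do
  task :import do
    ProductImporter.call("test")
  end
end

# Invoke the task programmatically, then inspect what the service received.
Rake::Task["products:import"].invoke
```

After the invocation, ProductImporter.calls holds one entry per call, which a test can compare against the expected arguments.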
