How does my Rails app download a file from S3?

Question

I have an app where users will be uploading files directly to S3. This is working. Now, I need a background worker (presently delayed_job) to retrieve the file and stash it in 'tmp/files' for processing.

How can this be done?

Edit: The app is presently running in EC2.

Answer 1

Background workers will run independently from your web app.

Try Resque, a commonly used background worker solution for Rails. You start Resque separately from your web app, and it processes its jobs independently of the application.

Have this worker send basic HTTP requests to S3. Here's an API reference card to get you started. The idea is that you use a Ruby REST client to send these requests and parse the responses you get back from S3. Rest-client is a gem you can use to do this.
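As a rough sketch of the plain-HTTP route, the worker could stream the object into tmp/files with Ruby's standard Net::HTTP (swapped in here for rest-client so the example has no gem dependencies). It assumes the worker has a URL it is allowed to read, for example a public or pre-signed one; the helper names local_path_for and download_to_tmp are made up for illustration.

```ruby
require 'net/http'
require 'uri'
require 'fileutils'

# Derive a local path under dir from the last segment of the URL path.
# (Hypothetical helper; the query string is ignored.)
def local_path_for(url, dir = 'tmp/files')
  name = File.basename(URI.parse(url).path)
  raise ArgumentError, "no file name in #{url}" if name.empty? || name == '/'
  File.join(dir, name)
end

# Stream the response body to disk in chunks so a large file
# never has to sit fully in memory. Returns the path written.
def download_to_tmp(url, dir = 'tmp/files')
  uri  = URI.parse(url)
  path = local_path_for(url, dir)
  FileUtils.mkdir_p(dir)

  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |response|
      response.value # raises unless the status is 2xx
      File.open(path, 'wb') do |file|
        response.read_body { |chunk| file.write(chunk) }
      end
    end
  end
  path
end
```

The worker would call download_to_tmp(url) and then process the file at the returned path.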

Alternatively, you can have the worker use an S3 gem, which could be a bit easier. (The call below matches the API of the aws-s3 gem.)

With this approach, you'd have your worker run a script that does something like:

picture = S3Object.find 'headshot.jpg', 'photos'
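For comparison, here is a minimal sketch of the same idea that writes the object into tmp/files. Note that it targets a different library from the one in this answer: it assumes a client that behaves like Aws::S3::Client from the aws-sdk-s3 gem, whose get_object accepts a response_target path. The method name fetch_from_s3 is hypothetical, and the client is passed in so the method itself does not load the gem.

```ruby
require 'fileutils'

# client is assumed to respond to get_object like Aws::S3::Client
# from the aws-sdk-s3 gem (an assumption, not the gem used above).
def fetch_from_s3(client, bucket:, key:, dir: 'tmp/files')
  FileUtils.mkdir_p(dir)
  target = File.join(dir, File.basename(key))
  # response_target asks the SDK to write the body straight to this path
  client.get_object(bucket: bucket, key: key, response_target: target)
  target
end

# Usage with the real SDK (needs the aws-sdk-s3 gem and AWS credentials;
# the region is a placeholder):
#   require 'aws-sdk-s3'
#   client = Aws::S3::Client.new(region: 'us-east-1')
#   fetch_from_s3(client, bucket: 'photos', key: 'headshot.jpg')
```

Since the app runs in EC2, credentials would typically come from the environment or an instance role rather than being hard-coded.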

Answer 2

Use Resque.

Add these to your Gemfile:

gem 'resque'
gem 'resque-status'

Resque needs Redis to store information about jobs and workers. Either use Redis To Go or install Redis locally on your EC2 machine.

Once Resque is installed, edit config/initializers/resque.rb:

rails_root = ENV['RAILS_ROOT'] || File.dirname(__FILE__) + '/../..'
rails_env = ENV['RAILS_ENV'] || 'production'
resque_config = YAML.load_file(rails_root + '/config/resque.yml')
Resque.redis = resque_config[rails_env]

    # This is if you are using Redis to go:
    # ENV["REDISTOGO_URL"] ||= "redis://REDISTOGOSTUFFGOESHERE"
    # uri = URI.parse(ENV["REDISTOGO_URL"])
    # Resque.redis = Redis.new(:host => uri.host, :port => uri.port, :password => uri.password, :thread_safe => true)

Resque::Plugins::Status::Hash.expire_in = (24 * 60 * 60) # 24hrs in seconds

Dir["#{Rails.root}/app/workers/*.rb"].each { |file| require file }

Here we are using local Redis, so resque.yml looks like this:

development: localhost:6379
test: localhost:6379
fi: localhost:6379
production: localhost:6379
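To make the link between the two files concrete, the sketch below shows roughly what the initializer ends up reading: one "host:port" string per environment, which is then handed to Resque.redis=. The YAML is inlined so it runs on its own, and redis_address is a made-up helper used only to show the lookup.

```ruby
require 'yaml'

# Same shape as resque.yml, inlined so the sketch runs standalone.
RESQUE_YML = <<~YML
  development: localhost:6379
  test: localhost:6379
  production: localhost:6379
YML

# Return [host, port] for one environment.
# fetch raises KeyError if the environment has no entry.
def redis_address(yaml_text, env)
  host, port = YAML.load(yaml_text).fetch(env).split(':')
  [host, Integer(port)]
end

host, port = redis_address(RESQUE_YML, 'production')
puts "Resque.redis would point at #{host}:#{port}"
```

An environment that is missing from the file raises here, whereas the initializer above would silently pass nil to Resque.redis=.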

You will need something like God to start and manage the workers.

Install it, then add resque-production.god to the config/ folder of your app. You will then be able to start your workers with:

god -c config/resque-production.god

The config/resque-production.god file will contain something like:

rails_env   = ENV['RAILS_ENV']  || "production"
rails_root  = ENV['RAILS_ROOT'] || File.dirname(__FILE__) + '/..'
num_workers = 1

num_workers.times do |num|
  God.watch do |w|
    w.dir      = "#{rails_root}"
    w.name     = "resque-#{num}"
    w.group    = 'resque'
    w.interval = 30.seconds
    w.env      = {"QUEUE"=>"*", "RAILS_ENV"=>rails_env}
    w.start    = "rake -f #{rails_root}/Rakefile environment resque:work --trace"
    w.log      = "#{rails_root}/log/resque.log"
    w.err_log  = "#{rails_root}/log/resque_error.log"


    # restart if memory gets too high
    w.transition(:up, :restart) do |on|
      on.condition(:memory_usage) do |c|
        c.above = 350.megabytes
        c.times = 2
      end
    end

    # determine the state on startup
    w.transition(:init, { true => :up, false => :start }) do |on|
      on.condition(:process_running) do |c|
        c.running = true
      end
    end

    # determine when process has finished starting
    w.transition([:start, :restart], :up) do |on|
      on.condition(:process_running) do |c|
        c.running = true
        c.interval = 5.seconds
      end

      # failsafe
      on.condition(:tries) do |c|
        c.times = 5
        c.transition = :start
        c.interval = 5.seconds
      end
    end

    # start if process is not running
    w.transition(:up, :start) do |on|
      on.condition(:process_running) do |c|
        c.running = false
      end
    end
  end
end

Finally, the workers. They go in the app/workers/ folder (this one is app/workers/processor.rb):

class Processor
  include Resque::Plugins::Status
  @queue = :collect_queue

  def perform
    article_id = options["article_id"]
    article = Article.find(article_id)
    article.download_remote_file(article.file_url)
  end
end
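One detail that may look odd: the job is created with a symbol key (:article_id) but perform reads a string key ("article_id"). That is because Resque serializes job arguments as JSON on their way through Redis. The sketch below imitates that round trip with Ruby's standard json library; it does not call Resque itself.

```ruby
require 'json'

# What the model passes when it enqueues the job
enqueued = { :article_id => 42 }

# Simulate the trip through Redis: encode to JSON, then decode again
options = JSON.parse(JSON.generate(enqueued))

options['article_id'] # => 42
options[:article_id]  # => nil (the symbol key did not survive the round trip)
```

This is also why only simple values such as ids should be passed to a job, with the record loaded again inside perform.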

The worker is enqueued by an after_create callback in the Article model (app/models/article.rb):

require 'open-uri'

class Article < ActiveRecord::Base

  after_create :process

  def download_remote_file(url)
    # open-uri lets URI.open read a URL like a file
    # (on Ruby older than 2.5, call Kernel#open(url) instead)
    io = URI.open(url)

    # overrides Paperclip::Upfile#original_filename;
    # we are creating a singleton method on specific object ('io')
    def io.original_filename
      base_uri.path.split('/').last
    end

    io.original_filename.blank? ? nil : io
  end

  # enqueue a Processor job for this article
  def process
    Processor.create(:article_id => self.id)
  end

end
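Note that download_remote_file only returns an IO object; nothing above writes it to tmp/files, which is what the question asks for. A minimal sketch of that last step, using a hypothetical helper stash_io and a StringIO standing in for the downloaded file so it runs offline:

```ruby
require 'fileutils'
require 'stringio'
require 'tmpdir'

# Copy any readable IO (such as the object returned by download_remote_file)
# into dir under the given name, and return the path that was written.
def stash_io(io, filename, dir = 'tmp/files')
  FileUtils.mkdir_p(dir)
  # basename drops any directory part, so "../x" cannot escape dir
  path = File.join(dir, File.basename(filename))
  File.open(path, 'wb') { |out| IO.copy_stream(io, out) }
  path
end

# Demo in a scratch directory; a worker would rely on the 'tmp/files' default.
Dir.mktmpdir do |dir|
  path = stash_io(StringIO.new('example bytes'), 'headshot.jpg', dir)
  puts "wrote #{File.size(path)} bytes to #{path}"
end
```

In the worker, this could be called as stash_io(io, io.original_filename) on the result of download_remote_file, after checking that the result is not nil.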

solution1
0 2013-02-22 21:59:20

solution2
0 2013-02-22 22:11:01
