简体   繁体   中英

What's the fastest most efficient way to compare a dropbox filelist to the contents of a table in Ruby on Rails 3?

I've got this working, but it's very slow (to the point of timing out) for even a dozen or so files.

It grabs a dir listing from dropbox, and compares that to the contents of a table. I'd like to optimize this so that it would run as fast and efficiently as possible. I know querying every time is not optimal, but i think the major delay is during the Photo.create method since that is where it copies the file from the dropbox folder to Amazon S3 (via carrierwave gem ). I am looking into putting a time on the operation to see where the delay is coming from. for a folder with 10 files, it takes well over a minute to load the page. The strange thing is that it takes this long even if it skips those files because they already exist, which makes no sense to me.

Here's my controller code:

def sync
    photo_size = 1024
    @event = Event.find(params[:id])

    @client = Dropbox::API::Client.new(:token  => 'derp', :secret => 'herp')
    @dropbox_files = @client.ls "images/#{@event.keyword}/#{photo_size}/"

    @existing_photos = @event.photos.all
    @data = []

    # TODO: need to make it not add files multiple times


    @dropbox_files.each do |f|


      photo_exists = Photo.where(:dropbox_path => f.direct_url.url).count
      if photo_exists == 0
        @photo = Photo.create(:remote_filename_url => f.direct_url.url, 
                              :dropbox_path => f.direct_url.url,
                              :event_id => @event.id)
        @data << "Added: #{f.direct_url.url.split('/').last}"
      else
        @data << "Skipped: #{f.direct_url.url.split('/').last}"
      end
    end
  end

Ideally, i'd like to separate each Photo.create call into an async request, but that might be a whole 'notha thing. For now, i'd be happy if it was something that could handle adding 5 photos out of a list of 100 without timing out.

what is the best way to do this? I'm a PHP programmer that's new to RoR3. Please help. thanks!

one note: for now, this outputs to a screen, but eventually it will be a background action.

I have a few things you could try. I'm not familiar with the Dropbox API, but you should be able to figure this out:

Store the date of the last sync, and only retrieve files that are new or changed.

Extract your sync method into a new class- the controller probably isn't the best choice for this responsibility. Here's an example of how you could do that:

class EventSync
  attr_reader :event

  def initialize(event_or_id)
    @event = Event.find(event_or_id)
  end

  def sync
    dropbox_files.each do |f|
      process_file(f)
    end
  end

  private
    def photo_size
      1024
    end

    def process_file(file)
      event.photos.where(dropbox_path: file.direct_url.url).first_or_create do |file|
        file.remote_filename_url = file.direct_url.url
      end 
    end

    def client
      @client ||= Dropbox::API::Client.new(:token  => 'derp', :secret => 'herp')
    end

    def dropbox_files
      @dropbox_files ||= client.ls "images/#{event.keyword}/#{photo_size}/"
    end

end

This would be used like this: EventSync.new(params[:event_id]).sync .

By splitting this into many smaller methods, benchmarking will be easier (you can test each method individually), meaning you'll better be able to identify where the slowdown is.

This is the way I have it working now, before I try Zach's method.

In the Controller:

  def syncall
    #TODO: Refactor sync and syncall
    photo_size = 1024
    @event = Event.find(params[:id])

    new_image_dir = "images/#{@event.keyword}/#{photo_size}/"
    @client = Dropbox::API::Client.new(:token  => 'uuzpqar2m5839eo', :secret => 'nr9tmx0vc8qh892')
    @dropbox_files = @client.ls new_image_dir
    start = Time.now  

    existing_photos = @event.photos.all
    @data = []
    photo_list = []

    existing_photos.each do |ep|
      filename = URI.unescape(ep.dropbox_path.split('/').last) #dropbox_path is url encoded...
      photo_list << filename
    end
    @data << photo_list

    skipped_files = 0

    @dropbox_files.each do |f|
      sql_start = Time.now
      db_filename = f.path.split('/').last

      if photo_list.include? db_filename 
        skipped_files += 1
      else
        pc_start = Time.now
        if db_filename.split('.').last == 'jpg'
          db_path = f.direct_url.url
          @photo = Photo.create(:remote_filename_url => db_path, 
                                :dropbox_path => db_path,
                                :event_id => @event.id)
          @data << "#{db_filename} added in #{Time.now - pc_start} seconds"
        else
          @data << "#{db_filename} was skipped in #{Time.now - pc_start} seconds"
        end
      end
    end    
    @data << "Total Time: #{Time.now - start} (#{skipped_files} skipped.)"
  end

This way, if there are no files to add, there is only one query run. Another issue is that the direct_url.url call is pretty heavy, as it connects to dropbox each time it's called.

It went from about 2s to .01s per photo skipped and 5-7s to 2-4s per photo uploaded. I still like Zach's method better, so i am going to try that now.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM