I've got this working, but it's very slow (to the point of timing out) for even a dozen or so files. It grabs a directory listing from Dropbox and compares it to the contents of a table. I'd like to optimize this so that it runs as fast and efficiently as possible. I know querying on every iteration is not optimal, but I think the major delay is in the Photo.create call, since that is where the file is copied from the Dropbox folder to Amazon S3 (via the CarrierWave gem). I'm looking into timing the operations to see where the delay is coming from.

For a folder with 10 files, it takes well over a minute to load the page. The strange thing is that it takes this long even when it skips those files because they already exist, which makes no sense to me.
Here's my controller code:
```ruby
def sync
  photo_size = 1024
  @event = Event.find(params[:id])
  @client = Dropbox::API::Client.new(:token => 'derp', :secret => 'herp')
  @dropbox_files = @client.ls "images/#{@event.keyword}/#{photo_size}/"
  @existing_photos = @event.photos.all
  @data = []
  # TODO: need to make it not add files multiple times
  @dropbox_files.each do |f|
    photo_exists = Photo.where(:dropbox_path => f.direct_url.url).count
    if photo_exists == 0
      @photo = Photo.create(:remote_filename_url => f.direct_url.url,
                            :dropbox_path => f.direct_url.url,
                            :event_id => @event.id)
      @data << "Added: #{f.direct_url.url.split('/').last}"
    else
      @data << "Skipped: #{f.direct_url.url.split('/').last}"
    end
  end
end
```
Ideally, I'd like to separate each Photo.create call into an async request, but that might be a whole other undertaking. For now, I'd be happy if it could handle adding 5 photos out of a list of 100 without timing out.

What is the best way to do this? I'm a PHP programmer who is new to Rails 3. Please help. Thanks!

One note: for now, this outputs to the screen, but eventually it will be a background action.
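The async idea can be prototyped with plain Ruby threads, since the per-file work is I/O-bound. This is only a minimal sketch under stated assumptions: `do_upload` is a hypothetical stand-in for Photo.create, and in a real Rails app a proper job queue (e.g. delayed_job) is the safer route than raw threads:

```ruby
# Hypothetical stand-in for Photo.create; the sleep simulates
# the Dropbox -> S3 copy that dominates each request.
def do_upload(name)
  sleep 0.05
  "Added: #{name}"
end

files = %w[a.jpg b.jpg c.jpg]

# Start one thread per file, then wait for all of them;
# total wall time is roughly one upload instead of three.
threads = files.map { |name| Thread.new { do_upload(name) } }
results = threads.map(&:value)
puts results.length
```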
I have a few things you could try. I'm not familiar with the Dropbox API, but you should be able to figure this out:

First, store the date of the last sync, and only retrieve files that are new or changed.
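A rough sketch of that idea, assuming each Dropbox entry exposes a modification timestamp (the `modified` field and the stored `last_synced_at` value here are hypothetical; check what your Dropbox client actually returns):

```ruby
# Hypothetical data: a stored last-sync time plus file entries with
# modification timestamps, as plain hashes for illustration.
last_synced_at = Time.utc(2013, 5, 1)
files = [
  { path: 'a.jpg', modified: Time.utc(2013, 4, 30) },
  { path: 'b.jpg', modified: Time.utc(2013, 5, 2) },
]

# Only files changed since the last sync need any processing at all.
new_files = files.select { |f| f[:modified] > last_synced_at }
puts new_files.map { |f| f[:path] }.join(', ')
```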
Second, extract your sync method into a new class; the controller probably isn't the best choice for this responsibility. Here's an example of how you could do that:
```ruby
class EventSync
  attr_reader :event

  def initialize(event_or_id)
    # Accept either an Event record or an id.
    @event = event_or_id.is_a?(Event) ? event_or_id : Event.find(event_or_id)
  end

  def sync
    dropbox_files.each do |f|
      process_file(f)
    end
  end

  private

  def photo_size
    1024
  end

  def process_file(file)
    # Note: the block parameter must not shadow `file`, or
    # `direct_url` would be called on the new Photo record.
    event.photos.where(dropbox_path: file.direct_url.url).first_or_create do |photo|
      photo.remote_filename_url = file.direct_url.url
    end
  end

  def client
    @client ||= Dropbox::API::Client.new(:token => 'derp', :secret => 'herp')
  end

  def dropbox_files
    @dropbox_files ||= client.ls "images/#{event.keyword}/#{photo_size}/"
  end
end
```
This would be used like this: EventSync.new(params[:event_id]).sync.

By splitting this into many smaller methods, benchmarking becomes easier (you can test each method individually), so you'll be better able to identify where the slowdown is.
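For the benchmarking itself, Ruby's Benchmark module from the standard library can time each method in isolation. The two methods below are stand-ins for `dropbox_files` and `process_file`, with sleeps in place of the real calls:

```ruby
require 'benchmark'

# Stand-ins for the real EventSync methods, just to show the shape
# of the measurement; replace the sleeps with the actual calls.
def list_files;  sleep 0.02; %w[a.jpg b.jpg]; end
def process_one; sleep 0.01; end

# Benchmark.realtime returns the elapsed wall-clock seconds.
listing_time = Benchmark.realtime { list_files }
per_file     = Benchmark.realtime { process_one }
puts format('listing: %.3fs, per file: %.3fs', listing_time, per_file)
```

Timing each piece separately tells you whether the listing, the existence check, or the per-file upload is eating the minute.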
This is the way I have it working now, before I try Zach's method.

In the controller:
```ruby
def syncall
  # TODO: Refactor sync and syncall
  photo_size = 1024
  @event = Event.find(params[:id])
  new_image_dir = "images/#{@event.keyword}/#{photo_size}/"
  @client = Dropbox::API::Client.new(:token => 'uuzpqar2m5839eo', :secret => 'nr9tmx0vc8qh892')
  @dropbox_files = @client.ls new_image_dir
  start = Time.now
  existing_photos = @event.photos.all
  @data = []
  photo_list = []
  existing_photos.each do |ep|
    filename = URI.unescape(ep.dropbox_path.split('/').last) # dropbox_path is url encoded...
    photo_list << filename
  end
  @data << photo_list
  skipped_files = 0
  @dropbox_files.each do |f|
    db_filename = f.path.split('/').last
    if photo_list.include? db_filename
      skipped_files += 1
    else
      pc_start = Time.now
      if db_filename.split('.').last == 'jpg'
        db_path = f.direct_url.url
        @photo = Photo.create(:remote_filename_url => db_path,
                              :dropbox_path => db_path,
                              :event_id => @event.id)
        @data << "#{db_filename} added in #{Time.now - pc_start} seconds"
      else
        @data << "#{db_filename} was skipped in #{Time.now - pc_start} seconds"
      end
    end
  end
  @data << "Total Time: #{Time.now - start} (#{skipped_files} skipped.)"
end
```
This way, if there are no files to add, only one query is run. Another issue is that the direct_url.url call is pretty heavy, since it connects to Dropbox each time it's called.

It went from about 2s to 0.01s per photo skipped, and from 5-7s to 2-4s per photo uploaded. I still like Zach's method better, so I am going to try that now.
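Since each direct_url.url call is a round trip to Dropbox, it pays to call it once per file and reuse the result (the original sync called it up to four times per file, even for skips). A minimal sketch with a fake file object that counts the "network" calls; FakeFile and its counter are illustrative only, not part of any Dropbox gem:

```ruby
# Illustrative stand-in for a Dropbox file entry: each direct_url
# call simulates a round trip to Dropbox and is counted.
class FakeFile
  DirectUrl = Struct.new(:url)

  attr_reader :calls

  def initialize(url)
    @url = url
    @calls = 0
  end

  def direct_url
    @calls += 1 # one "network round trip" per call
    DirectUrl.new(@url)
  end
end

f = FakeFile.new('https://dl.example.com/photo.jpg')
url = f.direct_url.url         # fetch once...
filename = url.split('/').last # ...then reuse the local variable
puts "#{filename} fetched with #{f.calls} round trip(s)"
```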