

Reprocessing a large number of Paperclip styles

I have a decent number of Paperclip attachments (~270k images) to which I want to add another style. These are all stored on S3 via fog. From initial testing and some back-of-the-napkin calculations, it looks like this would take about 2 weeks, which really isn't feasible.
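A quick sanity check of that estimate, using only the numbers above. This is a sketch that assumes the work spreads evenly and scales linearly with the number of workers; in practice shared S3 bandwidth and CPU will make the speedup somewhat smaller.

```ruby
# Back-of-the-napkin throughput check based on the figures in the question.
# Assumption: linear scaling across workers (an upper bound on the real speedup).
TOTAL_ATTACHMENTS = 270_000
SECONDS_PER_WEEK  = 7 * 24 * 60 * 60 # 604_800

# Implied cost per attachment if a single process needs ~2 weeks.
seconds_per_attachment = (2 * SECONDS_PER_WEEK).fdiv(TOTAL_ATTACHMENTS)

# Estimated wall-clock days for a given number of parallel workers.
def estimated_days(total, seconds_each, workers)
  (total * seconds_each / workers) / 86_400.0
end

puts format("%.2f s per attachment", seconds_per_attachment)
[1, 10, 20].each do |w|
  puts format("%2d workers: %.1f days", w, estimated_days(TOTAL_ATTACHMENTS, seconds_per_attachment, w))
end
```

Roughly 4.5 s per attachment; ten workers would bring the 14 days down to about a day and a half, if nothing else becomes the bottleneck.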

rake paperclip:refresh:missing_styles

This feels like the obvious choice here, but it seems it will try to download all styles for each attachment to figure out whether a style is in fact missing. Since I know the new style is always missing, this seems redundant.

So far I am thinking of splitting the workload across 10 or so workers:

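NUM_WORKERS = 10
TOTAL       = 270_000
# Ceiling division, so an uneven total doesn't drop its last few IDs.
PER_WORKER  = (TOTAL + NUM_WORKERS - 1) / NUM_WORKERS

# Assumes IDs run roughly from 1 to TOTAL; gaps only make some ranges lighter.
ranges = []
start = 1

NUM_WORKERS.times do
  ranges << { start: start, batch: PER_WORKER } # covers IDs start...(start + PER_WORKER)
  start += PER_WORKER
end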

and then running one rake task per range, using the ActiveRecord Batch API.
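A stdlib-only sketch of that plan, so it runs without Rails. `FakeAttachment`, `worker_ranges`, `process_range` and `:new_style` are hypothetical names; the comment shows roughly what the inner loop would become in a Rails 5+ app, where `find_each` accepts `start:`, `finish:` and `batch_size:`. It computes ranges as explicit start/finish pairs and calls `reprocess!` only for the new style.

```ruby
# Hypothetical stand-in for a record with a Paperclip attachment, so this runs without Rails.
# It only records which styles were requested, mimicking reprocess!(*styles).
FakeAttachment = Struct.new(:id, :styles) do
  def reprocess!(*only)
    styles.concat(only)
  end
end

# Split min_id..max_id into `workers` contiguous, non-overlapping ranges.
# Ceiling division ensures the last IDs are not dropped when the total doesn't divide evenly.
def worker_ranges(min_id, max_id, workers)
  per = (max_id - min_id + workers) / workers
  (min_id..max_id).step(per).map { |first| first..[first + per - 1, max_id].min }
end

# One worker's job: walk its ID range in batches and regenerate only the new style.
# In a Rails 5+ app this would look roughly like (Photo and :image are hypothetical):
#   Photo.find_each(start: range.first, finish: range.last, batch_size: 1_000) do |photo|
#     photo.image.reprocess!(:new_style)
#   end
def process_range(records_by_id, range, batch_size: 1_000)
  range.each_slice(batch_size) do |ids|
    ids.each do |id|
      record = records_by_id[id]
      next unless record # deleted rows leave gaps in the ID sequence
      record.reprocess!(:new_style)
    end
  end
end

# Usage: six records with gaps in their IDs, split across three workers.
records = [1, 2, 5, 8, 9, 13].map { |id| [id, FakeAttachment.new(id, [])] }.to_h
ranges  = worker_ranges(1, 13, 3)
ranges.each { |range| process_range(records, range, batch_size: 2) }
p ranges # => [1..5, 6..10, 11..13]
```

Using explicit start/finish IDs (rather than a count) means each rake task can be restarted on its own range without affecting the others.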

So my questions are:

  1. Are there any ways to improve this, and any lessons from previous experience?
  2. Is it possible to skip the check and generate only the new style? Maybe refresh:thumbnails with STYLE is a better approach.

Thank you in advance.

EDIT:

I ended up writing a rake task that enqueues every attachment on a low-priority Sidekiq queue, plus a worker that dequeues and processes those jobs. So far this is working well: it is not very fast, but it's out of my way and runs in the background in a satisfactory manner. This approach can also be parallelized easily by adding more Rails instances, since each comes with its own set of Sidekiq workers.
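The producer/consumer shape of that setup can be sketched with the standard library alone. This is an assumption-laden stand-in, not the poster's code: a `Queue` plays the role of the Redis-backed Sidekiq queue, threads play the workers, and `ReprocessWorker` and `run_jobs` are hypothetical names. In Sidekiq itself the worker class would `include Sidekiq::Worker`, set `sidekiq_options queue: :low`, and be enqueued with `perform_async`.

```ruby
# Stdlib-only stand-in for the rake task + Sidekiq setup described above.
# Assumption: in the real app the producer would call something like
#   ReprocessWorker.perform_async(id)   # ReprocessWorker is hypothetical
# where ReprocessWorker includes Sidekiq::Worker with sidekiq_options queue: :low.

STOP = Object.new.freeze # sentinel telling a worker thread to exit

def run_jobs(ids, workers:, &job)
  queue = Queue.new
  ids.each { |id| queue << id }   # producer: one job per attachment id
  workers.times { queue << STOP } # one stop marker per worker, queued after all jobs

  done = Queue.new                # thread-safe record of finished ids
  threads = Array.new(workers) do
    Thread.new do
      while (id = queue.pop) != STOP
        job&.call(id)             # job body, e.g. reprocess! only the new style
        done << id
      end
    end
  end
  threads.each(&:join)
  Array.new(done.size) { done.pop }
end

# Usage: more worker threads is the in-process analogue of adding Rails/Sidekiq instances.
processed = run_jobs((1..1_000).to_a, workers: 4) do |id|
  # real job: look up the record by id and reprocess only the new style
end
puts processed.size
```

Enqueuing one small job per attachment keeps each unit of work retryable on its own, which is part of why this approach tolerates running slowly in the background.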

As per this guide, you can manually reprocess just one specific style:

my_model.an_attachment.reprocess!(:a_certain_style)

Your method of splitting the workload seems feasible.

I remember seeing ads for a service that processes images by pulling from and pushing straight to your S3 storage; that might be a better long-term solution than doing the heavy work yourself. I don't remember the name of the service, though.
