
RoR - Large file uploads in Rails

I have a Rails webapp that allows users to upload videos, which are stored in an NFS-mounted directory.

The current setup is fine for smaller files, but I need to support large file uploads as well (up to 4 GB). When I try to upload a 4 GB file, it eventually succeeds, but the experience is awful from a UX standpoint: the upload starts and progress is displayed based on XHR 'progress' events, but after reaching 100% there is still a long wait (5+ minutes) before the server responds to the request.

Initially I thought this had to do with copying the file from some temp directory over to the final NFS-mounted directory. But now I'm not so sure. After adding logging to my routes, I see that there is about a 3-minute wait between when the file upload progress reaches 100% and when the code in my controller action runs (that is, before I do anything to move the file to the NAS).

I'm wondering the following:

  • What is happening during this 3-minute wait after the upload completes and before my action is called?
  • Is there a way to account for whatever is going on during this period, so that the client gets a response immediately after the upload completes and doesn't time out?
  • How are large file uploads typically handled in Rails? This seems like it would be a common problem, but I can't seem to find anything on it.

(Note: I was originally using CarrierWave for uploads when I discovered this problem. I removed it and simply handled the file save using FileUtils directly in my model, just to make sure the wait times weren't the result of some CarrierWave magic happening behind the scenes, but got exactly the same result.)

ruby -v: 1.9.3p362

rails -v: 3.2.11

You might consider using MiniProfiler to get a better sense of where the time is being spent.

Large file uploads need to be handled in the background. Controllers and database access should simply record that the file was uploaded, then queue a background processing job to move it around and perform any other operations that may need to happen.
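
As a rough illustration of that pattern, here is a minimal sketch that uses only Ruby's built-in Queue and a worker thread as a stand-in for a real job library such as delayed_job. The names `JOBS`, `handle_upload`, and `start_worker`, and the hash used as a record, are hypothetical and are not taken from the linked article or from Rails.

```ruby
require 'fileutils'
require 'tmpdir'

# Stand-in for a real job queue; an in-process Queue is an assumption
# made to keep the sketch self-contained.
JOBS = Queue.new

# What a controller action would do: mark the upload as received,
# enqueue the slow work, and return right away.
def handle_upload(record, tmp_path, final_path)
  record[:status] = :uploaded
  JOBS << { record: record, from: tmp_path, to: final_path }
  record
end

# Worker loop: performs the slow move outside the request cycle.
# Pushing :stop onto the queue shuts the worker down.
def start_worker
  Thread.new do
    while (job = JOBS.pop) != :stop
      FileUtils.mv(job[:from], job[:to])
      job[:record][:status] = :stored
    end
  end
end
```

In a real application the queue would be persistent and the worker would run in a separate process, so that jobs survive a restart of the web server.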

http://mattgrande.com/2009/08/11/delayedjob/

That article has the gist of it; every implementation is going to be different.

I finally found the answer to my main question: What is happening during this 3-minute wait after the upload completes and before my action is called?

It's all explained very clearly in this post: The Rails Way - Uploading Files

"When a browser uploads a file, it encodes the contents in a format called 'multipart mime' (it's the same format that gets used when you send an email attachment). In order for your application to do something with that file, rails has to undo this encoding. To do this requires reading the huge request body, and matching each line against a few regular expressions. This can be incredibly slow and use a huge amount of CPU and memory."

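To make the quoted explanation concrete, here is a small sketch of what a multipart/form-data body looks like and of a deliberately naive way to pull the file back out of it. The boundary string, field name, and the `extract_parts` helper are made up for illustration; real parsers stream the body and handle many more cases. The point is only that the file bytes sit between boundary markers, so the whole body has to be scanned to find them.

```ruby
# Hypothetical boundary and payload, chosen only for this sketch.
BOUNDARY = 'sketchBoundary123'
FILE_BYTES = 'x' * 1024

# A multipart/form-data body with a single file part.
BODY = [
  "--#{BOUNDARY}",
  'Content-Disposition: form-data; name="video"; filename="clip.mp4"',
  'Content-Type: video/mp4',
  '',
  FILE_BYTES,
  "--#{BOUNDARY}--",
  ''
].join("\r\n")

# Naive decoding: split the whole body on the boundary marker, then
# separate each part's headers from its content at the first blank line.
def extract_parts(body, boundary)
  body.split("--#{boundary}").map do |part|
    # Skip the empty preamble and the closing "--" terminator.
    next if part.strip.empty? || part.start_with?('--')
    headers, content = part.sub(/\A\r\n/, '').split("\r\n\r\n", 2)
    { headers: headers, content: content.sub(/\r\n\z/, '') }
  end.compact
end
```

With a 4 GB body, even this simple scan means touching every byte of the request before the application code gets to run.
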
I tried the modporter Apache module mentioned in the post. The only problem is that the module and its corresponding plugin were written 4 years ago, and with their website no longer in operation, there's almost no documentation for either one.

With modporter, I wanted to specify my NFS-mounted directory as the PorterDir, in the hope that it would pass the file straight along to the NAS without any extra copying from a temp directory. However, I was not able to get that far, since the module seemed to be ignoring my specified PorterDir and was returning a completely different path to my actions. On top of that, the path it was returning didn't even exist, so I had no idea what was actually happening to my uploads.

My Workaround

I had to get the problem solved quickly, so for now I went with a somewhat hacky solution: writing matching JavaScript and Ruby code to handle chunked file uploads.

JS Example:

var MAX_CHUNK_SIZE = 20000000; // bytes per chunk (about 20 MB)

window.FileUploader = function (opts) {
    var file = opts.file;
    var url = opts.url;
    var current_byte = 0;
    var upload_id = null; // assigned by the server after the first chunk
    var paused = false;
    var success_callback = opts.success;
    var progress_callback = opts.progress;
    var percent_complete = 0;

    this.start = this.resume = function () {
        paused = false;
        upload();
    };

    this.pause = function () {
        paused = true;
    };

    function upload() {
        if (paused) return; // stop sending chunks until resume() is called

        var chunk = file.slice(current_byte, current_byte + MAX_CHUNK_SIZE);
        var fd = new FormData();
        fd.append('chunk', chunk);
        fd.append('filename', file.name);
        fd.append('total_size', file.size);
        fd.append('start_byte', current_byte);
        // Assumed field name: lets the server match later chunks to the partial file.
        if (upload_id) fd.append('upload_id', upload_id);

        $.ajax(url, {
            type: 'post',
            data: fd,
            processData: false, // send the FormData as-is instead of serializing it
            contentType: false, // let the browser set the multipart boundary header
            success: function (data) {
                current_byte = data.next_byte;
                upload_id = data.upload_id;

                if (data.path) {
                    // The server has the whole file and returned its temporary path.
                    success_callback(data.path);
                }
                else {
                    percent_complete = Math.min(100, Math.round(current_byte / file.size * 100));
                    progress_callback(percent_complete); // update some UI element to provide feedback to user
                    upload();
                }
            }
        });
    }
};

(forgive any syntax errors, just typing this off the top of my head)

Server-side, I created a new route to accept the file chunks. On the first chunk submission, I generate an upload_id based on the filename and size, and determine whether I already have a partial file from an interrupted upload. If so, I pass back the next starting byte I need along with the id. If not, I store the first chunk and pass back the id.

The process continues with additional chunk uploads, each appended to the partial file, until the file size matches the original file size. At this point, the server responds with the temporary path to the file.
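
The server-side steps above could be sketched roughly as follows. This is not the original code: the `ChunkStore` class, its method names, the SHA-1 based id, and the `.part` file naming are all assumptions made for illustration, and the returned hash is shaped to match the `next_byte`, `upload_id`, and `path` fields the JS example reads.

```ruby
require 'digest'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: keeps partial uploads in a directory and appends
# chunks to them in order.
class ChunkStore
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # Derive a stable id from filename and total size, so an interrupted
  # upload of the same file maps back to the same partial file.
  def upload_id(filename, total_size)
    Digest::SHA1.hexdigest("#{filename}:#{total_size}")
  end

  def partial_path(id)
    File.join(@dir, "#{id}.part")
  end

  # Appends chunk_data when start_byte matches what is already stored;
  # otherwise nothing is written and the client is told where to resume.
  def receive(filename, total_size, start_byte, chunk_data)
    id = upload_id(filename, total_size)
    path = partial_path(id)
    stored = File.exist?(path) ? File.size(path) : 0

    if start_byte == stored
      File.open(path, 'ab') { |f| f.write(chunk_data) }
      stored += chunk_data.bytesize
    end

    response = { upload_id: id, next_byte: stored }
    # Once every byte has arrived, include the temporary path.
    response[:path] = path if stored >= total_size
    response
  end
end
```

In a controller action, `chunk_data` would presumably come from reading the uploaded `chunk` parameter, and the hash would be rendered as JSON. The user-supplied filename is only hashed, never used as part of a path on disk.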

The JavaScript then removes the file input from the form, replaces it with a hidden input whose value is the file path returned from the server, and posts the form.

Finally, server-side, I handle moving/renaming the file and saving its final path to my model.

Phew.
