Parallel Chunked vs Sequential Chunked file upload

I am trying to find the optimal way to upload a large file to the server. On my local machine, I tried both methods:

  1. Sequential Chunked Upload, where the file is broken down into multiple chunks that are uploaded to the server one at a time. On the server side, each incoming chunk is appended directly to the final file.
  2. Parallel Chunked Upload, where the chunks are uploaded to the server in parallel. The server stores the chunks as small temporary files, and then merges them into one complete file once all the chunks have been uploaded.

But here the merge takes far longer than the upload itself, at least on my local machine. Hence, the parallel upload is always a lot slower than the sequential upload. How can I improve my upload time?

Here is the sequential upload code:

    [HttpPost]
    [RequestFormLimits(MultipartBodyLengthLimit = 209715200)]
    [RequestSizeLimit(209715200)]
    public async Task<IActionResult> UploadSeq()
    {
        string filename = "./Uploads/seqfile";
        using(Stream file = Request.Body)
        using (FileStream stream = new FileStream(filename, FileMode.Append))
        {
            await file.CopyToAsync(stream);
            await stream.FlushAsync();
        }
        return Ok();
    }

And here is the parallel code:

    [HttpPost("/ParUpload/{id}/{size}")]
    [RequestFormLimits(MultipartBodyLengthLimit = 209715200)]
    [RequestSizeLimit(209715200)]
    public async Task<IActionResult> UploadParallel(int id, int size)
    {
        string filename = "./Uploads/parfilecomplete.mkv";
        using (Stream file = Request.Body)
        using (FileStream stream = new FileStream(filename, FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write))
        {
            // Cast to long first: id * size in int arithmetic overflows for offsets past 2 GB.
            stream.Seek((long)id * size, SeekOrigin.Begin);
            await file.CopyToAsync(stream);
            await stream.FlushAsync();
        }
        return Ok();
    }

For large files, the parallel upload is preferred. But there are a few things to consider:

  1. When splitting a file into many small chunks, make sure you don't upload them all at once. Keep it to at most ~50 parallel uploads. The number may vary, of course, but if you try to upload 10000 chunks at once it certainly won't work well. Use some kind of queue of active uploads.
  2. Make sure that the server supports parallel uploads on its side.
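
The "queue of active uploads" idea can be sketched on the client side as follows. This is only a minimal illustration, assuming a C# client: the class name `BoundedUploader` and the `uploadChunk` delegate are hypothetical, and the delegate stands in for whatever actually sends chunk `i` to the server.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: not part of the question's code.
public static class BoundedUploader
{
    // Runs uploadChunk(i) for every i in [0, chunkCount), keeping at most
    // maxParallel calls in flight at any moment.
    public static async Task UploadAllAsync(
        int chunkCount, int maxParallel, Func<int, Task> uploadChunk)
    {
        using var gate = new SemaphoreSlim(maxParallel);
        var tasks = new List<Task>(chunkCount);

        for (int i = 0; i < chunkCount; i++)
        {
            await gate.WaitAsync();        // wait until an upload slot is free
            tasks.Add(RunOneAsync(i));     // start the next chunk
        }

        await Task.WhenAll(tasks);         // surfaces any upload failure

        async Task RunOneAsync(int index)
        {
            try { await uploadChunk(index); }
            finally { gate.Release(); }    // free the slot even on failure
        }
    }
}
```

In practice `uploadChunk` would be something like a lambda that reads chunk `i` from the file and posts it with `HttpClient.PostAsync` to the question's `/ParUpload/{id}/{size}` route. The right value for `maxParallel` is something to measure rather than assume.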

Parallel for the win, but you have to do extra work if you want to develop such a system.

  1. You should not create more than 6 connections per domain (FQDN), since there are connection limits in place on the client side. You can raise the limit by using the same IP but different domain names, like upload1.domain.com, upload2.domain.com, etc., but then you will probably saturate the bandwidth.

  2. You should not open and close the file every time a part is received; that is a very expensive operation. Instead, create a writer class that opens the file once. You queue each received part together with the location to write it to, the writer writes when it can, and it closes the file once all parts have been written. You trade disk latency for memory.

  3. If you can, enable TCP fast start. This will let you send data to your server faster.

  4. Choose your buffer size wisely. Try different sizes and look for one that requires minimal disk I/O and even less network I/O. The buffer size should be a multiple of the disk sector size.
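
The writer class from point 2 might look roughly like the sketch below. It is an assumption-laden illustration rather than a tested production design: the name `ChunkWriter`, the queue capacity of 64 and the 64 KiB buffer (a multiple of common 512-byte and 4096-byte sector sizes) are all made up for the example, and it relies on `System.Threading.Channels`, which ships with .NET Core 3.0 and later.

```csharp
using System;
using System.IO;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical writer: opens the target file once and writes queued parts.
public sealed class ChunkWriter : IAsyncDisposable
{
    // Bounded queue, so memory use stays capped if the disk falls behind.
    private readonly Channel<(long Offset, byte[] Data)> _queue =
        Channel.CreateBounded<(long Offset, byte[] Data)>(64);
    private readonly FileStream _file;
    private readonly Task _pump;

    public ChunkWriter(string path, int bufferSize = 64 * 1024)
    {
        _file = new FileStream(path, FileMode.Create, FileAccess.Write,
                               FileShare.None, bufferSize, useAsync: true);
        _pump = Task.Run(() => PumpAsync());
    }

    // Queue a part and its position; the caller must not reuse the array.
    public ValueTask EnqueueAsync(long offset, byte[] data) =>
        _queue.Writer.WriteAsync((offset, data));

    // Single consumer: the only code that touches the file stream.
    private async Task PumpAsync()
    {
        await foreach (var (offset, data) in _queue.Reader.ReadAllAsync())
        {
            _file.Seek(offset, SeekOrigin.Begin);
            await _file.WriteAsync(data, 0, data.Length);
        }
    }

    // Call once every part has been queued: drains the queue, closes the file.
    public async ValueTask DisposeAsync()
    {
        _queue.Writer.Complete();
        try { await _pump; }
        finally { await _file.DisposeAsync(); }
    }
}
```

A controller action would then copy the request body into a byte array and call `EnqueueAsync((long)id * size, bytes)` on a writer shared across requests for the same file, instead of opening a new `FileStream` per chunk. How that shared instance is looked up and when it is disposed is left open here.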

These are the things that come to mind from my previous experience. I do not remember the exact details, but these items should help.
