
Parallel Chunked vs Sequential Chunked file upload

I am trying to find the optimal way to upload a large file to the server. On my local machine, I tried both methods:

  1. Sequential chunked upload, where the file is broken into multiple chunks that are uploaded to the server one at a time. On the server side, each incoming chunk is appended directly to the final file.
  2. Parallel chunked upload, where the chunks are uploaded to the server in parallel. The server stores each chunk as a small temporary file, then merges them into one complete file once all the chunks have been uploaded.

But here the merging time far exceeds the upload time, at least on my local machine, so the parallel upload always ends up much slower than the sequential upload. How can I improve my upload time?

Here is the sequential upload code:

    [HttpPost]
    [RequestFormLimits(MultipartBodyLengthLimit = 209715200)]
    [RequestSizeLimit(209715200)]
    public async Task<IActionResult> UploadSeq()
    {
        string filename = "./Uploads/seqfile";
        using (Stream file = Request.Body)
        // Chunks arrive in order, so each one is appended directly to the final file.
        using (FileStream stream = new FileStream(filename, FileMode.Append))
        {
            await file.CopyToAsync(stream);
            await stream.FlushAsync();
        }
        return Ok();
    }

And here is the parallel code:

    [HttpPost("/ParUpload/{id}/{size}")]
    [RequestFormLimits(MultipartBodyLengthLimit = 209715200)]
    [RequestSizeLimit(209715200)]
    public async Task<IActionResult> UploadParallel(int id, int size)
    {
        string filename = "./Uploads/parfilecomplete.mkv";
        using (Stream file = Request.Body)
        using (FileStream stream = new FileStream(filename, FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write))
        {
            stream.Seek(id * size, SeekOrigin.Begin);
            await file.CopyToAsync(stream);
            await stream.FlushAsync();
        }
        return Ok();
    }

For large files, parallel upload is preferred, but there are a few things to consider:

  1. When splitting a file into many small chunks, make sure you don't upload them all at once. Cap concurrency at roughly 50 parallel uploads. The exact number varies, of course, but trying to upload 10,000 files at once certainly won't work well. Use some kind of queue of active uploads (see the sketch after this list).
  2. Make sure the server supports receiving the chunks in parallel on its side.
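
As a sketch of such a queue, the client below throttles chunk uploads with a SemaphoreSlim, posting to the /ParUpload/{id}/{size} endpoint from the question. The 4 MB chunk size and the cap of 50 in-flight requests are assumptions to tune, not recommendations.

    using System.IO;
    using System.Linq;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    class ChunkUploader
    {
        const int ChunkSize = 4 * 1024 * 1024; // illustrative 4 MB chunks
        static readonly HttpClient Client = new HttpClient();

        public static async Task UploadAsync(string path, string baseUrl)
        {
            long length = new FileInfo(path).Length;
            int chunkCount = (int)((length + ChunkSize - 1) / ChunkSize);

            // The semaphore acts as the "queue of active uploads":
            // at most 50 chunks are in flight at any moment.
            using var gate = new SemaphoreSlim(50);

            var tasks = Enumerable.Range(0, chunkCount).Select(async id =>
            {
                await gate.WaitAsync();
                try
                {
                    byte[] buffer = new byte[ChunkSize];
                    int read;
                    // Each task opens its own read handle so reads don't contend on one stream.
                    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
                    {
                        fs.Seek((long)id * ChunkSize, SeekOrigin.Begin);
                        read = await fs.ReadAsync(buffer, 0, ChunkSize);
                    }
                    using var content = new ByteArrayContent(buffer, 0, read);
                    var response = await Client.PostAsync($"{baseUrl}/ParUpload/{id}/{ChunkSize}", content);
                    response.EnsureSuccessStatusCode();
                }
                finally
                {
                    gate.Release();
                }
            });

            await Task.WhenAll(tasks);
        }
    }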

Parallel for the win, but you have to do extra work to develop such a system.

  1. You should not create more than about 6 connections per domain (FQDN), since clients enforce per-host connection limits. You can raise the effective limit by pointing several domain names (upload1.domain.com, upload2.domain.com, etc.) at the same IP, but then you will probably saturate the bandwidth anyway. A sketch of capping connections follows this list.

  2. You should not open and close the file every time a part is received; that is a very expensive operation. Instead, create a writer class that opens the file once: you queue each received part together with the location to write it, the writer writes when it can, and it closes the file after all parts are written. You trade disk latency for memory (see the second sketch after this list).

  3. If you can, enable TCP Fast Open. This lets the client start sending data to your server sooner.

  4. Choose your buffer size wisely. Experiment with different sizes to find one that minimizes disk I/O and, even more importantly, network I/O. The buffer size should be a multiple of the disk sector size.
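
On item 1, note that .NET's SocketsHttpHandler defaults MaxConnectionsPerServer to int.MaxValue, so if that is your client stack (an assumption here), cap it explicitly:

    using System.Net.Http;

    // Cap concurrent connections to one host at ~6, per the guideline in item 1.
    var handler = new SocketsHttpHandler { MaxConnectionsPerServer = 6 };
    var client = new HttpClient(handler);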
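
For item 2, here is a minimal sketch of such a writer class, assuming System.Threading.Channels is available. The unbounded part queue and the 1 MB buffer (a multiple of the typical 4 KB sector size, per item 4) are illustrative choices, not tuned values.

    using System;
    using System.IO;
    using System.Threading.Channels;
    using System.Threading.Tasks;

    public sealed class PartWriter : IAsyncDisposable
    {
        private readonly Channel<(long Offset, byte[] Data)> _parts =
            Channel.CreateUnbounded<(long Offset, byte[] Data)>();
        private readonly FileStream _file;
        private readonly Task _pump;

        public PartWriter(string path)
        {
            // Open the file once and keep the handle for the whole upload.
            _file = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write,
                                   FileShare.None, bufferSize: 1024 * 1024, useAsync: true);
            _pump = PumpAsync();
        }

        // Called from request handlers: enqueue the part and return immediately.
        public void Enqueue(long offset, byte[] data) =>
            _parts.Writer.TryWrite((offset, data));

        private async Task PumpAsync()
        {
            // Single consumer: writes each queued part at its offset as it arrives.
            await foreach (var (offset, data) in _parts.Reader.ReadAllAsync())
            {
                _file.Seek(offset, SeekOrigin.Begin);
                await _file.WriteAsync(data, 0, data.Length);
            }
        }

        // After the last part is enqueued: drain the queue, then close the file once.
        public async ValueTask DisposeAsync()
        {
            _parts.Writer.Complete();
            await _pump;
            await _file.FlushAsync();
            _file.Dispose();
        }
    }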

This is what comes to mind from my previous experience. I do not remember the exact details, but these items should help.
