Upload blocks in parallel in blob storage

Question

I am trying to convert this to parallel to improve the upload times of a file but with what I have tried it has not had great changes in time. I want to upload the blocks side-by-side and then confirm them. How could I manage to do it in parallel?

public static async Task UploadInBlocks
    (BlobContainerClient blobContainerClient, string localFilePath, int blockSize)
{
    string fileName = Path.GetFileName(localFilePath);
    BlockBlobClient blobClient = blobContainerClient.GetBlockBlobClient(fileName);

    FileStream fileStream = File.OpenRead(localFilePath);

    ArrayList blockIDArrayList = new ArrayList();

    byte[] buffer;

    var bytesLeft = (fileStream.Length - fileStream.Position);

    while (bytesLeft > 0)
    {
        if (bytesLeft >= blockSize)
        {
            buffer = new byte[blockSize];
            await fileStream.ReadAsync(buffer, 0, blockSize);
        }
        else
        {
            buffer = new byte[bytesLeft];
            await fileStream.ReadAsync(buffer, 0, Convert.ToInt32(bytesLeft));
            bytesLeft = (fileStream.Length - fileStream.Position);
        }

        using (var stream = new MemoryStream(buffer))
        {
            string blockID = Convert.ToBase64String
                (Encoding.UTF8.GetBytes(Guid.NewGuid().ToString()));
            
            blockIDArrayList.Add(blockID);


            await blobClient.StageBlockAsync(blockID, stream);
        }

        bytesLeft = (fileStream.Length - fileStream.Position);

    }

    string[] blockIDArray = (string[])blockIDArrayList.ToArray(typeof(string));

    await blobClient.CommitBlockListAsync(blockIDArray);
}

Answer 1

Of course. You shouldn't expect any improvements - quite the opposite. Blob storage doesn't have any simplistic throughput throttling that would benefit from uploading in multiple streams, and you're already doing extremely light-weight I/O which is going to be entirely I/O bound.

Good I/O code has absolutely no benefits from parallelization. No matter how many workers you put on the job, the pipe is only this thick and will not allow you to pass more data through.

All your code just reimplements the already very efficient mechanisms that the blob storage library has... and you do it considerably worse, with pointless allocation, wrong arguments and new opportunities for bugs. Don't do that. The library can deal with streams just fine.

Upload blocks in parallel in blob storage

Question

1 answers

solution1
2 2023-01-24 19:30:45

Upload blocks in parallel in blob storage

Question

1 answers

solution1 2 2023-01-24 19:30:45

solution1
2 2023-01-24 19:30:45