
ContentHash not calculated in Azure Blob Storage v12

Continuing the saga. Part I is here: ContentHash is null in Azure.Storage.Blobs v12.xx

After a lot of debugging, the root cause appears to be that the content hash is not calculated when the blob is uploaded, so BlobContentInfo and BlobProperties return a null ContentHash. My whole flow depends on receiving that hash from Azure.

What I've discovered is that the behavior depends on which HttpRequest member I use to get the stream that I upload to Azure:

HttpRequest.GetBufferlessInputStream(): the content hash is not calculated. Even in Azure Storage Explorer, the blob's ContentMD5 is empty.

HttpRequest.InputStream: everything works as expected.


Does anyone know why the behavior differs? And how can I get a content hash for streams obtained from the GetBufferlessInputStream method?

So the code flow looks like this:

var stream = HttpContext.Current.Request.GetBufferlessInputStream(disableMaxRequestLength: true);

var container = _blobServiceClient.GetBlobContainerClient(containerName);
var blob = container.GetBlockBlobClient(blobPath);

BlobHttpHeaders blobHttpHeaders = null;
if (!string.IsNullOrWhiteSpace(fileContentType))
{
     blobHttpHeaders = new BlobHttpHeaders()
     {
          ContentType = fileContentType,
     };
}

// retries are already configured on the Azure Storage client
await blob.UploadAsync(stream, httpHeaders: blobHttpHeaders);

return await blob.GetPropertiesAsync();

With the snippet above, ContentHash is NOT calculated, but if I get the stream from the HTTP request as follows, ContentHash is calculated:

var stream = HttpContext.Current.Request.InputStream;

P.S. I think it's obvious, but with the old SDK the content hash was calculated for streams obtained from the GetBufferlessInputStream method.

P.S. 2: there is also an open issue on GitHub: https://github.com/Azure/azure-sdk-for-net/issues/14037

P.S. 3: added a code snippet.

Ran into this today. From my digging, this appears to be a consequence of the type of Stream you upload, and not really a bug. To generate a hash for your blob (which, by the looks of it, is done on the client side before uploading), the SDK needs to read the stream, which means it then has to reset the stream's position back to 0 for the actual upload. That requires the stream to support the Seek operation. If your stream doesn't support Seek, it looks like the hash is not generated.

To get around the issue, make sure the stream you provide supports seeking (CanSeek is true). If it doesn't, use a different Stream, or copy your data into one that does (for example, a MemoryStream). The alternative would be for the Blob SDK to do this for you internally.
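The CanSeek check described above can be wrapped in a small helper that only buffers when it has to. This is a minimal sketch, assuming a .NET version with async Stream APIs; StreamHelpers and EnsureSeekableAsync are hypothetical names, not part of the Azure SDK:

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Azure SDK: returns a stream that
// supports Seek, copying into a MemoryStream only when necessary.
public static class StreamHelpers
{
    public static async Task<Stream> EnsureSeekableAsync(Stream source)
    {
        // Already seekable (e.g. HttpRequest.InputStream): use it as-is.
        if (source.CanSeek)
        {
            return source;
        }

        // Forward-only (e.g. GetBufferlessInputStream): copy everything into memory.
        // This holds the whole body in RAM, so be careful with large uploads.
        var buffer = new MemoryStream();
        await source.CopyToAsync(buffer);
        buffer.Position = 0; // rewind so the consumer reads from the first byte
        return buffer;
    }
}
```

In the question's flow this would be called as var uploadStream = await StreamHelpers.EnsureSeekableAsync(stream); before blob.UploadAsync(uploadStream, httpHeaders: blobHttpHeaders). Keep in mind that buffering a large request body in memory gives up the main benefit of GetBufferlessInputStream.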

A workaround: after getting the stream via the GetBufferlessInputStream() method, copy it into a MemoryStream and upload the MemoryStream instead. The content hash is then generated. Sample code:

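        var requestStream = System.Web.HttpContext.Current.Request.GetBufferlessInputStream(disableMaxRequestLength: true);

        // Copy the non-seekable request stream into a seekable MemoryStream.
        // Note: this buffers the whole request body in memory.
        var stream = new MemoryStream();
        await requestStream.CopyToAsync(stream);
        stream.Position = 0; // rewind so the upload starts from the first byte

        // other code (container/blob clients, headers) as in the question
        // retries are already configured on the Azure Storage client
        await blob.UploadAsync(stream, httpHeaders: blobHttpHeaders);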
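If buffering the whole body is not acceptable (the question passes disableMaxRequestLength: true, which suggests large files), another option is to compute the MD5 yourself while the SDK reads the stream and store it on the blob afterwards. Below is a minimal sketch of a read-through hashing stream; Md5ReadStream and GetHash are hypothetical names, nothing like this ships with the Blob SDK, and it assumes IncrementalHash is available (.NET Framework 4.7+ or .NET Core):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Azure SDK: a forward-only stream that
// feeds every byte read through it into an MD5 hash.
public sealed class Md5ReadStream : Stream
{
    private readonly Stream _inner;
    private readonly IncrementalHash _md5 = IncrementalHash.CreateHash(HashAlgorithmName.MD5);
    private byte[] _hash;

    public Md5ReadStream(Stream inner) => _inner = inner;

    public override bool CanRead => true;
    public override bool CanSeek => false; // forward-only, like the bufferless request stream
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = _inner.Read(buffer, offset, count);
        if (read > 0)
        {
            _md5.AppendData(buffer, offset, read); // hash exactly the bytes handed to the caller
        }
        return read;
    }

    public override async Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
    {
        int read = await _inner.ReadAsync(buffer, offset, count, cancellationToken).ConfigureAwait(false);
        if (read > 0)
        {
            _md5.AppendData(buffer, offset, read);
        }
        return read;
    }

    // Only meaningful once the stream has been read to the end (i.e. after the upload).
    public byte[] GetHash()
    {
        if (_hash == null)
        {
            _hash = _md5.GetHashAndReset();
        }
        return _hash;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _md5.Dispose(); // the inner stream belongs to the caller and is left open
        }
        base.Dispose(disposing);
    }
}
```

After await blob.UploadAsync(hashingStream, httpHeaders: blobHttpHeaders) completes, the hash could be written back with blob.SetHttpHeadersAsync(new BlobHttpHeaders { ContentType = fileContentType, ContentHash = hashingStream.GetHash() }). SetHttpHeadersAsync replaces all of the blob's HTTP headers, so ContentType has to be passed again. Unlike a hash computed by the service, this value is supplied by the client and is simply stored, not validated.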

I'm not sure why, but from my debugging I can see that when using GetBufferlessInputStream() with the latest SDK, the upload actually calls the Put Block API in the backend, and with this API the MD5 hash is not stored with the blob (refer here for details). Screenshot below:

[Screenshot: the upload issuing a Put Block request]

However, when using InputStream, it calls the Put Blob API. Screenshot below:

[Screenshot: the upload issuing a Put Blob request]
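Putting the two answers together, the difference comes down to whether the remaining length of the stream is known up front. The following is a simplified, hypothetical model of that decision, not the Azure SDK's actual source; ChooseOperation is an illustrative name, and singleUploadThreshold only roughly stands in for the SDK's StorageTransferOptions.InitialTransferSize:

```csharp
using System.IO;

// Simplified, hypothetical model of the upload-path decision described in this
// thread. It is NOT the Azure SDK's real implementation.
public static class UploadPathModel
{
    public static string ChooseOperation(Stream content, long singleUploadThreshold)
    {
        // Only a seekable stream can report how many bytes remain before reading it.
        long? remaining = content.CanSeek ? content.Length - content.Position : (long?)null;

        if (remaining.HasValue && remaining.Value <= singleUploadThreshold)
        {
            // Known, small length: a single request (the InputStream case above,
            // where ContentHash ends up populated).
            return "Put Blob";
        }

        // Unknown or large length: staged blocks plus a commit (the
        // GetBufferlessInputStream case above, where no MD5 is stored).
        return "Put Block + Put Block List";
    }
}
```

With GetBufferlessInputStream the stream reports CanSeek as false, so the model lands on the staged-block path, which matches the Put Block call observed above.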
