
C#: Download large sized json file from ADLS gen2 blob and Deserialize to object

I am using the following code to copy data from a blob into a stream:

    private static async Task<Stream> ParallelDownloadBlobAsync(Stream outPutStream, CloudBlockBlob blob)
    {

        await blob.FetchAttributesAsync();
        int bufferLength = 1 * 1024 * 1024;//1 MB chunk
        long blobRemainingLength = blob.Properties.Length;
        Queue<KeyValuePair<long, long>> queues = new Queue<KeyValuePair<long, long>>();
        long offset = 0;
        while (blobRemainingLength > 0)
        {
            long chunkLength = (long)Math.Min(bufferLength, blobRemainingLength);
            queues.Enqueue(new KeyValuePair<long, long>(offset, chunkLength));
            offset += chunkLength;
            blobRemainingLength -= chunkLength;
        }
        Parallel.ForEach(queues, new ParallelOptions()
        {
            //Gets or sets the maximum number of concurrent tasks
            MaxDegreeOfParallelism = 10
        }, (queue) =>
        {
            using (var ms = new MemoryStream())
            {
                blob.DownloadRangeToStreamAsync(ms, queue.Key, queue.Value);
                lock (outPutStream)
                {
                    outPutStream.Position = queue.Key;
                    var bytes = ms.ToArray();
                    outPutStream.Write(bytes, 0, bytes.Length);
                }
            }
        });

        return outPutStream;
    }

Then I use JsonSerializer to deserialize the data, but the body of the while loop never executes:

    await ParallelDownloadBlobAsync(stream, cloudBlockBlob);

    //resetting stream's position to 0
    //stream.Position = 0;

    var serializer = new JsonSerializer();

    using (var sr = new StreamReader(stream))
    {
        using (var jsonTextReader = new JsonTextReader(sr))
        {
            jsonTextReader.SupportMultipleContent = true;
            result = new List<T>();

            while (jsonTextReader.Read())
            {
                result.Add(serializer.Deserialize<T>(jsonTextReader));
            }
        }
    }

If I use DownloadToStreamAsync instead of the parallel download (DownloadRangeToStreamAsync), it works.

I can reproduce your issue. The solution is to change this line in the ParallelDownloadBlobAsync method: blob.DownloadRangeToStreamAsync(ms, queue.Key, queue.Value); to blob.DownloadRangeToStream(ms, queue.Key, queue.Value);
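If you would rather keep the download asynchronous, each range download has to be awaited before its bytes are copied, and Parallel.ForEach cannot do that because it does not await async lambdas. Below is a minimal sketch, assuming a throttled Task.WhenAll in place of Parallel.ForEach. ParallelDownloadAsync and FakeDownloadRangeAsync are hypothetical names, and the in-memory source array stands in for the blob so the sketch runs without Azure. It also rewinds the output stream, because after parallel writes Position sits at the end of whichever chunk was written last.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-memory "blob" so the sketch runs without Azure (2.5 MB -> 3 chunks).
byte[] source = Enumerable.Range(0, 2_500_000).Select(i => (byte)(i % 251)).ToArray();

// Hypothetical stand-in for an async range download such as
// blob.DownloadRangeToStreamAsync(target, offset, length).
async Task FakeDownloadRangeAsync(Stream target, long offset, long length)
{
    await Task.Delay(10); // simulate network latency
    await target.WriteAsync(source, (int)offset, (int)length);
}

var downloaded = new MemoryStream();
await ParallelDownloadAsync(downloaded, source.Length, FakeDownloadRangeAsync);
Console.WriteLine(downloaded.Length == source.Length && downloaded.ToArray().SequenceEqual(source)); // prints True

// Downloads totalLength bytes in chunkSize ranges, at most maxParallel at a time,
// and writes each range at its own offset in output.
async Task ParallelDownloadAsync(
    Stream output,
    long totalLength,
    Func<Stream, long, long, Task> downloadRange,
    int chunkSize = 1024 * 1024,
    int maxParallel = 10)
{
    using var throttle = new SemaphoreSlim(maxParallel);
    var tasks = new List<Task>();

    for (long offset = 0; offset < totalLength; offset += chunkSize)
    {
        long start = offset;
        long length = Math.Min(chunkSize, totalLength - start);

        tasks.Add(Task.Run(async () =>
        {
            await throttle.WaitAsync();
            try
            {
                using var ms = new MemoryStream();
                // Awaited: the range is fully downloaded before its bytes are copied.
                await downloadRange(ms, start, length);
                byte[] bytes = ms.ToArray();
                lock (output)
                {
                    output.Position = start;
                    output.Write(bytes, 0, bytes.Length);
                }
            }
            finally
            {
                throttle.Release();
            }
        }));
    }

    await Task.WhenAll(tasks);

    // Parallel writes leave Position at the end of the last chunk written,
    // so rewind before handing the stream to a StreamReader/JsonTextReader.
    output.Position = 0;
}
```

With the CloudBlockBlob SDK from the question, the call would presumably look like await blob.FetchAttributesAsync(); followed by await ParallelDownloadAsync(stream, blob.Properties.Length, (ms, off, len) => blob.DownloadRangeToStreamAsync(ms, off, len)); but this wiring has not been tested against a real storage account.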

I'm not sure whether you and I are hitting the same root cause. On my side, with a small file (around 100 KB), using blob.DownloadRangeToStreamAsync always leaves the output stream empty (length 0), so the body of the while loop never executes. For larger files, blob.DownloadRangeToStreamAsync seemed to work.
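A likely reason for the empty stream is that DownloadRangeToStreamAsync returns a Task that the Parallel.ForEach lambda never awaits: ms.ToArray() runs right after the download starts, so an empty array gets written, and the MemoryStream is disposed while the download may still be running. The synchronous DownloadRangeToStream blocks until the data is in ms, which would explain why that change helps. The sketch below reproduces the race without Azure; FakeDownloadRangeAsync is a hypothetical stand-in that waits briefly before writing, the way a network call would.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

byte[] chunk = new byte[100 * 1024]; // roughly the ~100 KB case described above

// Like the original lambda: start the download, then read the MemoryStream
// without awaiting the returned Task.
var notAwaited = new MemoryStream();
Task pending = FakeDownloadRangeAsync(notAwaited, chunk);
long lengthWithoutAwait = notAwaited.Length; // read before the write has happened
await pending; // awaited here only so the demo does not leave a task running

// Same call, but awaited before reading.
var awaited = new MemoryStream();
await FakeDownloadRangeAsync(awaited, chunk);
long lengthWithAwait = awaited.Length;

Console.WriteLine($"without await: {lengthWithoutAwait}, with await: {lengthWithAwait}");
// prints "without await: 0, with await: 102400"

// Hypothetical stand-in for DownloadRangeToStreamAsync: yields before writing.
static async Task FakeDownloadRangeAsync(Stream target, byte[] data)
{
    await Task.Delay(50); // simulate network latency
    await target.WriteAsync(data, 0, data.Length);
}
```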

Please leave a comment if this doesn't resolve your issue.

The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM