
C#: Download large JSON file from ADLS Gen2 blob and deserialize to object

I am using the following code to write data from a blob to a stream:

    private static async Task<Stream> ParallelDownloadBlobAsync(Stream outPutStream, CloudBlockBlob blob)
    {

        await blob.FetchAttributesAsync();
        int bufferLength = 1 * 1024 * 1024;//1 MB chunk
        long blobRemainingLength = blob.Properties.Length;
        Queue<KeyValuePair<long, long>> queues = new Queue<KeyValuePair<long, long>>();
        long offset = 0;
        while (blobRemainingLength > 0)
        {
            long chunkLength = (long)Math.Min(bufferLength, blobRemainingLength);
            queues.Enqueue(new KeyValuePair<long, long>(offset, chunkLength));
            offset += chunkLength;
            blobRemainingLength -= chunkLength;
        }
        Parallel.ForEach(queues, new ParallelOptions()
        {
            //Gets or sets the maximum number of concurrent tasks
            MaxDegreeOfParallelism = 10
        }, (queue) =>
        {
            using (var ms = new MemoryStream())
            {
                blob.DownloadRangeToStreamAsync(ms, queue.Key, queue.Value);
                lock (outPutStream)
                {
                    outPutStream.Position = queue.Key;
                    var bytes = ms.ToArray();
                    outPutStream.Write(bytes, 0, bytes.Length);
                }
            }
        });

        return outPutStream;
    }

Then I use JsonSerializer to deserialize the data, but the while block never executes:

    await ParallelDownloadBlobAsync(stream, cloudBlockBlob);

    //resetting stream's position to 0
    //stream.Position = 0;
    var serializer = new JsonSerializer();

    using (var sr = new StreamReader(stream))
    {
        using (var jsonTextReader = new JsonTextReader(sr))
        {
            jsonTextReader.SupportMultipleContent = true;
            result = new List<T>();

            while (jsonTextReader.Read())
            {
                result.Add(serializer.Deserialize<T>(jsonTextReader));
            }
        }
    }

If I use DownloadToStreamAsync instead of the parallel download (DownloadRangeToStreamAsync), it works.
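A side note on the commented-out stream.Position = 0 in the snippet above: after the chunks are written, the stream's position is left wherever the last write finished, so a reader attached at that point may see little or no data. The following is a minimal sketch of that effect using only a MemoryStream; the class and method names (StreamPositionDemo, ReadAll) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamPositionDemo
{
    // Hypothetical helper: reads everything from the stream's current position,
    // optionally rewinding to the start first. The stream is left open.
    public static string ReadAll(Stream stream, bool rewind)
    {
        if (rewind)
        {
            stream.Position = 0;
        }
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, 1024, leaveOpen: true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        using (var stream = new MemoryStream())
        {
            byte[] json = Encoding.UTF8.GetBytes("{\"id\":1}");
            stream.Write(json, 0, json.Length);

            // The position is now at the end of the data, so nothing is left to read.
            Console.WriteLine(ReadAll(stream, rewind: false).Length);

            // After rewinding, the full content is available to the reader.
            Console.WriteLine(ReadAll(stream, rewind: true));
        }
    }
}
```

If the same applies to the real download, resetting the position before creating the StreamReader would be a reasonable thing to check in addition to the download call itself.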

I can reproduce your issue. The solution is to change one line in the ParallelDownloadBlobAsync method: replace blob.DownloadRangeToStreamAsync(ms, queue.Key, queue.Value); with blob.DownloadRangeToStream(ms, queue.Key, queue.Value);
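As an alternative to switching to the synchronous call, the range downloads could stay asynchronous as long as each one is awaited before its buffer is copied. The sketch below is not the code from the question or the answer: it replaces Parallel.ForEach with Task.WhenAll plus a SemaphoreSlim, and it takes the range download as a delegate (the hypothetical downloadRange parameter) so that it runs without a storage account. It assumes C# 7.1 or later and a seekable output stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelRangeDownload
{
    // downloadRange(target, offset, length) is a hypothetical stand-in for a call
    // such as blob.DownloadRangeToStreamAsync(target, offset, length).
    public static async Task DownloadAsync(
        Stream output,
        long totalLength,
        Func<Stream, long, long, Task> downloadRange,
        int chunkSize = 1024 * 1024,
        int maxParallel = 10)
    {
        // Split [0, totalLength) into (offset, length) ranges.
        var ranges = new List<(long Offset, long Length)>();
        for (long offset = 0; offset < totalLength; offset += chunkSize)
        {
            ranges.Add((offset, Math.Min(chunkSize, totalLength - offset)));
        }

        using (var gate = new SemaphoreSlim(maxParallel))
        {
            List<Task> tasks = ranges.Select(async range =>
            {
                await gate.WaitAsync(); // limit the number of concurrent downloads
                try
                {
                    using (var ms = new MemoryStream())
                    {
                        // Awaiting here ensures the chunk is complete before it is copied.
                        await downloadRange(ms, range.Offset, range.Length);
                        byte[] bytes = ms.ToArray();
                        lock (output)
                        {
                            output.Position = range.Offset;
                            output.Write(bytes, 0, bytes.Length);
                        }
                    }
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            await Task.WhenAll(tasks);
        }

        // Rewind so a reader attached afterwards starts at the beginning.
        output.Position = 0;
    }

    public static async Task Main()
    {
        // In-memory "blob" used only to exercise the sketch.
        byte[] source = Enumerable.Range(0, 10_000).Select(i => (byte)(i % 251)).ToArray();

        Func<Stream, long, long, Task> fakeDownload = async (target, offset, length) =>
        {
            await Task.Delay(5); // simulate network latency
            target.Write(source, (int)offset, (int)length);
        };

        using (var output = new MemoryStream())
        {
            await DownloadAsync(output, source.Length, fakeDownload, chunkSize: 1024, maxParallel: 4);
            // True when every chunk landed at its own offset.
            Console.WriteLine(output.ToArray().SequenceEqual(source));
        }
    }
}
```

With the blob from the question, the delegate would presumably be (s, o, l) => blob.DownloadRangeToStreamAsync(s, o, l), with totalLength taken from blob.Properties.Length after FetchAttributesAsync.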

I am not sure whether the root cause is the same for you and me. On my side, when the file is small (around 100 KB) and blob.DownloadRangeToStreamAsync is used, the output stream always ends up with length 0, so the body of the while loop is never executed. For larger files, using blob.DownloadRangeToStreamAsync was fine.
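One plausible explanation for the empty stream, offered as a guess and not as a confirmed diagnosis, is that the Task returned by DownloadRangeToStreamAsync is never awaited in the original method, so ms.ToArray() can run before any bytes have arrived. The sketch below reproduces that pattern with a simulated download; FakeDownloadAsync and MeasureAsync are hypothetical names and no storage SDK is involved.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class UnawaitedDownloadDemo
{
    // Hypothetical stand-in for an async range download:
    // the payload is written only after a short delay.
    public static async Task FakeDownloadAsync(Stream target, byte[] payload)
    {
        await Task.Delay(50);
        target.Write(payload, 0, payload.Length);
    }

    // Returns the buffer size seen right after starting the download
    // (without awaiting it) and again after awaiting it.
    public static async Task<(long Before, long After)> MeasureAsync()
    {
        var payload = new byte[] { 1, 2, 3, 4, 5 };
        using (var ms = new MemoryStream())
        {
            Task pending = FakeDownloadAsync(ms, payload); // started, not awaited
            long before = ms.ToArray().Length;             // read too early
            await pending;                                 // now the write has happened
            long after = ms.ToArray().Length;
            return (before, after);
        }
    }

    public static async Task Main()
    {
        var (before, after) = await MeasureAsync();
        // Prints: before await: 0 bytes, after await: 5 bytes
        Console.WriteLine($"before await: {before} bytes, after await: {after} bytes");
    }
}
```

This would also be consistent with the synchronous DownloadRangeToStream fixing the problem, since that call does not return until the range has been written.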

Please leave a comment if this does not resolve your issue.

