C#: Download a large JSON file from an ADLS Gen2 blob and deserialize it to an object
I am using the following code to download data from a blob into a stream:
private static async Task<Stream> ParallelDownloadBlobAsync(Stream outPutStream, CloudBlockBlob blob)
{
    await blob.FetchAttributesAsync();
    int bufferLength = 1 * 1024 * 1024; // 1 MB chunk
    long blobRemainingLength = blob.Properties.Length;
    Queue<KeyValuePair<long, long>> queues = new Queue<KeyValuePair<long, long>>();
    long offset = 0;
    while (blobRemainingLength > 0)
    {
        long chunkLength = (long)Math.Min(bufferLength, blobRemainingLength);
        queues.Enqueue(new KeyValuePair<long, long>(offset, chunkLength));
        offset += chunkLength;
        blobRemainingLength -= chunkLength;
    }
    Parallel.ForEach(queues, new ParallelOptions()
    {
        // Gets or sets the maximum number of concurrent tasks
        MaxDegreeOfParallelism = 10
    }, (queue) =>
    {
        using (var ms = new MemoryStream())
        {
            blob.DownloadRangeToStreamAsync(ms, queue.Key, queue.Value);
            lock (outPutStream)
            {
                outPutStream.Position = queue.Key;
                var bytes = ms.ToArray();
                outPutStream.Write(bytes, 0, bytes.Length);
            }
        }
    });
    return outPutStream;
}
Then I use JsonSerializer to deserialize the data, but the while block never executes:
await ParallelDownloadBlobAsync(stream, cloudBlockBlob);
// resetting stream's position to 0
//stream.Position = 0;
var serializer = new JsonSerializer();
using (var sr = new StreamReader(stream))
{
    using (var jsonTextReader = new JsonTextReader(sr))
    {
        jsonTextReader.SupportMultipleContent = true;
        result = new List<T>();
        while (jsonTextReader.Read())
        {
            result.Add(serializer.Deserialize<T>(jsonTextReader));
        }
    }
}
If I use DownloadToStreamAsync instead of the parallel download (DownloadRangeToStreamAsync), it works.
I can reproduce your issue. The solution is to change this line in the ParallelDownloadBlobAsync method:
blob.DownloadRangeToStreamAsync(ms, queue.Key, queue.Value);
to:
blob.DownloadRangeToStream(ms, queue.Key, queue.Value);
I'm not sure whether the root cause is the same for you and me. On my side, when the file is small (around 100 KB) and blob.DownloadRangeToStreamAsync is used, the output stream always has a length of 0, so the body of the while loop never executes. For larger files, using blob.DownloadRangeToStreamAsync worked for me.
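One plausible explanation for the empty stream, offered as an assumption rather than something verified in this thread: the Task returned by DownloadRangeToStreamAsync is never awaited inside Parallel.ForEach, so the MemoryStream can be copied before any bytes have arrived. If you would rather keep the asynchronous call than switch to the synchronous one, a minimal sketch could look like the following. The names RangeDownloader, DownloadInRangesAsync, and the downloadRange delegate are hypothetical; the delegate stands in for the blob call so that the sketch runs without the storage SDK. The output stream is assumed to be seekable.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class RangeDownloader
{
    // downloadRange is a hypothetical stand-in for the SDK call; it must
    // write `length` bytes starting at `offset` of the source into `target`.
    public static async Task DownloadInRangesAsync(
        Stream output,
        long totalLength,
        Func<Stream, long, long, Task> downloadRange,
        int chunkSize = 1024 * 1024,
        int maxParallelism = 10)
    {
        // Split [0, totalLength) into (offset, length) chunks.
        var ranges = new List<(long Offset, long Length)>();
        for (long offset = 0; offset < totalLength; offset += chunkSize)
        {
            ranges.Add((offset, Math.Min(chunkSize, totalLength - offset)));
        }

        var writeLock = new object();
        using (var gate = new SemaphoreSlim(maxParallelism))
        {
            var tasks = ranges.Select(async range =>
            {
                // Limit the number of downloads running at once.
                await gate.WaitAsync();
                try
                {
                    using (var buffer = new MemoryStream())
                    {
                        // Awaiting is the key step: the buffer is copied
                        // only after the range has been fully downloaded.
                        await downloadRange(buffer, range.Offset, range.Length);

                        lock (writeLock)
                        {
                            output.Position = range.Offset;
                            buffer.WriteTo(output);
                        }
                    }
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            await Task.WhenAll(tasks);
        }

        // Rewind so that a reader starts at the first byte.
        output.Position = 0;
    }
}
```

With the SDK used in the question, the delegate would presumably be something like (target, offset, length) => blob.DownloadRangeToStreamAsync(target, offset, length), with totalLength taken from blob.Properties.Length after FetchAttributesAsync. Note the rewind at the end: after the chunk writes, the stream position is wherever the last write left it, so a StreamReader created without resetting Position to 0 may find little or nothing to read.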
Please leave a comment if this does not resolve your issue.