Memory Fragmentation with byte[] in C#
The C#/.NET application I am working on makes use of huge byte arrays and is having memory fragmentation issues. I checked memory usage using CLRMemory.
The code we use is as follows:
PdfLoadedDocument loadedDocument = new PdfLoadedDocument("myLoadedDocument.pdf");
// Operations on pdf document
using (var stream = new MemoryStream())
{
    loadedDocument.Save(stream);
    loadedDocument.Close(true);
    return stream.ToArray(); //byte[]
}
We use similar code in multiple places across our application, and we call it in a loop to generate bulk audits ranging from a few hundred to tens of thousands of documents.
As part of the audits, we also download large files from Amazon S3 using the following code:
using (var client = new AmazonS3Client(_accessKey, _secretKey, _region))
{
    var getObjectRequest = new GetObjectRequest();
    getObjectRequest.BucketName = "bucketName";
    getObjectRequest.Key = "keyName";

    using (var downloadStream = new MemoryStream())
    {
        using (var response = await client.GetObjectAsync(getObjectRequest))
        {
            using (var responseStream = response.ResponseStream)
            {
                await responseStream.CopyToAsync(downloadStream);
            }
            return downloadStream.ToArray(); //byte[]
        }
    }
}
There are two different things here:

1. the internals of MemoryStream
2. the usage of .ToArray()

For what happens inside MemoryStream: it is implemented as a simple byte[], but you can mitigate a lot of the overhead of that by using RecyclableMemoryStream instead, via the Microsoft.IO.RecyclableMemoryStream NuGet package, which re-uses buffers between independent usages.
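As a rough sketch of how that would look in the question's PDF code (assuming the PdfLoadedDocument type from the question and the RecyclableMemoryStreamManager type from that package):

```csharp
using Microsoft.IO;

// One manager for the lifetime of the application; it owns the pooled
// buffers that individual streams borrow and return on Dispose.
static readonly RecyclableMemoryStreamManager StreamManager = new RecyclableMemoryStreamManager();

byte[] SavePdf(PdfLoadedDocument loadedDocument)
{
    using (var stream = StreamManager.GetStream())
    {
        loadedDocument.Save(stream);
        loadedDocument.Close(true);
        // Caveat: ToArray() still allocates a fresh byte[] on every call;
        // only the stream's internal growth buffers are pooled.
        return stream.ToArray();
    }
}
```

This removes the repeated reallocation as the stream grows, but keeping ToArray() still defeats much of the point, which is why the rest of the answer argues against it.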
For ToArray(), frankly: don't do that. When using vanilla MemoryStream, the better approach is TryGetBuffer(...), which gives you the oversized backing buffer, along with the start/end tokens:
if (!memStream.TryGetBuffer(out var segment))
    throw new InvalidOperationException("Unable to obtain data segment; oops?");
// see segment.Offset, .Count, and .Array
It is then your job not to look outside those bounds. If you want to make that easier, consider treating the segment as a span (or memory) instead:
ReadOnlySpan<byte> muchSafer = segment;
// now you can't read out of bounds, and you don't need to apply the offset yourself
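For example, where the question's code calls stream.ToArray() only to hand the bytes somewhere else, the segment can be written out directly with no copy (a minimal sketch; CopyWithoutToArray is a hypothetical helper name):

```csharp
using System;
using System.IO;

static void CopyWithoutToArray(MemoryStream source, Stream destination)
{
    // TryGetBuffer exposes the existing (oversized) backing array, so we
    // must honour Offset/Count rather than assume the whole array is data.
    if (!source.TryGetBuffer(out ArraySegment<byte> segment))
        throw new InvalidOperationException("Backing buffer is not exposable");
    destination.Write(segment.Array, segment.Offset, segment.Count);
}
```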
This TryGetBuffer(...) approach, however, does not work well with RecyclableMemoryStream, as it makes a defensive copy to prevent problems with independent data; in that scenario, you should treat the stream simply as a stream, i.e. Stream: just write to it, rewind it (Position = 0), and have the consumer read from it, then dispose it when they are done.
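Applied to the question's code, that pattern might look like this (a sketch; ProduceStream is a hypothetical helper, and the producer is passed in as a delegate so the same shape works for both the PDF and the S3 cases):

```csharp
using System;
using System.IO;

// The producer writes into the stream; we rewind it and hand ownership
// to the caller, who reads from it and disposes it when done.
static Stream ProduceStream(Action<Stream> write)
{
    var stream = new MemoryStream(); // or a stream from a RecyclableMemoryStreamManager
    write(stream);
    stream.Position = 0; // rewind before the consumer reads
    return stream;
}

// e.g. instead of returning stream.ToArray():
// using (var s = ProduceStream(st => { loadedDocument.Save(st); loadedDocument.Close(true); }))
// {
//     // consumer reads from s; no byte[] is ever materialized
// }
```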
As a side note: when reading (or writing) using the Stream API, consider using the array pool for your scratch buffers; so instead of:
var buffer = new byte[1024];
int bytesRead;
while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
{...}
instead try:
var buffer = ArrayPool<byte>.Shared.Rent(1024);
try
{
    int bytesRead;
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {...}
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer);
}
In more advanced scenarios, it may be wise to use the pipelines API rather than the stream API; the point here is that pipelines allows discontiguous buffers, so you never need ridiculously large buffers even when dealing with complex scenarios. This is a niche API, however, and has very limited support in public APIs.
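A minimal sketch of that pipelines shape (assuming the System.IO.Pipelines package; the chunk size and loop structure here are illustrative, not prescriptive):

```csharp
using System;
using System.Buffers;
using System.IO.Pipelines;
using System.Threading.Tasks;

var pipe = new Pipe();

async Task WriteAsync()
{
    // GetMemory rents a pooled chunk of at least the requested size;
    // Advance commits how much of it we actually filled.
    Memory<byte> memory = pipe.Writer.GetMemory(512);
    // ... fill `memory` ...
    pipe.Writer.Advance(512);
    await pipe.Writer.FlushAsync();
    await pipe.Writer.CompleteAsync();
}

async Task ReadAsync()
{
    while (true)
    {
        ReadResult result = await pipe.Reader.ReadAsync();
        // The buffer is a ReadOnlySequence<byte>: it may span several
        // pooled segments, so no single large allocation is ever needed.
        ReadOnlySequence<byte> buffer = result.Buffer;
        // ... process `buffer` ...
        pipe.Reader.AdvanceTo(buffer.End);
        if (result.IsCompleted) break;
    }
    await pipe.Reader.CompleteAsync();
}

await Task.WhenAll(WriteAsync(), ReadAsync());
```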