Memory fragmentation with byte[] in C#

Question

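The C#/.NET application I am developing uses large byte arrays and is running into memory fragmentation problems. I checked memory usage with CLRMemory.

Please see the image of the LOH and free space.

The code we use is as follows: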

PdfLoadedDocument loadedDocument = new PdfLoadedDocument("myLoadedDocument.pdf");

// Operations on pdf document

using (var stream = new MemoryStream())
{
    loadedDocument.Save(stream);
    loadedDocument.Close(true);
    return stream.ToArray(); //byte[]
}

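We use similar code in multiple places throughout the application, and we call it in a loop to generate bulk audits ranging from a few hundred to 10,000 documents.

Is there a better way to handle this so that we avoid the fragmentation?

As part of the audit, we also download large files from Amazon S3 with the following code: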

using (var client = new AmazonS3Client(_accessKey, _secretKey, _region))
{
   var getObjectRequest = new GetObjectRequest();
   getObjectRequest.BucketName = "bucketName";
   getObjectRequest.Key = "keyName";

   using (var downloadStream = new MemoryStream())
   {
       using (var response = await client.GetObjectAsync(getObjectRequest))
       {
           using (var responseStream = response.ResponseStream)
           {
               await responseStream.CopyToAsync(downloadStream);
           }
           return downloadStream.ToArray(); //byte[]
       }
   }
}

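Is there a better option for downloading large files without having them end up on the LOH, which is taking a toll on the garbage collector?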

Answer 1

There are two different things here:

the internals of MemoryStream
the usage of .ToArray()

For what happens inside MemoryStream: it is implemented as a simple byte[], but you can mitigate a lot of that overhead by using RecyclableMemoryStream instead, via the Microsoft.IO.RecyclableMemoryStream NuGet package, which re-uses buffers between independent usages.
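As a rough sketch of what that swap can look like, assuming the Microsoft.IO.RecyclableMemoryStream package is referenced: the names PooledStreams and WriteAndMeasure are made up for illustration, and writePayload is a stand-in for something like loadedDocument.Save(stream).

```csharp
using System;
using System.IO;
using Microsoft.IO; // from the Microsoft.IO.RecyclableMemoryStream NuGet package

public static class PooledStreams
{
    // Share one manager across the application; pooling only helps when
    // independent usages draw their buffers from the same manager.
    private static readonly RecyclableMemoryStreamManager Manager =
        new RecyclableMemoryStreamManager();

    // Hypothetical helper: writePayload stands in for e.g. loadedDocument.Save(stream).
    public static long WriteAndMeasure(Action<Stream> writePayload)
    {
        // The tag is only a label that shows up in the library's diagnostics.
        using (var stream = Manager.GetStream("pdf-save"))
        {
            writePayload(stream);
            return stream.Length;
        } // disposing hands the pooled blocks back to the manager
    }
}
```

The part that matters is the dispose: a stream that is never disposed never returns its blocks, so the pool cannot re-use them.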

For ToArray(), frankly: don't do that. When using a vanilla MemoryStream, a better approach is TryGetBuffer(...), which gives you the oversized backing buffer along with the start/end markers:

if (!memStream.TryGetBuffer(out var segment))
    throw new InvalidOperationException("Unable to obtain data segment; oops?");
// see segment.Offset, .Count, and .Array

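It is then your job not to go outside those bounds. If you want to make that easier: consider treating the segment as a span (or memory) instead: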

ReadOnlySpan<byte> muchSafer = segment;
// now you can't read out of bounds, and you don't need to apply the offset yourself
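To make the difference from ToArray() concrete, here is a minimal sketch that hands the written bytes to another stream straight from the backing buffer. BufferAccess and WriteContentsTo are illustrative names, not part of any library; only MemoryStream.TryGetBuffer and Stream.Write are real APIs here.

```csharp
using System;
using System.IO;

public static class BufferAccess
{
    // Writes the used portion of a MemoryStream to 'destination' without
    // allocating the extra right-sized copy that ToArray() would create.
    public static int WriteContentsTo(MemoryStream source, Stream destination)
    {
        if (!source.TryGetBuffer(out ArraySegment<byte> segment))
            throw new InvalidOperationException("The backing buffer is not exposed.");

        // Only segment.Count bytes starting at segment.Offset are real data;
        // the rest of segment.Array is spare capacity and must be ignored.
        destination.Write(segment.Array, segment.Offset, segment.Count);
        return segment.Count;
    }
}
```

Note that segment.Array.Length is usually larger than segment.Count, which is exactly why the offset and count have to travel with the array.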

However, this TryGetBuffer(...) approach does not work well with RecyclableMemoryStream, because it makes a defensive copy to prevent problems with independent data; in that scenario, you should treat the stream simply as a stream, i.e. Stream: just write to it, rewind it (Position = 0), and have the consumer read from it, then dispose it when they are done.
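The write / rewind / read / dispose hand-off could be sketched roughly like this. StreamHandoff, Produce and ConsumeInto are hypothetical names; createStream is where a pooled stream would be obtained, and writePayload stands in for a call such as loadedDocument.Save(stream).

```csharp
using System;
using System.IO;

public static class StreamHandoff
{
    // Producer: fills the stream, rewinds it, and passes ownership to the caller
    // instead of returning a byte[].
    public static Stream Produce(Func<Stream> createStream, Action<Stream> writePayload)
    {
        Stream stream = createStream();
        try
        {
            writePayload(stream);
            stream.Position = 0; // rewind so the consumer starts at the first byte
            return stream;
        }
        catch
        {
            stream.Dispose(); // do not leak the buffer if writing fails
            throw;
        }
    }

    // Consumer: reads the payload to the end, then disposes it.
    public static long ConsumeInto(Stream payload, Stream destination)
    {
        using (payload)
        {
            payload.CopyTo(destination);
            return payload.Length;
        }
    }
}
```

The design choice is that the method signature changes from byte[] to Stream, so no caller ever needs the whole payload as one contiguous array.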

As a side note: when reading (or writing) using the Stream API, consider using the array pool for your scratch buffers; so instead of:

var buffer = new byte[1024];
int bytesRead;
while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
{...}

instead try:

var buffer = ArrayPool<byte>.Shared.Rent(1024);
try
{
    int bytesRead;
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {...}
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer);
}
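Filled in as a complete copy loop, the pooled version might look like the sketch below. PooledCopy and the 16 KB default are arbitrary choices for illustration; ArrayPool<byte>.Shared, Rent and Return are the real APIs.

```csharp
using System;
using System.Buffers;
using System.IO;

public static class PooledCopy
{
    // Copies 'source' to 'destination' through a rented scratch buffer
    // and returns the number of bytes copied.
    public static long Copy(Stream source, Stream destination, int minimumBufferSize = 16 * 1024)
    {
        // Rent may hand back a larger array than requested, so always use
        // buffer.Length for reads and bytesRead for writes.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(minimumBufferSize);
        try
        {
            long total = 0;
            int bytesRead;
            while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                destination.Write(buffer, 0, bytesRead); // only bytesRead bytes are valid
                total += bytesRead;
            }
            return total;
        }
        finally
        {
            // Always return the buffer, even when a read or write throws.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

A rented buffer may still contain data from a previous user, so never treat anything beyond bytesRead as meaningful.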

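In more advanced scenarios, it may be wise to use the pipelines API rather than the stream API; the point here is that pipelines allow discontiguous buffers, so you never need ridiculously large buffers even when dealing with complex scenarios. However, this is a niche API, with very limited support in public APIs.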
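For a feel of what discontiguous buffers mean in practice, here is a small sketch that reads a stream through System.IO.Pipelines and walks the segments it is given. It assumes the System.IO.Pipelines package (or a framework that ships it) is available; PipelineReading and CountBytesAsync are illustrative names only.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.IO.Pipelines;
using System.Threading.Tasks;

public static class PipelineReading
{
    // Reads 'source' to the end and returns the number of bytes seen,
    // without ever requiring the data to sit in one contiguous array.
    public static async Task<long> CountBytesAsync(Stream source)
    {
        PipeReader reader = PipeReader.Create(source);
        long total = 0;
        try
        {
            while (true)
            {
                ReadResult result = await reader.ReadAsync();
                ReadOnlySequence<byte> buffer = result.Buffer;

                // The sequence may span several separate memory segments.
                foreach (ReadOnlyMemory<byte> segment in buffer)
                    total += segment.Length;

                // Mark everything as consumed so the reader can recycle it.
                reader.AdvanceTo(buffer.End);

                if (result.IsCompleted)
                    break;
            }
        }
        finally
        {
            await reader.CompleteAsync();
        }
        return total;
    }
}
```

A real consumer would parse or forward each segment instead of just counting, but the loop shape (ReadAsync, process, AdvanceTo, check IsCompleted) stays the same.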

Solution 1
2 2021-03-22 11:50:01
