Memory Fragmentation with byte[] in C#
The C#/.NET application I am working on makes use of huge byte arrays and is having memory fragmentation issues. I checked memory usage using CLRMemory.
The code we use is as follows:
PdfLoadedDocument loadedDocument = new PdfLoadedDocument("myLoadedDocument.pdf");
// Operations on pdf document
using (var stream = new MemoryStream())
{
    loadedDocument.Save(stream);
    loadedDocument.Close(true);
    return stream.ToArray(); //byte[]
}
We use similar code in multiple places across our application, and we call it in a loop to generate bulk audits ranging from a few hundred to tens of thousands of documents.
As part of the audits, we also download large files from Amazon S3 using the following code:
using (var client = new AmazonS3Client(_accessKey, _secretKey, _region))
{
    var getObjectRequest = new GetObjectRequest();
    getObjectRequest.BucketName = "bucketName";
    getObjectRequest.Key = "keyName";

    using (var downloadStream = new MemoryStream())
    {
        using (var response = await client.GetObjectAsync(getObjectRequest))
        {
            using (var responseStream = response.ResponseStream)
            {
                await responseStream.CopyToAsync(downloadStream);
            }
            return downloadStream.ToArray(); //byte[]
        }
    }
}
There are two different things here:

1. the internals of MemoryStream
2. the usage of .ToArray()

For what happens inside MemoryStream: it is implemented as a simple byte[], but you can mitigate a lot of the overhead of that by using RecyclableMemoryStream instead, via the Microsoft.IO.RecyclableMemoryStream NuGet package, which re-uses buffers between independent usages.
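As a rough sketch of how that would look in the question's PDF code (assuming the PdfLoadedDocument type from the question and the RecyclableMemoryStreamManager type from that package):

```csharp
using Microsoft.IO;

// One manager for the lifetime of the application; it owns the pooled
// buffers that individual streams borrow and return on Dispose.
static readonly RecyclableMemoryStreamManager StreamManager = new RecyclableMemoryStreamManager();

byte[] SavePdf(PdfLoadedDocument loadedDocument)
{
    using (var stream = StreamManager.GetStream())
    {
        loadedDocument.Save(stream);
        loadedDocument.Close(true);
        // Caveat: ToArray() still allocates a fresh byte[] on every call;
        // only the stream's internal growth buffers are pooled.
        return stream.ToArray();
    }
}
```

This removes the repeated reallocation as the stream grows, but keeping ToArray() still defeats much of the point, which is why the rest of the answer argues against it.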
For ToArray(), frankly: don't do that. When using vanilla MemoryStream, the better approach is TryGetBuffer(...), which gives you the oversized backing buffer, along with the start/end tokens:
if (!memStream.TryGetBuffer(out var segment))
    throw new InvalidOperationException("Unable to obtain data segment; oops?");
// see segment.Offset, .Count, and .Array
It is then your job not to look outside those bounds. If you want to make that easier, consider treating the segment as a span (or memory) instead:
ReadOnlySpan<byte> muchSafer = segment;
// now you can't read out of bounds, and you don't need to apply the offset yourself
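For example, where the question's code calls stream.ToArray() only to hand the bytes somewhere else, the segment can be written out directly with no copy (a minimal sketch; CopyWithoutToArray is a hypothetical helper name):

```csharp
using System;
using System.IO;

static void CopyWithoutToArray(MemoryStream source, Stream destination)
{
    // TryGetBuffer exposes the existing (oversized) backing array, so we
    // must honour Offset/Count rather than assume the whole array is data.
    if (!source.TryGetBuffer(out ArraySegment<byte> segment))
        throw new InvalidOperationException("Backing buffer is not exposable");
    destination.Write(segment.Array, segment.Offset, segment.Count);
}
```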
This TryGetBuffer(...) approach, however, does not work well with RecyclableMemoryStream, as it makes a defensive copy to prevent problems with independent data; in that scenario, you should treat the stream simply as a stream, i.e. Stream: just write to it, rewind it (Position = 0), and have the consumer read from it, then dispose it when they are done.
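Applied to the question's code, that pattern might look like this (a sketch; ProduceStream is a hypothetical helper, and the producer is passed in as a delegate so the same shape works for both the PDF and the S3 cases):

```csharp
using System;
using System.IO;

// The producer writes into the stream; we rewind it and hand ownership
// to the caller, who reads from it and disposes it when done.
static Stream ProduceStream(Action<Stream> write)
{
    var stream = new MemoryStream(); // or a stream from a RecyclableMemoryStreamManager
    write(stream);
    stream.Position = 0; // rewind before the consumer reads
    return stream;
}

// e.g. instead of returning stream.ToArray():
// using (var s = ProduceStream(st => { loadedDocument.Save(st); loadedDocument.Close(true); }))
// {
//     // consumer reads from s; no byte[] is ever materialized
// }
```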
As a side note: when reading (or writing) using the Stream API, consider using the array pool for your scratch buffers; so instead of:
var buffer = new byte[1024];
int bytesRead;
while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
{...}
instead try:
var buffer = ArrayPool<byte>.Shared.Rent(1024);
try
{
    int bytesRead;
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {...}
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer);
}
In more advanced scenarios, it may be wise to use the pipelines API rather than the stream API; the point here is that pipelines allows discontiguous buffers, so you never need ridiculously large buffers even when dealing with complex scenarios. This is a niche API, however, and has very limited support in public APIs.
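A minimal sketch of that pipelines shape (assuming the System.IO.Pipelines package; the chunk size and loop structure here are illustrative, not prescriptive):

```csharp
using System;
using System.Buffers;
using System.IO.Pipelines;
using System.Threading.Tasks;

var pipe = new Pipe();

async Task WriteAsync()
{
    // GetMemory rents a pooled chunk of at least the requested size;
    // Advance commits how much of it we actually filled.
    Memory<byte> memory = pipe.Writer.GetMemory(512);
    // ... fill `memory` ...
    pipe.Writer.Advance(512);
    await pipe.Writer.FlushAsync();
    await pipe.Writer.CompleteAsync();
}

async Task ReadAsync()
{
    while (true)
    {
        ReadResult result = await pipe.Reader.ReadAsync();
        // The buffer is a ReadOnlySequence<byte>: it may span several
        // pooled segments, so no single large allocation is ever needed.
        ReadOnlySequence<byte> buffer = result.Buffer;
        // ... process `buffer` ...
        pipe.Reader.AdvanceTo(buffer.End);
        if (result.IsCompleted) break;
    }
    await pipe.Reader.CompleteAsync();
}

await Task.WhenAll(WriteAsync(), ReadAsync());
```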