
Understanding Memory Performance Counters

[Update - Sep 30, 2010]

Since I have studied a lot on this and related topics, here are the tips I gathered from my own experience and from the suggestions provided in the answers below:

1) Use a memory profiler (try CLR Profiler, to start with), find the routines that consume the most memory, and fine-tune them: reuse big arrays, and try to keep references to objects to a minimum.

2) If possible, allocate small objects (less than 85 KB for .NET 2.0) and use memory pools to avoid high CPU usage by the garbage collector.

3) If you add references to objects, you are responsible for removing them the same number of times. You'll have peace of mind and the code will probably work better.

4) If nothing works and you are still clueless, use the elimination method (comment out / skip code) to find out what is consuming the most memory.
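Tips 1 and 2 can be combined: a minimal sketch of the buffer-pooling idea (class and member names are hypothetical, not a library API) might look like this. Renting and returning fixed-size byte[] buffers lets large arrays be reused instead of repeatedly allocated on the LOH:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a pool of same-sized byte[] buffers.
public sealed class BufferPool
{
    private readonly Stack<byte[]> _free = new Stack<byte[]>();
    private readonly object _lock = new object();
    private readonly int _bufferSize;

    public BufferPool(int bufferSize)
    {
        _bufferSize = bufferSize;
    }

    // Reuse a pooled buffer if one is available; otherwise allocate.
    public byte[] Rent()
    {
        lock (_lock)
        {
            if (_free.Count > 0)
                return _free.Pop();
        }
        return new byte[_bufferSize];
    }

    // Put a buffer back so later callers can reuse it.
    public void Return(byte[] buffer)
    {
        if (buffer == null || buffer.Length != _bufferSize)
            return; // reject buffers that don't belong to this pool
        lock (_lock)
        {
            _free.Push(buffer);
        }
    }
}
```

On newer frameworks, System.Buffers.ArrayPool&lt;T&gt; provides a production-quality version of this pattern out of the box.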

Using memory performance counters inside your code might also help you.
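For example, on Windows / .NET Framework you can read the standard ".NET CLR Memory" counters (the same ones perfmon shows) from inside the process:

```csharp
using System;
using System.Diagnostics;

class MemoryCountersDemo
{
    static void Main()
    {
        // Counter instances are named after the process.
        string instance = Process.GetCurrentProcess().ProcessName;

        // Read-only counters from the ".NET CLR Memory" category.
        using (var bytesInAllHeaps = new PerformanceCounter(
                   ".NET CLR Memory", "# Bytes in all Heaps", instance))
        using (var lohSize = new PerformanceCounter(
                   ".NET CLR Memory", "Large Object Heap size", instance))
        {
            Console.WriteLine("# Bytes in all Heaps: {0:N0}", bytesInAllHeaps.NextValue());
            Console.WriteLine("LOH size:             {0:N0}", lohSize.NextValue());
        }
    }
}
```

Logging these values before and after a suspect operation gives you hard numbers to compare between runs.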

Hope these help!


[Original question]

Hi!

I'm working in C#, and my issue is an out-of-memory exception.

I read an excellent article on the LOH here: http://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/

Awesome read!

And this one: http://dotnetdebug.net/2005/06/30/perfmon-your-debugging-buddy/

My issue:

I am facing an out-of-memory issue in an enterprise-level desktop application. I have tried to read up on memory profiling and performance counters (I also tried WinDbg, a little bit), but I am still clueless about the basics.

I tried CLR Profiler to analyze the memory usage. It was helpful in:

  1. Showing me who allocated huge chunks of memory

  2. Showing which data types used the most memory

But both CLR Profiler and the performance counters (since they share the same data) failed to explain:

  1. The numbers collected after each run of the app - how do I tell whether there is any improvement?

  2. How do I compare the performance data between runs - is a lower or higher value of a particular counter good or bad?


What I need:

I am looking for tips on:

  1. How to free (yes, really) managed objects (like arrays and big strings) - but not by calling GC.Collect, if possible. I have to handle byte arrays around 500 KB long (an unavoidable size, unfortunately) every now and then.

  2. If fragmentation occurs, how to compact memory - it seems that the .NET GC is not really doing this effectively, and it is causing OOM.

  3. Also, what exactly is the 85 KB limit for the LOH? Is it the size of the object, or the overall size of the array? This is not very clear to me.

  4. Which memory counters can tell whether code changes are actually reducing the chances of OOM?

Tips I already know:

  1. Set managed objects to null - marking them as garbage - so that the garbage collector can collect them. Strangely, after I set a string[] object to null, # Bytes in all Heaps shot up!

  2. Avoid creating objects/arrays larger than 85 KB - this is not in my control, so there could be a lot on the LOH.

  3. Memory leak indicators:

    • # Bytes in all Heaps increasing
    • Gen 2 Heap Size increasing
    • # GC Handles increasing
    • # of Pinned Objects increasing
    • # Total Committed Bytes increasing
    • # Total Reserved Bytes increasing
    • Large Object Heap size increasing

My situation:

  • I have a 4 GB, 32-bit machine with Windows Server 2003 SP2 on it.
  • I understand that an application can use <= 2 GB of physical RAM.
  • Increasing the virtual memory (pagefile) size has no effect in this scenario.

As this is an OOM issue, I am focusing only on memory-related counters.

Please advise! I really need some help, as I'm stuck because of the lack of good documentation!

You could try pooling and managing the large objects yourself. For example, if you often need <500 KB arrays and the number of arrays alive at once is well understood, you could avoid ever deallocating them - that way, if you only need, say, 10 of them at a time, you suffer a fixed 5 MB memory overhead instead of troublesome long-term fragmentation.

As for your three questions:

  1. This is just not possible. Only the garbage collector decides when to finalize managed objects and release their memory. That's part of what makes them managed objects.

  2. This is possible if you manage your own heap in unsafe code and bypass the large object heap entirely. You will end up doing a lot of work and suffering a lot of inconvenience if you go down this road. I doubt that it's worth it for you.

  3. It's the size of the object, not the number of elements in the array.

Remember, fragmentation only happens when objects are freed, not when they're allocated. If fragmentation is indeed your problem, reusing the large objects will help. Focus on creating less garbage (especially large garbage) over the lifetime of the app instead of trying to deal with the nuts and bolts of the GC implementation directly.

Nayan, here are the answers to your questions, plus a couple of additional pieces of advice.

  1. You cannot free them; you can only make them easier for the GC to collect. It seems you already know the way: the key is reducing the number of references to the object.
  2. Fragmentation is one more thing you cannot control directly. But there are several factors that can influence it:
    • LOH external fragmentation is less dangerous than Gen2 external fragmentation, because the LOH is not compacted; its free slots can be reused instead.
    • If the 500 KB byte arrays you refer to are used as IO buffers (e.g., passed to some socket-based API or unmanaged code), there is a high chance they will get pinned. A pinned object cannot be moved by the GC, and pinned objects are one of the most frequent causes of heap fragmentation.
    • 85 KB is a limit on object size. But remember, a System.Array instance is an object too, so all your 500 KB byte[] arrays are on the LOH.
    • All the counters in your post can hint at changes in memory consumption, but in your case I would pick BIAH (# Bytes in all Heaps) and LOH size as the primary indicators. BIAH shows the total size of all managed heaps (Gen1 + Gen2 + LOH, to be precise - no Gen0, but who cares about Gen0, right? :) ), and the LOH is the heap where all large byte[] arrays are placed.

Advice:

  • Something that has already been proposed: pre-allocate and pool your buffers.

  • A different approach, which can be effective if you can use any collection instead of a contiguous array of bytes (this is not the case if the buffers are used in IO): implement a custom collection that is internally composed of many smaller arrays. This is similar to std::deque from the C++ STL. Since each individual array is smaller than 85 KB, the whole collection stays off the LOH. The advantage of this approach is the following: the LOH is only collected when a full GC happens. If the byte[] arrays in your application are not long-lived and (were they smaller) would be collected in Gen0 or Gen1, this makes memory management much easier for the GC, since a Gen2 collection is much more heavyweight.

  • On the testing and monitoring approach: in my experience, GC behavior, memory footprint, and other memory-related metrics need to be monitored for quite a long time to get valid, stable data. So each time you change something in the code, run a long enough test while monitoring the memory performance counters to see the impact of the change.

  • I would also recommend taking a look at the % Time in GC counter, as it can be a good indicator of the effectiveness of memory management. The larger this value is, the more time your application spends on GC routines instead of processing user requests or doing other 'useful' operations. I cannot say which absolute values of this counter indicate a problem, but I can share my experience for reference: for the application I work on, we usually treat % Time in GC above 20% as an issue.
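The chunked-collection advice above can be sketched roughly like this (an illustrative class, not a library type): each chunk stays well below the 85,000-byte threshold, so no single allocation lands on the LOH.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch: a byte sequence stored as many small arrays,
// so none of the underlying allocations reach the Large Object Heap.
public sealed class ChunkedBuffer
{
    private const int ChunkSize = 64 * 1024;   // 64 KB, safely under 85,000 bytes
    private readonly List<byte[]> _chunks = new List<byte[]>();
    private readonly long _length;

    public ChunkedBuffer(long length)
    {
        _length = length;
        for (long remaining = length; remaining > 0; remaining -= ChunkSize)
            _chunks.Add(new byte[(int)Math.Min(ChunkSize, remaining)]);
    }

    public long Length
    {
        get { return _length; }
    }

    // Indexer translates a flat offset into (chunk, offset-within-chunk).
    public byte this[long index]
    {
        get { return _chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)]; }
        set { _chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)] = value; }
    }
}
```

Note the trade-off: the data is no longer contiguous, so it cannot be handed directly to IO APIs that expect a single byte[].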

Also, it would be useful if you shared some values of your application's memory-related perf counters: Private Bytes and Working Set of the process, BIAH, Total Committed Bytes, LOH size, Gen0/Gen1/Gen2 size, # of Gen0/Gen1/Gen2 collections, and % Time in GC. That would help us better understand your issue.

Another indicator is watching Private Bytes vs. # Bytes in all Heaps. If Private Bytes increases faster than # Bytes in all Heaps, you have an unmanaged memory leak. If # Bytes in all Heaps increases faster than Private Bytes, it is a managed leak.

To correct something that @Alexey Nedilko said:

"LOH external fragmentation is less dangerous than Gen2 external fragmentation, 'cause LOH is not compacted. The free slots of LOH can be reused instead." “LOH外部碎片比Gen2外部碎片更不危险,因为LOH没有压实.LOH的空闲插槽可以重复使用。”

is absolutely incorrect. Gen2 is compacted, which means there is never free space after a collection. The LOH is NOT compacted (as he correctly mentions), and yes, free slots are reused. BUT if the free space is not contiguous enough to fit the requested allocation, the segment size is increased - and it can continue to grow and grow. So you can end up with gaps in the LOH that are never filled. This is a common cause of OOMs, and I've seen it in many of the memory dumps I've analyzed.

Though there are now methods in the GC API (as of .NET 4.5.1) that can be called to programmatically compact the LOH, I strongly recommend avoiding this if app performance is a concern. Performing this operation at runtime is extremely expensive and can hurt your app's performance significantly. The default implementation of the GC was designed to be performant, which is why this step was omitted in the first place. IMO, if you find that you have to call this because of LOH fragmentation, you are doing something wrong in your app - it can be improved with pooling techniques, splitting arrays, and other memory-allocation tricks instead. If the app is an offline app or some batch process where performance isn't a big deal, maybe it's not so bad, but I'd use it sparingly at best.
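For reference, the API in question looks like this (available since .NET Framework 4.5.1):

```csharp
using System;
using System.Runtime;

class LohCompactionDemo
{
    static void Main()
    {
        // Request that the LOH be compacted during the next
        // blocking full garbage collection.
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;

        GC.Collect();   // the LOH is compacted here; afterwards the
                        // mode automatically resets to Default
    }
}
```

As the answer above stresses, treat this as a last resort, not a routine fix for fragmentation.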

A good visual example of how this can happen is here: The Dangers of the Large Object Heap, and here: Large Object Heap Uncovered - by Maoni (GC Team Lead on the CLR).
