Large Object Heap and String Objects coming from a queue

I have a Windows console app that is supposed to run without restarts for days or months at a time. The app retrieves "work" from an MSMQ and processes it. There are 30 threads, each processing a work chunk simultaneously.

Each work chunk coming from the MSMQ is approximately 200 KB, most of which is allocated in a single String object.

I have noticed that after processing about 3-4 thousand of these work chunks, the memory consumption of the application is ridiculously high: 1 to 1.5 GB.

I ran the app through a profiler and noticed that most of this memory (maybe a gigabyte or so) is unused space in the large object heap, and that the heap is fragmented.

I found that 90% of these unused (garbage-collected) bytes were previously allocated as String objects. I then started to suspect that the strings coming in from the MSMQ are allocated, used, and then deallocated, and are therefore the cause of the fragmentation.

I understand that things like GC.Collect(2 or GC.Max...) won't help, since they collect the large object heap but don't compact it (which is the problem here). So I think that what I need is to cache these Strings and reuse them somehow, but since Strings are immutable I would have to use StringBuilders.

My question is: is there any way to keep the underlying structure (i.e. using the MSMQ, as this is something I can't change) and still avoid initializing a new String every time, so as not to fragment the LOH?

Thanks, Yannis

UPDATE: About how these "work" chunks are currently retrieved

Currently these are stored as WorkChunk objects in the MSMQ. Each of these objects contains a String called Contents and another String called Headers. These are actual textual data. If needed, I can change the storage structure, and potentially replace the underlying storage mechanism with something other than an MSMQ.

On the worker node side we currently do

WorkChunk chunk = _Queue.Receive();

So there is little we can cache at this stage. If we changed the structure(s) somehow, then I suppose we could make some progress. In any case, we will have to sort out this problem, so we will do whatever is needed to avoid throwing out months of work.

UPDATE: I went on to try some of the suggestions below and noticed that this issue cannot be reproduced on my local machine (running Windows 7 x64 and a 64-bit app). This makes things so much more difficult. If anyone knows why, it would really help in reproducing this issue locally.

Your problem appears to be due to memory allocation on the large object heap. The large object heap is not compacted and so can be a source of fragmentation. There is a good article here that goes into more detail, including some debugging steps that you can follow to confirm that fragmentation of the large object heap is happening:

Large Object Heap Uncovered

You appear to have three solutions:

  1. Alter your application to perform processing on chunks / shorter strings, where each chunk is smaller than 85,000 bytes. This avoids the allocation of large objects.
  2. Alter your application to allocate a few large chunks of memory up-front and reuse those chunks by copying new messages into the allocated memory instead. See "Heap fragmentation when using byte arrays".
  3. Leave things as they are. As long as you don't experience out-of-memory exceptions and the application isn't interfering with other applications running on the system, you should probably leave things as they are.

It's important here to understand the distinction between virtual memory and physical memory. Even though the process is using a large amount of virtual memory, if the number of objects allocated is relatively low then it can be that the physical memory use of that process is low (the unused memory is paged to disk), meaning little impact on other processes on the system. You may also find that the "VM Hoarding" option helps; read the "Large Object Heap Uncovered" article for more information.

Either of the first two changes involves altering your application to perform some or all of its processing using byte arrays and short substrings instead of a single large string. How difficult this is going to be for you will depend on what sort of processing you are doing.
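As a rough illustration of the first option, here is a minimal sketch that reads text in pieces small enough to stay off the large object heap. It assumes the message body can be exposed as a TextReader; ChunkSplitter, ReadPieces and the 40,000-character limit are hypothetical names and values chosen for the example (40,000 UTF-16 chars is about 80,000 bytes, leaving headroom below the 85,000-byte threshold).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original application.
public static class ChunkSplitter
{
    // Assumption: 40,000 chars * 2 bytes per UTF-16 char = 80,000 bytes,
    // which stays below the 85,000-byte large object threshold.
    public const int MaxCharsPerPiece = 40000;

    // Lazily yields the reader's text as strings of at most maxChars characters.
    public static IEnumerable<string> ReadPieces(TextReader reader, int maxChars = MaxCharsPerPiece)
    {
        if (reader == null) throw new ArgumentNullException(nameof(reader));
        if (maxChars <= 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

        // One small scratch buffer, reused for every piece.
        var buffer = new char[maxChars];
        int read;

        // ReadBlock keeps reading until the buffer is full or the input ends.
        while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
        {
            yield return new string(buffer, 0, read);
        }
    }
}
```

Note that a piece boundary can fall in the middle of a surrogate pair or a logical record, so the processing code has to tolerate content that spans two pieces.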

When there is fragmentation on the LOH, it means that there are allocated objects on it. If you can afford the delay, you can once in a while wait until all currently running tasks are finished and call GC.Collect(). When there are no referenced large objects, they will all be collected, effectively removing the fragmentation of the LOH. Of course this only works if (almost) all large objects are unreferenced.
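The idea can be sketched as follows. This is only an illustration: LohMaintenance and the inFlight parameter are hypothetical, and the compaction setting is an addition that assumes .NET Framework 4.5.1 or later, where GCSettings.LargeObjectHeapCompactionMode is available. On older runtimes that line has to be dropped and only the plain GC.Collect() remains.

```csharp
using System;
using System.Runtime;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original application.
public static class LohMaintenance
{
    // inFlight is an assumed handle on the work currently being processed.
    public static void CollectWhenIdle(Task[] inFlight)
    {
        // Wait until no work chunk is being processed, so that (almost)
        // no large strings are still referenced.
        Task.WaitAll(inFlight);

        // Assumption: .NET Framework 4.5.1+. Requests a one-off LOH compaction
        // during the next full blocking collection.
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;

        // Full blocking collection of all generations.
        GC.Collect();
    }
}
```

The workers must not pick up new messages between the wait and the collection, otherwise freshly allocated strings will still be referenced when the collection runs.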

Also, moving to a 64-bit OS might help, since running out of memory due to fragmentation is much less likely to be a problem on 64-bit systems, because the virtual address space is almost unlimited.

Perhaps you could create a pool of string objects which you can use whilst processing the work, then return once you've finished.

Once a large object has been created in the LOH, it can't be removed (AFAIK), so if you can't avoid creating these objects then the best plan is to reuse them.
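Since String instances are immutable, a pool in practice has to hold mutable buffers rather than strings. A minimal sketch, assuming the processing code can work on char[] buffers; CharBufferPool, Rent and Return are hypothetical names, not an existing API:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical pool of fixed-size char buffers shared by the worker threads.
public sealed class CharBufferPool
{
    private readonly ConcurrentBag<char[]> _buffers = new ConcurrentBag<char[]>();
    private readonly int _bufferLength;

    public CharBufferPool(int bufferLength, int initialCount)
    {
        if (bufferLength <= 0) throw new ArgumentOutOfRangeException(nameof(bufferLength));
        _bufferLength = bufferLength;

        // Allocate the large buffers once, up-front, so the LOH layout stays stable.
        for (int i = 0; i < initialCount; i++)
        {
            _buffers.Add(new char[bufferLength]);
        }
    }

    // Hands out a pooled buffer, or allocates a new one if the pool is empty.
    public char[] Rent()
    {
        char[] buffer;
        return _buffers.TryTake(out buffer) ? buffer : new char[_bufferLength];
    }

    // Puts a buffer back; buffers of a different size are ignored.
    public void Return(char[] buffer)
    {
        if (buffer == null || buffer.Length != _bufferLength) return;
        _buffers.Add(buffer);
    }
}
```

With 30 worker threads, something like new CharBufferPool(131072, 30) would cover one buffer per thread; both numbers are illustrative. Each worker rents a buffer, copies the message text into it, processes it, and returns it in a finally block.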

If you can change the protocol at both ends, then splitting your 'Contents' string into a set of smaller ones (<80k each) should stop them from being stored in the LOH.

How about using String.Intern(...) to eliminate duplicate references? It has a performance penalty, but depending on your strings it might have an impact.
