
Python: Behavior of the garbage collector

I have a Django application that exhibits some strange garbage-collection behavior. One view in particular keeps growing the VM size significantly every time it is called, up to a certain limit, at which point usage drops back again. The problem is that it takes considerable time until that point is reached, and the virtual machine running my app does not have enough memory for all FCGI processes to take as much memory as they then sometimes do.

I've spent the last two days investigating this and learning about Python garbage collection, and I think I now understand what is happening, for the most part. When using

gc.set_debug(gc.DEBUG_STATS)

then, for a single request, I see the following output:

>>> c = django.test.Client()
>>> c.get('/the/view/')
gc: collecting generation 0...
gc: objects in each generation: 724 5748 147341
gc: done.
gc: collecting generation 0...
gc: objects in each generation: 731 6460 147341
gc: done.
[...more of the same...]    
gc: collecting generation 1...
gc: objects in each generation: 718 8577 147341
gc: done.
gc: collecting generation 0...
gc: objects in each generation: 714 0 156614
gc: done.
[...more of the same...]
gc: collecting generation 0...
gc: objects in each generation: 715 5578 156612
gc: done.

So essentially, a huge number of objects are allocated, but are initially moved to generation 1, and when gen 1 is swept during the same request, they are moved to generation 2. If I do a manual gc.collect(2) afterwards, they are removed. And, as I mentioned, they are also removed when the next automatic gen 2 sweep happens, which, if I understand correctly, would in this case be something like every 10 requests (at this point the app needs about 150 MB).
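A quick way to sanity-check this kind of diagnosis (this is just a sketch; the variable names are ours) is to read the same per-generation counters that DEBUG_STATS prints via gc.get_count(), and to confirm that an explicit full collection flushes the promoted objects:

```python
import gc

# gc.get_count() returns the (gen0, gen1, gen2) counters that the
# "objects in each generation" debug lines are built from.
before = gc.get_count()
freed = gc.collect(2)   # force a sweep of all three generations
after = gc.get_count()
print("before:", before, "after:", after, "unreachable found:", freed)
```

If the leaked objects really are only pinned by generation 2's infrequent sweeps, `freed` should be large right after a request and the gen 2 counter should stop climbing.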

Alright, so initially I thought that there might be some cyclic referencing going on within the processing of one request that prevents any of these objects from being collected within the handling of that request. However, I've spent hours trying to find one using pympler.muppy and objgraph, both after and while debugging inside the request processing, and there do not seem to be any. Rather, it seems the 14,000 or so objects that are created during the request are all within a reference chain to some request-global object, i.e. once the request goes away, they can be freed.

That has been my attempt at explaining it, anyway. However, if that is true and there are indeed no cyclic dependencies, shouldn't the whole tree of objects be freed once whatever request object causes them to be held goes away, without the garbage collector being involved at all, purely by virtue of the reference counts dropping to zero?

With that setup, here are my questions:

  • Does the above even make sense, or do I have to look for the problem elsewhere? Is it just an unfortunate accident that significant data is kept around for so long in this particular use case?

  • Is there anything I can do to avoid the issue? I already see some potential to optimize the view, but that appears to be a solution with limited scope, although I am not sure what a generic one would be, either; how advisable is it, for example, to call gc.collect() or gc.set_threshold() manually?

In terms of how the garbage collector itself works:

  • Do I understand correctly that an object is always moved to the next generation if a sweep looks at it and determines that the references it has are not cyclic, but can in fact be traced to a root object?

  • What happens if the gc does a, say, generation 1 sweep and finds an object that is referenced by an object within generation 2? Does it follow that relationship inside generation 2, or does it wait for a generation 2 sweep to occur before analyzing the situation?
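The promotion rule asked about here can be observed directly on modern interpreters. This sketch assumes Python 3.8+, where gc.get_objects() accepts a generation argument; the Marker class is purely ours so the object is easy to find again:

```python
import gc

class Marker:
    """Hypothetical class, used only so our object is easy to spot."""

gc.collect()        # drain the young generations first
m = Marker()        # a freshly tracked object starts in generation 0
gc.collect(0)       # survivors of a gen-0 sweep get promoted
promoted = any(isinstance(o, Marker) for o in gc.get_objects(generation=1)) or \
           any(isinstance(o, Marker) for o in gc.get_objects(generation=2))
print("promoted out of generation 0:", promoted)
```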

  • When using gc.DEBUG_STATS, I care primarily about the "objects in each generation" info; however, I keep getting hundreds of "gc: 0.0740s elapsed." and "gc: 1258233035.9370s elapsed." messages; they are totally inconvenient: it takes considerable time for them to be printed, and they make the interesting things a lot harder to find. Is there a way to get rid of them?
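One possible workaround, sketched here as an assumption rather than a documented API: the gc module writes its debug lines through sys.stderr, so a line-buffering wrapper can drop just the "elapsed" noise (the class name is ours):

```python
import gc
import io
import sys

class GcStatsFilter(io.TextIOBase):
    """Stderr wrapper that swallows gc's '... elapsed.' timing lines,
    passing everything else through unchanged."""

    def __init__(self, real):
        self._real = real
        self._buf = ''

    def write(self, text):
        # Buffer until we have complete lines, then filter line by line.
        self._buf += text
        while '\n' in self._buf:
            line, self._buf = self._buf.split('\n', 1)
            if 'elapsed' not in line:
                self._real.write(line + '\n')
        return len(text)

# Usage (only while debugging):
#   sys.stderr = GcStatsFilter(sys.stderr)
#   gc.set_debug(gc.DEBUG_STATS)
```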

  • I don't suppose there is a way to do a gc.get_objects() by generation, i.e. only retrieve the objects from generation 2, for example?

Does the above even make sense, or do I have to look for the problem elsewhere? Is it just an unfortunate accident that significant data is kept around for so long in this particular use case?

Yes, it does make sense. And yes, there are other issues worth considering. Django uses threading.local as the base for DatabaseWrapper (and some contribs use it to make the request object accessible from places where it is not passed explicitly). These global objects survive requests and can keep references to objects until some other view is handled in the thread.

Is there anything I can do to avoid the issue? I already see some potential to optimize the view, but that appears to be a solution with limited scope, although I am not sure what a generic one would be, either; how advisable is it, for example, to call gc.collect() or gc.set_threshold() manually?

General advice (you probably know it, but anyway): avoid circular references and globals (including threading.local). Try to break cycles and clear globals when Django's design makes them hard to avoid. gc.get_referrers(obj) might help you find the places that need attention. Another way is to disable the garbage collector and call it manually after each request, at the point where it is best to do so (this will prevent objects from being promoted to the next generation).
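The disable-and-collect-per-request idea can be sketched as a middleware. This assumes the modern Django middleware calling convention and the class name is ours; in practice you would profile the per-request cost of the collect() before adopting it:

```python
import gc

class ExplicitGcMiddleware:
    """Sketch: turn off the cyclic collector and run one full collection
    after each response, so request-scoped garbage is freed before it
    can be promoted to generation 2."""

    def __init__(self, get_response):
        self.get_response = get_response
        gc.disable()   # reference counting still frees acyclic garbage

    def __call__(self, request):
        response = self.get_response(request)
        gc.collect()   # full sweep at a non-time-critical point
        return response
```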

I don't suppose there is a way to do a gc.get_objects() by generation, i.e. only retrieve the objects from generation 2, for example?

Unfortunately this is not possible with the gc interface. But there are several ways to go. You can consider only the end of the list returned by gc.get_objects(), since the objects in that list are sorted by generation. You can compare the list with the one returned from the previous call by storing weak references to the objects (e.g. in a WeakKeyDictionary) between calls. You can rewrite gc.get_objects() in your own C module (it is easy, mostly copy-paste programming!), since the objects are stored by generation internally, or even access the internal structures with ctypes (which requires a quite deep understanding of ctypes).
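Editorial note for readers on modern interpreters: since Python 3.8, gc.get_objects() accepts a generation argument, which answers this question directly; the workarounds above are only needed on older versions.

```python
import gc

gc.collect()                            # push current survivors into gen 2
oldest = gc.get_objects(generation=2)   # only generation-2 objects
youngest = gc.get_objects(generation=0)
print(len(oldest), "objects in gen 2,", len(youngest), "in gen 0")
```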

I think your analysis looks sound. I'm not an expert on the gc, so whenever I have a problem like this I just add a call to gc.collect() in an appropriate, non-time-critical place, and forget about it.

I'd suggest you call gc.collect() in your view(s) and see what effect it has on your response time and your memory usage.

Note also this question, which suggests that setting DEBUG=True eats memory like it is nearly past its sell-by date.
