
About closure, LexicalEnvironment and GC

As of ECMAScript v5, each time control enters executable code, the engine creates a LexicalEnvironment (LE) and a VariableEnvironment (VE). For function code, these two are exactly the same reference, which is the result of calling NewDeclarativeEnvironment (ECMAScript v5 10.4.3), and all variables declared in function code are stored in the environment record component of the VariableEnvironment (ECMAScript v5 10.5). This is the basic concept behind closures.

What confuses me is how garbage collection works with this closure approach. Suppose I have code like:

function f1() {
    var o = LargeObject.fromSize('10MB');
    return function() {
        // here never uses o
        return 'Hello world';
    }
}
var f2 = f1();

After the line var f2 = f1(), our object graph would be:

global -> f2 -> f2's VariableEnvironment -> f1's VariableEnvironment -> o

So, from my limited knowledge, if the JavaScript engine uses a reference counting method for garbage collection, the object o has at least 1 reference and would never be GCed. Apparently this would result in a waste of memory, since o would never be used but is always kept in memory.

Someone may say the engine knows that f2's VariableEnvironment doesn't use f1's VariableEnvironment, so the entire f1's VariableEnvironment would be GCed. So here is another code snippet which may lead to a more complex situation:

function f1() {
    var o1 = LargeObject.fromSize('10MB');
    var o2 = LargeObject.fromSize('10MB');
    return function() {
        alert(o1);
    }
}
var f2 = f1();

In this case, f2 uses the o1 object, which is stored in f1's VariableEnvironment, so f2's VariableEnvironment must keep a reference to f1's VariableEnvironment. As a result o2 cannot be GCed either, which wastes even more memory.

So I would ask: how do modern JavaScript engines (JScript.dll / V8 / SpiderMonkey ...) handle such situations? Is there a rule specified by the standard, or is it implementation-dependent? And what exact steps does a JavaScript engine take to handle such an object graph when executing garbage collection?

Thanks.

tl;dr answer: "Only variables referenced from inner fns are heap allocated in V8. If you use eval then all vars assumed referenced." In your second example, o2 can be allocated on the stack and is thrown away after f1 exits.


I don't think they can handle it. At least we know that some engines cannot, as this is known to be the cause of many memory leaks, for example:

function outer(node) {
    node.onclick = function inner() { 
        // some code not referencing "node"
    };
}

where inner closes over node, forming a circular reference inner -> outer's VariableContext -> node -> inner, which will never be freed in, for instance, IE6, even if the DOM node is removed from the document. Some browsers handle this just fine though: circular references themselves are not a problem, it's the GC implementation in IE6 that is the problem. But now I digress from the subject.

A common way to break the circular reference is to null out all unnecessary variables at the end of outer, i.e., set node = null. The question is then whether modern JavaScript engines can do this for you: can they somehow infer that a variable is not used within inner?
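The manual workaround just described might look roughly like the sketch below. It uses a plain object as a stand-in for a DOM node so it can run outside a browser; fakeNode and outerWithCleanup are hypothetical names, not part of the original example.

```javascript
// Sketch only: `fakeNode` stands in for a DOM element (assumption).
function outerWithCleanup(node) {
    node.onclick = function inner() {
        // some code not referencing "node"
        return 'clicked';
    };
    // Break the cycle inner -> outer's variables -> node -> inner:
    // after this assignment the function's environment no longer
    // points at the node, even if the engine keeps the environment alive.
    node = null;
}

var fakeNode = {};
outerWithCleanup(fakeNode);
console.log(fakeNode.onclick()); // the handler is still attached and callable
```

Nulling the parameter only clears the binding inside outerWithCleanup; the caller's own reference to the node is unaffected.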

I think the answer is no, but I may be proven wrong. The reason is that the following code executes just fine:

function get_inner_function() {
    var x = "very big object";
    var y = "another big object";
    return function inner(varName) {
        alert(eval(varName));
    };
}

func = get_inner_function();

func("x");
func("y");

See for yourself using this jsfiddle example. There are no references to either x or y inside inner, but they are still accessible using eval. (Amazingly, if you alias eval to something else, say myeval, and call myeval, you do NOT get access to the calling function's scope: such an indirect call is evaluated in the global environment. This is even in the specification, see sections 10.4.2 and 15.1.2.1.1 in ECMA-262.)
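To make the difference between a direct call to eval and a call through an alias concrete, here is a small sketch. makeEvalProbes and secretInClosure are made-up names, and the example assumes there is no global variable called secretInClosure.

```javascript
function makeEvalProbes() {
    var secretInClosure = 'local value';
    var indirectEval = eval; // alias: calls through it are indirect evals
    return {
        // Direct eval: evaluated in this function's scope chain.
        direct: function (name) { return eval(name); },
        // Indirect eval: evaluated in the global environment.
        indirect: function (name) { return indirectEval(name); }
    };
}

var probes = makeEvalProbes();
var directResult = probes.direct('secretInClosure');
var indirectResult;
try {
    indirectResult = probes.indirect('secretInClosure');
} catch (e) {
    // Assuming no global of that name exists, the lookup fails.
    indirectResult = e.name;
}
console.log(directResult, indirectResult);
```

Because only the direct form can reach the enclosing variables, an engine has to treat every variable as potentially referenced only when it sees a direct eval call in an inner function.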


Edit: As per your comment, it appears that some modern engines actually do some smart tricks, so I tried to dig a little more. I came across this forum thread discussing the issue, and in particular, a link to a tweet about how variables are allocated in V8. It also specifically touches on the eval problem. It seems that the engine has to parse the code in all inner functions and see which variables are referenced, or whether eval is used, and then determine whether each variable should be allocated on the heap or on the stack. Pretty neat. Here is another blog that contains a lot of details on the ECMAScript implementation.
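The decision described above can be illustrated with a deliberately crude toy model. This is not how V8 is implemented: a real engine works on a parsed scope tree, while this sketch just scans source text with regular expressions. classifyVariables is a hypothetical helper, and it assumes variable names consist only of word characters.

```javascript
// Toy model only (assumption): regex scanning instead of real scope analysis.
function classifyVariables(declaredNames, innerSources) {
    // A direct eval call in any inner function forces every variable to the heap.
    var usesEval = innerSources.some(function (src) {
        return /\beval\s*\(/.test(src);
    });
    var heap = [];
    var stack = [];
    declaredNames.forEach(function (name) {
        var pattern = new RegExp('\\b' + name + '\\b');
        var referenced = innerSources.some(function (src) {
            return pattern.test(src);
        });
        if (usesEval || referenced) {
            heap.push(name);
        } else {
            stack.push(name);
        }
    });
    return { heap: heap, stack: stack };
}

// Second example from the question: only o1 is referenced by the inner function.
var plainCase = classifyVariables(['o1', 'o2'], ['function () { alert(o1); }']);
// The eval example: nothing is named explicitly, but eval is present.
var evalCase = classifyVariables(['x', 'y'],
    ['function inner(varName) { alert(eval(varName)); }']);
console.log(plainCase, evalCase);
```

In the first case only o1 ends up in the heap list, in the second both x and y do, which mirrors the tl;dr quoted at the top of the answers.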

This implies that even if an inner function never "escapes" the call, it can still force variables to be allocated on the heap. E.g.:

function init(node) {

    var someLargeVariable = "...";

    function drawSomeWidget(x, y) {
        library.draw(x, y, someLargeVariable);
    }

    drawSomeWidget(1, 1);
    drawSomeWidget(101, 1);

    return function () {
        alert("hi!");
    };
}

Now, once init has finished its call, someLargeVariable is no longer referenced and should be eligible for deletion, but I suspect that it is not, unless the inner function drawSomeWidget has been optimized away (inlined?). If so, this could probably occur pretty frequently when using self-executing functions to mimic classes with private / public methods.


Answer to Raynos' comment below: I tried the above scenario (slightly modified) in the debugger, and the results are as I predicted, at least in Chrome:

[Screenshot of the Chrome debugger] When the inner function is being executed, someLargeVariable is still in scope.

If I comment out the reference to someLargeVariable in the inner drawSomeWidget method, then you get a different result:

[Screenshot of the Chrome debugger 2] Now someLargeVariable is not in scope, because it could be allocated on the stack.
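If the behaviour seen in the debugger holds, one manual workaround is to drop the reference before returning, so that a heap-allocated environment no longer pins the data. The sketch below assumes this; fakeLibrary is a stand-in for the library object in the init example, and initWithRelease is a hypothetical name.

```javascript
// `fakeLibrary` is a stand-in (assumption) for the `library` object above.
var fakeLibrary = {
    calls: [],
    draw: function (x, y, data) {
        this.calls.push([x, y, data.length]);
    }
};

function initWithRelease() {
    var someLargeVariable = new Array(1000).join('x'); // placeholder payload

    function drawSomeWidget(x, y) {
        fakeLibrary.draw(x, y, someLargeVariable);
    }

    drawSomeWidget(1, 1);
    drawSomeWidget(101, 1);

    // Release explicitly: even if the engine keeps the shared environment
    // alive for the returned closure, it now holds null instead of the data.
    someLargeVariable = null;

    return function () {
        return 'hi!';
    };
}

var greet = initWithRelease();
console.log(greet(), fakeLibrary.calls.length);
```

Whether the assignment is actually needed depends on the engine; it is cheap insurance rather than a guaranteed fix.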

There is no standard specification for how GC is implemented; every engine has its own implementation. I know a little about V8, which has a very impressive garbage collector (stop-the-world, generational, accurate). For example 2 above, the V8 engine takes the following steps:

  1. Create f1's VariableEnvironment object, called f1.
  2. After creating that object, V8 creates an initial hidden class for f1, called H1.
  3. Record that f2, at the root level, points to f1.
  4. Create another hidden class H2, based on H1, then add information to H2 that describes the object as having one property, o1, stored at offset 0 in the f1 object.
  5. Update f1 to point to H2, indicating that f1 should use H2 instead of H1.
  6. Create another hidden class H3, based on H2, and add the property o2, stored at offset 1 in the f1 object.
  7. Update f1 to point to H3.
  8. Create the anonymous function's VariableEnvironment object, called a1.
  9. Create an initial hidden class for a1, called A1.
  10. Record that a1's parent is f1.

On parsing a function literal, it creates a FunctionBody. The FunctionBody is only parsed when the function is called. The following code shows that no error is thrown at parse time:

function p(){
  return function(){alert(a)}
}
p();

So at GC time H1 and H2 will be swept, because nothing references them. In my mind, if the code is lazily compiled, there is no way to tell that the o1 variable in a1 is a reference into f1, since it uses JIT.

if the javascript engine uses a reference counting method

Most JavaScript engines use some variant of a compacting mark-and-sweep garbage collector, not a simple reference counting GC, so reference cycles do not cause problems.
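To see why cycles are harmless under mark-and-sweep, consider this toy sketch. It models the heap as a table of object ids and outgoing references and leaves out compaction entirely; markAndSweep and toyHeap are illustrative names only.

```javascript
// Toy heap (assumption): each entry lists the ids that object references.
function markAndSweep(heap, rootIds) {
    var marked = {};
    var worklist = rootIds.slice();
    // Mark: follow references outward from the roots.
    while (worklist.length > 0) {
        var id = worklist.pop();
        if (marked[id]) { continue; }
        marked[id] = true;
        heap[id].forEach(function (child) { worklist.push(child); });
    }
    // Sweep: everything not marked is unreachable, cycles included.
    return Object.keys(heap).filter(function (key) { return !marked[key]; });
}

var toyHeap = {
    global: ['f2'],
    f2: [],
    inner: ['env'],  // inner -> its environment
    env: ['node'],   // environment -> node
    node: ['inner']  // node -> inner: a cycle with no path from the root
};
var garbage = markAndSweep(toyHeap, ['global']).sort();
console.log(garbage);
```

Every object in the inner/env/node cycle still has one incoming reference, so a pure reference counter would keep all three, while the tracing collector reclaims them because none is reachable from a root.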

They also tend to do some tricks so that cycles that involve DOM nodes (which are reference counted by the browser outside the JavaScript heap) don't introduce uncollectible cycles. The XPCOM cycle collector does this for Firefox.

The cycle collector spends most of its time accumulating (and forgetting about) pointers to XPCOM objects that might be involved in garbage cycles. This is the idle stage of the collector's operation, in which special variants of nsAutoRefCnt register and unregister themselves very rapidly with the collector, as they pass through a "suspicious" refcount event (from N+1 to N, for nonzero N).

Periodically the collector wakes up and examines any suspicious pointers that have been sitting in its buffer for a while. This is the scanning stage of the collector's operation. In this stage the collector repeatedly asks each candidate for a singleton cycle-collection helper class, and if that helper exists, the collector asks the helper to describe the candidate's (owned) children. This way the collector builds a picture of the ownership subgraph reachable from suspicious objects.

If the collector finds a group of objects that all refer back to one another, and establishes that the objects' reference counts are all accounted for by internal pointers within the group, it considers that group cyclical garbage, which it then attempts to free. This is the unlinking stage of the collector's operation. In this stage the collector walks through the garbage objects it has found, again consulting with their helper objects, asking the helper objects to "unlink" each object from its immediate children.

Note that the collector also knows how to walk through the JS heap, and can locate ownership cycles that pass in and out of it.
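The scanning idea quoted above (a group is garbage when its reference counts are fully explained by pointers inside the group) can be sketched as a toy model. This is not the XPCOM implementation; findCyclicGarbage and the refCount/children layout are assumptions made for illustration.

```javascript
// Toy model (assumption): objects carry a refCount and a list of owned children.
function findCyclicGarbage(objects, suspects) {
    // 1. Scan: collect the subgraph reachable from the suspects.
    var seen = {};
    var stack = suspects.slice();
    while (stack.length > 0) {
        var id = stack.pop();
        if (seen[id]) { continue; }
        seen[id] = true;
        objects[id].children.forEach(function (c) { stack.push(c); });
    }
    // 2. Count the references that originate inside that subgraph.
    var internal = {};
    Object.keys(seen).forEach(function (key) { internal[key] = 0; });
    Object.keys(seen).forEach(function (key) {
        objects[key].children.forEach(function (c) { internal[c] += 1; });
    });
    // 3. An object with more references than the subgraph explains is held
    //    from outside; it and everything it reaches stay alive.
    var live = {};
    var liveStack = Object.keys(seen).filter(function (key) {
        return objects[key].refCount > internal[key];
    });
    while (liveStack.length > 0) {
        var liveId = liveStack.pop();
        if (live[liveId]) { continue; }
        live[liveId] = true;
        objects[liveId].children.forEach(function (c) { liveStack.push(c); });
    }
    // 4. Whatever remains is cyclic garbage and could be unlinked.
    return Object.keys(seen).filter(function (key) { return !live[key]; });
}

var toyObjects = {
    a: { refCount: 1, children: ['b'] }, // a <-> b only refer to each other
    b: { refCount: 1, children: ['a'] },
    c: { refCount: 2, children: ['d'] }, // c is also referenced from outside
    d: { refCount: 1, children: ['c'] }
};
var cyclicGarbage = findCyclicGarbage(toyObjects, ['a', 'c']).sort();
console.log(cyclicGarbage);
```

Here a and b are reported as garbage, while c keeps d alive because one of c's two references comes from outside the scanned subgraph.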

EcmaScript harmony is likely to include ephemerons as well to provide weakly held references.
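Ephemeron tables did later ship in ECMAScript 2015 as WeakMap. A minimal sketch of using one to attach data to an object without keeping that object alive follows; widgetNode is a plain-object stand-in for a DOM node, and attachData is a hypothetical helper.

```javascript
// WeakMap keys are held weakly: an entry does not keep its key alive.
var metadata = new WeakMap();

function attachData(node, data) {
    metadata.set(node, data);
}

var widgetNode = { tag: 'div' }; // stand-in for a DOM node (assumption)
attachData(widgetNode, { clicks: 0 });
console.log(metadata.has(widgetNode), metadata.get(widgetNode).clicks);
// Once nothing else references widgetNode, the entry becomes collectable;
// when that actually happens is up to the engine's garbage collector.
```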

You might find "The future of XPCOM memory management" interesting.
