Garbage Collector behaviour - How does “marking the objects” work?

I've found an interesting question about the garbage collector.
For the following code:

class Test {
      Short x = 200;
}

public class MyTest {
    public static void main(String[] args) {
         Test a1 = new Test();
         Test a2 = new Test();
         a1 = null;
         // here
    }
}

How many objects will be marked as eligible for garbage collection when the program reaches // here?

The correct answer is 2, but for similar code:

class Test {
      Short x = 5;
}

public class MyTest {
    public static void main(String[] args) {
         Test a1 = new Test();
         Test a2 = new Test();
         a1 = null;
         // here
    }
}

the correct answer is 1.

The JVM's caching of small values comes to mind, but I'm not sure.
Can anyone explain this behaviour of the GC?

The GC will typically use all your global variables and the variables on all your threads as roots, mark them as alive (reachable), and then recursively follow the references they contain and mark those referenced objects as alive, and so on. The objects that weren't marked as alive are assumed dead (unreachable) and will be collected.
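The marking step described above can be sketched as a graph traversal. This is only a toy model under stated assumptions: `MarkSketch`, `Node`, and `mark` are hypothetical names invented for illustration, and a real collector walks the heap and thread stacks rather than a user-level object graph.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model of the mark phase (hypothetical; not how any particular JVM implements it).
class MarkSketch {
    // Stand-in for a heap object that holds references to other objects.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Returns every node reachable from the roots; membership is by identity.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {        // first visit: mark it, then follow its references
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node child = new Node("child");
        Node orphan = new Node("orphan");
        root.refs.add(child);
        child.refs.add(root);           // cycles are fine: each node is marked once
        Set<Node> live = mark(List.of(root));
        System.out.println("child marked:  " + live.contains(child));
        System.out.println("orphan marked: " + live.contains(orphan));
    }
}
```

Anything left out of the returned set corresponds to what the answer calls dead (unreachable).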

Boxed types are classes, so their instances are subject to GC cycles like any other object. However, the Java Language Specification requires an implementation to cache at least a particular range of numerical values around zero:

If the value p being boxed is true, false, a byte, or a char in the range \u0000 to \u007f, or an int or short number between -128 and 127 (inclusive), then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.

Therefore, these cached instances, even if they aren't reachable from GC roots in the user code, are always kept alive by the JVM itself and thus won't be collected.
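A small sketch of the identity guarantee for cached values, assuming nothing beyond standard autoboxing. `BoxingCacheSketch` and `sameBoxTwice` are hypothetical names. The result for 5 is guaranteed by the rule quoted above; the result for 200 depends on the implementation, so no particular output should be assumed for it.

```java
// Illustrative only: checks whether boxing a value twice yields one object.
class BoxingCacheSketch {
    static boolean sameBoxTwice(short value) {
        Short first = value;     // autoboxing; javac typically emits Short.valueOf(value)
        Short second = value;
        return first == second;  // reference comparison, on purpose
    }

    public static void main(String[] args) {
        // Guaranteed true: 5 lies in the range -128..127.
        System.out.println("5 boxed twice, same object:   " + sameBoxTwice((short) 5));
        // Not guaranteed either way: 200 lies outside the required range.
        System.out.println("200 boxed twice, same object: " + sameBoxTwice((short) 200));
    }
}
```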

Needless to say, you shouldn't rely on this for equality checks, as the exact range of values that gets cached is determined by each particular implementation:

Ideally, boxing a given primitive value p, would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rules above are a pragmatic compromise. The final clause above requires that certain common values always be boxed into indistinguishable objects. The implementation may cache these, lazily or eagerly. For other values, this formulation disallows any assumptions about the identity of the boxed values on the programmer's part. This would allow (but not require) sharing of some or all of these references.

This ensures that in most common cases, the behavior will be the desired one, without imposing an undue performance penalty, especially on small devices. Less memory-limited implementations might, for example, cache all char and short values, as well as int and long values in the range of -32K to +32K.
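Because identity is only guaranteed inside the small cached range, a value comparison is the safe choice for boxed numbers. A minimal sketch, where `BoxedEqualitySketch` and `sameValue` are hypothetical names:

```java
import java.util.Objects;

// Illustrative only: compares boxed shorts by value instead of by reference.
class BoxedEqualitySketch {
    // Null-safe value comparison; works whether or not the boxes are cached.
    static boolean sameValue(Short a, Short b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        Short x = 200;
        Short y = 200;
        System.out.println("equal by value: " + sameValue(x, y));
        // Unboxing first is another option when neither reference can be null.
        System.out.println("equal unboxed:  " + (x.shortValue() == y.shortValue()));
    }
}
```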

For the first example:

The correct answer is 2 ...

Actually, I think that, depending on various factors, any number between 0 and 4 could be correct.

  • First of all, it is not clear how many objects were created to begin with. Two Test objects are clearly created. But the number of Short objects created could be 0, 1 or 2. The spec says that Short may keep a cache of values, but for 200 it is not required to. If it doesn't cache them, 2 Short(200) objects may be created. If it does, then either 0 or 1 will be created. (None will be created if some other code has already caused Short(200) to be cached.)

  • Next there is the issue of which variables are really "live" at the indicated point. Clearly, a1 and a2 are still in scope. However, the GC would be allowed to treat both a1 and a2 as "dead" because they cannot influence the observable behavior of the method at that point. Hence we cannot say whether the second Test instance would be treated as reachable by the GC.

  • Finally, since the assignment of null to a1 does not influence the observable behavior of the method, it is (arguably) legitimate for the optimizer to optimize that assignment away. Hence, you might get the situation where a1 still contains a reference to a Test instance that is visible to the GC.
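One way to observe this uncertainty is with weak references. The sketch below is hedged accordingly: `ReachabilitySketch`, `observe`, and `clearedWhileInUse` are hypothetical names, `System.gc()` is only a hint, and the values returned by `observe` may differ between JVMs and runs. `Reference.reachabilityFence` requires Java 9 or later.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

// Illustrative only: reports what the collector happened to do, not what it must do.
class ReachabilitySketch {
    static final class Test { Short x = 200; }

    // Mirrors the question's code; returns {first cleared, second cleared}.
    static boolean[] observe() {
        Test a1 = new Test();
        Test a2 = new Test();
        WeakReference<Test> w1 = new WeakReference<>(a1);
        WeakReference<Test> w2 = new WeakReference<>(a2);
        a1 = null;
        System.gc();  // a request, not a command
        // a2 is never read again, so the JVM may or may not still treat it as live.
        return new boolean[] { w1.get() == null, w2.get() == null };
    }

    // Contrast case: an object that is used later cannot be cleared before that use.
    static boolean clearedWhileInUse() {
        Test kept = new Test();
        WeakReference<Test> w = new WeakReference<>(kept);
        System.gc();
        boolean cleared = (w.get() == null);
        Reference.reachabilityFence(kept);  // keeps 'kept' strongly reachable up to here
        return cleared;
    }

    public static void main(String[] args) {
        boolean[] result = observe();
        System.out.println("first Test cleared:  " + result[0]);
        System.out.println("second Test cleared: " + result[1]);
        System.out.println("cleared while in use: " + clearedWhileInUse());
    }
}
```

Only `clearedWhileInUse` has a fixed result (false); the two values from `observe` are exactly the quantities the bullets above say cannot be predicted.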

In the second case, the JLS guarantees that Short(5) will be cached (by autoboxing), so we can be sure that the code will create at most 1 Short(5) instance. However, the other sources of uncertainty still apply.
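The difference between the two cases can be checked by counting how many distinct Short objects two instances end up referencing. This is a sketch under assumptions: `ObjectCountSketch`, `Holder`, and `distinctBoxes` are hypothetical names, with `Holder` standing in for the question's Test class. It counts distinct objects referenced, which says nothing about whether they were newly created or already sitting in a cache.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Illustrative only: counts distinct boxed objects by identity.
class ObjectCountSketch {
    static final class Holder {
        final Short x;
        Holder(short value) { this.x = value; }  // autoboxed on assignment
    }

    static int distinctBoxes(short value) {
        Holder h1 = new Holder(value);
        Holder h2 = new Holder(value);
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        seen.add(h1.x);
        seen.add(h2.x);
        return seen.size();
    }

    public static void main(String[] args) {
        // Guaranteed to be 1: both holders share the cached Short for 5.
        System.out.println("distinct Short objects for 5:   " + distinctBoxes((short) 5));
        // May be 1 or 2, depending on whether the implementation caches 200.
        System.out.println("distinct Short objects for 200: " + distinctBoxes((short) 200));
    }
}
```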
