
Why does Java toString() loop infinitely on indirect cycles?

This is more a gotcha I wanted to share than a question: when printing with toString(), Java detects direct cycles in a Collection (where the Collection contains itself), but not indirect cycles (where a Collection contains another Collection that contains the first one, or a longer chain of such references).

import java.util.LinkedList;
import java.util.List;

public class ShonkyCycle {
  public static void main(String[] args) {
    List<Object> a = new LinkedList<>();
    a.add(a);                      // direct cycle
    System.out.println(a);         // works:  [(this Collection)]

    List<Object> b = new LinkedList<>();
    a.add(b);
    b.add(a);                      // indirect cycle
    System.out.println(a);         // shonky: a and b print each other without end (StackOverflowError)
  }
}

This was a real gotcha for me, because it occurred in debugging code that prints out the Collection. I was surprised when it caught a direct cycle, so I incorrectly assumed that the check had been implemented in general. That leaves a question: why?

The explanation I can think of is that it is very inexpensive to check for a collection that refers to itself, as you only need to store the collection (which you already have), but for longer cycles you need to store all the collections you encounter, starting from the root. Additionally, you might not be able to tell for sure what the root is, so you'd have to store every collection in the system (which you do anyway), and you'd also have to do a hash lookup on every collection element. That is very expensive for the relatively rare case of cycles (in most programming). I think the only reason it checks for direct cycles is that it is so cheap (one reference comparison).

OK... I've kinda answered my own question, but have I missed anything important? Anyone want to add anything?


Clarification: I now realize the problem I saw is specific to printing a Collection (i.e. the toString() method). There's no problem with cycles per se (I use them myself and need to have them); the problem is that Java can't print them. Edit: Andrzej Doyle points out it's not just collections, but any object whose toString() is called.

Given that it's constrained to this method, here's an algorithm to check for it:

  • the root is the object that the first toString() is invoked on (to determine this, you need to maintain state on whether a toString() is currently in progress; so this is inconvenient).
  • as you traverse each object, you add it to an IdentityHashMap, along with a unique identifier (e.g. an incremented index).
  • but if this object is already in the Map, write out its identifier instead.

This approach also correctly renders multirefs (a node that is referred to more than once).

The memory cost is the IdentityHashMap (one reference and index per object); the complexity cost is a hash lookup for every node in the directed graph (i.e. each object that is printed).
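The steps above can be sketched roughly as follows. This is only an illustration, not anything the JDK provides: the class name CycleSafePrinter, the render method, and the "#n" label format are all made up for this example, and it only descends into objects that implement Collection. An IdentityHashMap is used so that lookups compare by reference and never call hashCode() on a cyclic list.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the JDK.
public class CycleSafePrinter {

    /** Renders a collection graph; a collection seen before is printed as its label only. */
    public static String render(Object root) {
        Map<Object, Integer> seen = new IdentityHashMap<>();
        StringBuilder sb = new StringBuilder();
        render(root, seen, sb);
        return sb.toString();
    }

    private static void render(Object o, Map<Object, Integer> seen, StringBuilder sb) {
        if (!(o instanceof Collection)) {
            sb.append(o);                       // leaf: ordinary toString()
            return;
        }
        Integer id = seen.get(o);               // identity lookup, no hashCode() call on o
        if (id != null) {
            sb.append("#").append(id);          // already visited: write the identifier instead
            return;
        }
        id = seen.size();                       // incremented index as the unique identifier
        seen.put(o, id);
        sb.append("#").append(id).append("=[");
        boolean first = true;
        for (Object e : (Collection<?>) o) {
            if (!first) {
                sb.append(", ");
            }
            first = false;
            render(e, seen, sb);
        }
        sb.append("]");
    }

    public static void main(String[] args) {
        List<Object> a = new ArrayList<>();
        List<Object> b = new ArrayList<>();
        a.add(a);                               // direct cycle
        a.add(b);
        b.add(a);                               // indirect cycle
        System.out.println(render(a));          // prints #0=[#0, #1=[#0]]
    }
}
```

Because the map is created per call to render, the "root" is simply whatever object render is first invoked on. A leaf object whose own toString() prints a collection would still bypass the check.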

I think fundamentally it's because while the language tries to stop you from shooting yourself in the foot, it shouldn't really do so in a way that's expensive. So while it's almost free to compare object pointers (e.g. is obj == this?), anything beyond that involves invoking methods on the object you're passing in.

And at this point the library code doesn't know anything about the objects you're passing in. For one, the generics implementation doesn't know if they're instances of Collection (or Iterable) themselves, and while it could find this out via instanceof, who's to say whether it's a "collection-like" object that isn't actually a collection, but still contains a deferred circular reference? Secondly, even if it is a collection, there's no telling what its actual implementation, and thus its behaviour, is like. Theoretically one could have a collection containing all the Longs which is going to be used lazily; since the library doesn't know this, it would be hideously expensive to iterate over every entry. Or in fact one could even design a collection with an Iterator that never terminates (though this would be difficult to use in practice, because so many constructs/library classes assume that hasNext will eventually return false).

So it basically comes down to an unknown, possibly infinite cost in order to stop you from doing something that might not actually be an issue anyway.

I'd just like to point out that this statement:

when printing with toString(), Java will detect direct cycles in a collection

is misleading.

Java (the JVM, the language itself, etc.) is not detecting the self-reference. Rather, this is a property of the toString() override in java.util.AbstractCollection.

If you were to create your own Collection implementation, the language/platform wouldn't automatically save you from a self-reference like this: unless you extend AbstractCollection, you would have to make sure you cover this logic yourself.

I might be splitting hairs here, but I think this is an important distinction to make. Just because one of the foundation classes in the JDK does something doesn't mean that "Java" as an overall umbrella does it.

Here is the relevant source code in AbstractCollection.toString(), with the key line commented:

public String toString() {
    Iterator<E> i = iterator();
    if (! i.hasNext())
        return "[]";

    StringBuilder sb = new StringBuilder();
    sb.append('[');
    for (;;) {
        E e = i.next();
        // self-reference check:
        sb.append(e == this ? "(this Collection)" : e); 
        if (! i.hasNext())
            return sb.append(']').toString();
        sb.append(", ");
    }
}
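To make the distinction concrete, here is a small sketch contrasting a class that inherits this toString() with one that does not. The Node class and the overflows helper are invented for this example; Node is deliberately not a Collection, so even a direct self-reference gets no protection and recurses until the stack overflows.

```java
import java.util.ArrayList;
import java.util.List;

public class SelfRefDemo {

    /** Hypothetical node type; it does not extend AbstractCollection. */
    static final class Node {
        Node next;

        @Override
        public String toString() {
            return "Node(" + next + ")";    // no self-reference check here
        }
    }

    /** Returns true if calling toString() on the object overflows the stack. */
    static boolean overflows(Object o) {
        try {
            o.toString();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Node n = new Node();
        n.next = n;                          // direct cycle, but nothing guards against it
        System.out.println(overflows(n));    // prints true

        List<Object> list = new ArrayList<>();
        list.add(list);                      // direct cycle in an AbstractCollection subclass
        System.out.println(list);            // prints [(this Collection)]
    }
}
```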

The problem with the algorithm that you propose is that you need to pass the IdentityHashMap to all Collections involved. This is not possible using the published Collection APIs. The Collection interface does not define a toString(IdentityHashMap) method.
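For a class you control, one way around the missing parameter is to keep the "in progress" state in a ThreadLocal rather than passing it along. The sketch below is an assumption-laden illustration, not something the JDK does for collections: CycleSafeList and the "(cycle)" marker are invented names, and only instances of this subclass take part in the check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

/** Hypothetical list that remembers, per thread, which instances are currently being printed. */
public class CycleSafeList<E> extends ArrayList<E> {

    // Identity-based set, so membership tests never call hashCode()/equals() on a cyclic list.
    private static final ThreadLocal<Set<Object>> IN_PROGRESS =
        ThreadLocal.withInitial(
            () -> Collections.<Object>newSetFromMap(new IdentityHashMap<Object, Boolean>()));

    @Override
    public String toString() {
        Set<Object> active = IN_PROGRESS.get();
        if (!active.add(this)) {
            return "(cycle)";               // this list is already being printed further up the stack
        }
        try {
            return super.toString();        // the inherited AbstractCollection.toString()
        } finally {
            active.remove(this);            // always clear the marker, even on exceptions
        }
    }

    public static void main(String[] args) {
        List<Object> a = new CycleSafeList<>();
        List<Object> b = new LinkedList<>(); // an ordinary JDK list
        a.add(b);
        b.add(a);                            // indirect cycle
        System.out.println(a);               // prints [[(cycle)]]
    }
}
```

The cycle is broken as long as at least one collection on it is a CycleSafeList; a cycle made up purely of standard JDK collections still recurses without end, which is the limitation described above.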

I imagine that whoever at Sun put the self-reference check into the AbstractCollection.toString() method thought of all of this, and (in conjunction with their colleagues) decided that a "total solution" would be over the top. I think that the current design/implementation is correct.

It is not a requirement that Object.toString implementations be bomb-proof.

You are right, you already answered your own question. Checking for longer cycles (especially really long ones, such as a cycle of length 1000) would be too much overhead and is not needed in most cases. If someone wants it, they have to check for it themselves.

The direct cycle case, however, is easy to check and will occur more often, so Java handles it.

You can't really detect indirect cycles; it's a typical example of the halting problem.
