Efficiently searching a list to find a specific instance, without using `equals()` or `hashCode()`

I'm implementing a simple serialisation mechanism. Serialising an object recursively trawls its fields and writes them out to a stream. To avoid endless loops, when it encounters an object to write, it checks whether it has seen that object before; if so, it writes a marker instead. This relies on maintaining a searchable list of objects it has already seen. The list's indexOf() and contains() methods can't use Object.equals(); they must use == instead, since an object graph might contain two objects that are identical in terms of their data but are not actually the same object. If I use a simple Map<Object, Integer> with the following example graph, something bad happens:

   root: ParentObject (class Parent)
      field1:  ChildObject1 (class Child)
         data: "Hello"
      field2:  ChildObject2 (class Child)
         data: "Hello"

When serialising, the Map finds ChildObject1 when asked whether ChildObject2 has been written before, because the .equals() method returns true. When deserialised, the object tree now looks like this:

   root: ParentObject (class Parent)
      field1:  ChildObject1 (class Child)
         data: "Hello"
      field2:  <reference to ChildObject1> 

Now the problem is that if something modifies ChildObject1, the apparent ChildObject2 also shows that change, which differs from the behaviour before serialisation. If these objects were immutable this would not be a problem, but this mechanism is meant to be general purpose and can't ensure immutability, and the objects I actually need it for are not immutable either.
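The collision described above can be reproduced in a few lines. This is only a sketch: the Child class here is a hypothetical stand-in for the class in the question, assumed to define equals() and hashCode() in terms of its data field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the question's Child class; equality is by data.
final class Child {
    String data;

    Child(String data) {
        this.data = data;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Child && Objects.equals(data, ((Child) o).data);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(data);
    }
}

class EqualsCollisionDemo {
    // Returns true if a HashMap-based "seen" table reports the second,
    // distinct child as already written.
    static boolean secondChildLooksSeen() {
        Child child1 = new Child("Hello");
        Child child2 = new Child("Hello");

        Map<Object, Integer> seen = new HashMap<>();
        seen.put(child1, 0); // child1 has been written and given marker 0

        // HashMap compares keys with equals(), so child2 matches child1.
        return seen.containsKey(child2);
    }

    public static void main(String[] args) {
        System.out.println(secondChildLooksSeen()); // prints "true"
    }
}
```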

In a lower-level language I would simply create a lookup based on pointer address, but that's not an option here.

I can use a simple List<Object> and do a linear search on list.get(i) == needle, but this is very inefficient. My first thought was a simple binary search, but what do I search on? There's no identifying information, no key, to use. This seems to preclude the use of any more efficient lookup structure.
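For reference, the linear identity search mentioned above might look like the following sketch. The helper name identityIndexOf is made up for illustration; it is correct but costs O(N) per lookup.

```java
import java.util.ArrayList;
import java.util.List;

class IdentityLinearSearch {
    // Returns the index of the first element that is the same instance
    // as needle (compared with ==), or -1 if there is none.
    static int identityIndexOf(List<Object> list, Object needle) {
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i) == needle) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Two distinct String instances with equal contents.
        String a = new String("Hello");
        String b = new String("Hello");

        List<Object> seen = new ArrayList<>();
        seen.add(a);

        System.out.println(identityIndexOf(seen, a)); // prints 0
        System.out.println(identityIndexOf(seen, b)); // prints -1
        System.out.println(seen.indexOf(b));          // prints 0, because indexOf uses equals()
    }
}
```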

I have used Unsafe in the past to output identity information (basically the object's pointer) for debug logging purposes, but this seems, well, "unsafe"! In the back of my mind I have this idea that the JVM might be free to move things around, for example after a GC, which would break this approach too.

How can I work around this problem?

A linear scan of a list is O(N), where N is the list length. That's not efficient, and you can't make it efficient.

You could use System.identityHashCode(Object) to calculate a hash code that is compatible with ==.
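A small sketch of what "compatible with ==" means in practice: the identity hash code ignores any overridden hashCode() and stays the same for a given instance. It is not guaranteed to be unique across instances, so a hand-rolled table built on it would still need == to confirm a match. The class and method names below are illustrative only.

```java
class IdentityHashCodeDemo {
    // The identity hash code of a given instance does not change between calls.
    static boolean stableForSameInstance(Object o) {
        return System.identityHashCode(o) == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        // Two distinct String instances with equal contents.
        String a = new String("Hello");
        String b = new String("Hello");

        // String overrides hashCode(), so equal contents give equal hash codes.
        System.out.println(a.hashCode() == b.hashCode());

        // Identity hash codes are per instance; these usually differ,
        // but that is not guaranteed, so no expected output is claimed here.
        System.out.println(System.identityHashCode(a) == System.identityHashCode(b));

        System.out.println(stableForSameInstance(a));
    }
}
```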

But there is a simpler solution. There is a Map class called IdentityHashMap which is pretty much designed for your use case. This Map implementation has O(1) lookup and insertion (amortized).
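As a rough sketch of how this could slot into the serialiser's seen-object table: the SeenObjects class and its markerOrRegister method are hypothetical names, and the convention of returning -1 for "not seen yet" is an assumption made for the example.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical helper for a serialiser: tracks objects already written,
// comparing keys by reference (==) rather than equals().
class SeenObjects {
    private final Map<Object, Integer> ids = new IdentityHashMap<>();

    // Returns the marker id if this exact instance was registered before;
    // otherwise registers it under the next id and returns -1.
    int markerOrRegister(Object obj) {
        Integer existing = ids.get(obj);
        if (existing != null) {
            return existing;
        }
        ids.put(obj, ids.size());
        return -1;
    }

    public static void main(String[] args) {
        SeenObjects seen = new SeenObjects();

        // Two distinct instances with equal contents, like ChildObject1 and ChildObject2.
        String child1 = new String("Hello");
        String child2 = new String("Hello");

        System.out.println(seen.markerOrRegister(child1)); // prints -1 (first time seen)
        System.out.println(seen.markerOrRegister(child2)); // prints -1 (different instance)
        System.out.println(seen.markerOrRegister(child1)); // prints 0 (marker for child1)
    }
}
```

With this table, the two equal-but-distinct children from the question are written out separately, and only a genuine repeat of the same instance is replaced by a marker.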
