Efficiently searching a list to find a specific instance, without using `equals()` or `hashCode()`

I'm implementing a simple serialisation mechanism. Serialising an object recursively trawls its fields and writes them out to a stream. To avoid endless loops, when it encounters an object to write, it checks whether it has seen that object before; if so, it writes a marker instead. This relies on maintaining a searchable list of objects it has already seen. The list's `indexOf()` and `contains()` methods can't use `Object.equals`; they must use `==` instead, since an object graph might contain two objects that are identical in terms of their data but must not be treated as the same object. If I use a simple `Map<Object, Integer>` with the following example graph, then something bad happens:

   root: ParentObject (class Parent)
      field1:  ChildObject1 (class Child)
         data: "Hello"
      field2:  ChildObject2 (class Child)
         data: "Hello"

When serialised, the Map finds ChildObject1 when asked whether ChildObject2 has been written before, because the `.equals()` method returns true. When deserialised, the object tree now looks like this:

   root: ParentObject (class Parent)
      field1:  ChildObject1 (class Child)
         data: "Hello"
      field2:  <reference to ChildObject1> 

The problem now is that if something modifies ChildObject1, the apparent ChildObject2 also shows that change, which differs from the behaviour before serialisation. If these objects were immutable this would not be a problem, but the mechanism is meant to be general purpose and can't ensure immutability, and in the specific case I need it for, the objects are not immutable either.
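The failure described above can be reproduced in a few lines. This is only a minimal sketch: `Child` here is a hypothetical stand-in whose `equals()` and `hashCode()` compare the `data` field, and `EqualsCollisionDemo` is an illustrative name, not part of the real serialiser.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class EqualsCollisionDemo {
    // Returns true when an equals()-based map reports the second child as
    // already seen, even though it is a different instance from the first.
    static boolean secondLooksSeen() {
        Child child1 = new Child("Hello");
        Child child2 = new Child("Hello");
        Map<Object, Integer> seen = new HashMap<>();
        seen.put(child1, 0);
        return child1 != child2 && seen.containsKey(child2);
    }

    public static void main(String[] args) {
        System.out.println(secondLooksSeen()); // prints true
    }
}

// Hypothetical stand-in for the Child class in the example graph.
final class Child {
    String data;

    Child(String data) {
        this.data = data;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Child && Objects.equals(data, ((Child) o).data);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(data);
    }
}
```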

In a lower-level language I would simply create a lookup based on pointer address, but that's not an option here.

I can use a simple `List<Object>` and do a linear search on `list.get(i) == needle`, but this is very inefficient. My first thought is a simple binary search, but what do I search on? There's no identifying information, no key, to use. This seems to preclude the use of any more efficient lookup structure.
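For reference, the linear identity search looks roughly like this. It is a sketch only; `IdentityListSearch` and `identityIndexOf` are illustrative names. It is correct but costs O(N) per lookup.

```java
import java.util.ArrayList;
import java.util.List;

final class IdentityListSearch {
    // Returns the index of the exact instance `needle`, or -1 if absent.
    // Compares with == and never calls equals().
    static int identityIndexOf(List<Object> haystack, Object needle) {
        for (int i = 0; i < haystack.size(); i++) {
            if (haystack.get(i) == needle) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String a = new String("Hello");
        String b = new String("Hello"); // equal content, different instance
        List<Object> seen = new ArrayList<>();
        seen.add(a);
        System.out.println(identityIndexOf(seen, a)); // prints 0
        System.out.println(identityIndexOf(seen, b)); // prints -1
        System.out.println(seen.indexOf(b));          // prints 0, because indexOf uses equals()
    }
}
```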

I have used `Unsafe` in the past to output identity information (basically the object's pointer) for debug logging purposes, but this seems, well, "unsafe"! In the back of my mind I have the idea that the JVM might be free to move things around, for example after a GC, which would break this approach too.

How can I work around this problem?

A linear scan of a list is O(N), where N is the list length. That cost is inherent to scanning, so you can't make that approach efficient.

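You could use `System.identityHashCode(Object)` to calculate a hash code that is compatible with `==`: the same instance always yields the same value, regardless of any `hashCode()` override, although two distinct instances may occasionally share one.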
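As a rough sketch of that idea, the identity hash code can pick a bucket while `==` confirms the match, since identity hash codes are not guaranteed to be unique. `IdentityBuckets` is a hypothetical helper written for illustration, not a library class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IdentityBuckets {
    // Buckets keyed by identity hash code; each bucket holds the instances
    // that happen to share that code.
    private final Map<Integer, List<Object>> buckets = new HashMap<>();

    // Records obj; returns false if this exact instance was already present.
    boolean add(Object obj) {
        List<Object> bucket =
                buckets.computeIfAbsent(System.identityHashCode(obj), k -> new ArrayList<>());
        for (Object candidate : bucket) {
            if (candidate == obj) {
                return false;
            }
        }
        bucket.add(obj);
        return true;
    }

    // Returns true only if this exact instance was added before.
    boolean contains(Object obj) {
        List<Object> bucket = buckets.get(System.identityHashCode(obj));
        if (bucket == null) {
            return false;
        }
        for (Object candidate : bucket) {
            if (candidate == obj) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String a = new String("Hello");
        String b = new String("Hello"); // equal content, different instance
        IdentityBuckets seen = new IdentityBuckets();
        System.out.println(seen.add(a));      // prints true
        System.out.println(seen.add(a));      // prints false
        System.out.println(seen.contains(b)); // prints false
    }
}
```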

But there is a simpler solution. There is a `Map` class called `IdentityHashMap` which is pretty much designed for your use case: it compares keys with `==` instead of `equals()`. This `Map` implementation has expected O(1) lookup and insertion (amortized).
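A minimal sketch of how this could slot into the serialiser's "seen before" check follows. `SeenObjects` and `idIfSeenElseRegister` are hypothetical names, and the choice of `-1` as the "not seen" result is an assumption made for illustration.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class SeenObjects {
    // Keys are compared by reference (==), so equal-but-distinct objects
    // get separate entries.
    private final Map<Object, Integer> ids = new IdentityHashMap<>();

    // Returns the id previously assigned to this exact instance, or -1
    // after registering the instance under the next free id.
    int idIfSeenElseRegister(Object obj) {
        Integer existing = ids.get(obj);
        if (existing != null) {
            return existing;
        }
        ids.put(obj, ids.size());
        return -1;
    }

    public static void main(String[] args) {
        Object child1 = new String("Hello");
        Object child2 = new String("Hello"); // equal content, different instance
        SeenObjects seen = new SeenObjects();
        System.out.println(seen.idIfSeenElseRegister(child1)); // prints -1 (first visit)
        System.out.println(seen.idIfSeenElseRegister(child2)); // prints -1 (not confused with child1)
        System.out.println(seen.idIfSeenElseRegister(child1)); // prints 0
        System.out.println(seen.idIfSeenElseRegister(child2)); // prints 1
    }
}
```

If only membership matters and no id is needed, `Collections.newSetFromMap(new IdentityHashMap<>())` gives a `Set` with the same reference-equality behaviour.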
