简体   繁体   中英

HashSet.remove() and Iterator.remove() not working

I'm having problems with Iterator.remove() called on a HashSet.

I've a Set of time stamped objects. Before adding a new item to the Set, I loop through the set, identify an old version of that data object and remove it (before adding the new object). the timestamp is included in hashCode and equals(), but not equalsData().

for (Iterator<DataResult> i = allResults.iterator(); i.hasNext();)
{
    DataResult oldData = i.next();
    if (data.equalsData(oldData))
    {   
        i.remove();
        break;
    }
}
allResults.add(data)

The odd thing is that i.remove() silently fails (no exception) for some of the items in the set. I've verified

  • The line i.remove() is actually called. I can call it from the debugger directly at the breakpoint in Eclipse and it still fails to change the state of Set

  • DataResult is an immutable object so it can't have changed after being added to the set originally.

  • The equals and hashCode() methods use @Override to ensure they are the correct methods. Unit tests verify these work.

  • This also fails if I just use a for statement and Set.remove instead. (eg loop through the items, find the item in the list, then call Set.remove(oldData) after the loop).

  • I've tested in JDK 5 and JDK 6.

I thought I must be missing something basic, but after spending some significant time on this my colleague and I are stumped. Any suggestions for things to check?

EDIT:

There have been questions - is DataResult truly immutable. Yes. There are no setters. And when the Date object is retrieved (which is a mutable object), it is done by creating a copy.

public Date getEntryTime()
{
    return DateUtil.copyDate(entryTime);
}

public static Date copyDate(Date date)
{
    return (date == null) ? null : new Date(date.getTime());
}

FURTHER EDIT (some time later): For the record -- DataResult was not immutable! It referenced an object which had a hashcode which changed when persisted to the database (bad practice, I know). It turned out that if a DataResult was created with a transient subobject, and the subobject was persisted, the DataResult hashcode was changed.

Very subtle -- I looked at this many times and didn't notice the lack of immutability.

I was very curious about this one still, and wrote the following test:

import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;
import java.util.Set;

public class HashCodeTest {
    private int hashCode = 0;

    @Override public int hashCode() {
        return hashCode ++;
    }

    public static void main(String[] args) {
        Set<HashCodeTest> set = new HashSet<HashCodeTest>();

        set.add(new HashCodeTest());
        System.out.println(set.size());
        for (Iterator<HashCodeTest> iter = set.iterator();
                iter.hasNext();) {
            iter.next();
            iter.remove();
        }
        System.out.println(set.size());
    }
}

which results in:

1
1

If the hashCode() value of an object has changed since it was added to the HashSet, it seems to render the object unremovable.

I'm not sure if that's the problem you're running into, but it's something to look into if you decide to re-visit this.

Under the covers, HashSet uses HashMap, which calls HashMap.removeEntryForKey(Object) when either HashSet.remove(Object) or Iterator.remove() is called. This method uses both hashCode() and equals() to validate that it is removing the proper object from the collection.

If both Iterator.remove() and HashSet.remove(Object) are not working, then something is definitely wrong with your equals() or hashCode() methods. Posting the code for these would be helpful in diagnosis of your issue.

Are you absolutely certain that DataResult is immutable? What is the type of the timestamp? If it's a java.util.Date are you making copies of it when you're initializing the DataResult? Keep in mind that java.util.Date is mutable.

For instance:

Date timestamp = new Date();
DataResult d = new DataResult(timestamp);
System.out.println(d.getTimestamp());
timestamp.setTime(System.currentTimeMillis());
System.out.println(d.getTimestamp());

Would print two different times.

It would also help if you could post some source code.

Thanks for all the help. I suspect the problem must be with equals() and hashCode() as suggested by spencerk. I did check those in my debugger and with unit tests, but I've got to be missing something.

I ended up doing a workaround-- copying all the items except one to a new Set. For kicks, I used Apache Commons CollectionUtils.

    Set<DataResult> tempResults = new HashSet<DataResult>();
    CollectionUtils.select(allResults, 
            new Predicate()
            {
                public boolean evaluate(Object oldData)
                {
                    return !data.equalsData((DataResult) oldData);
                }
            }
            , tempResults);
    allResults = tempResults;

I'm going to stop here-- too much work to simplify down to a simple test case. But the help is miuch appreciated.

You should all be careful of any Java Collection that fetches its children by hashcode, in the case that its child type's hashcode depends on its mutable state. An example:

HashSet<HashSet<?>> or HashSet<AbstaractSet<?>> or HashMap variant:

HashSet retrieves an item by its hashCode, but its item type is a HashSet, and hashSet.hashCode depends on its item's state.

Code for that matter:

HashSet<HashSet<String>> coll = new HashSet<HashSet<String>>();
HashSet<String> set1 = new HashSet<String>();
set1.add("1");
coll.add(set1);
print(set1.hashCode()); //---> will output X
set1.add("2");
print(set1.hashCode()); //---> will output Y
coll.remove(set1) // WILL FAIL TO REMOVE (SILENTLY)

Reason being is HashSet's remove method uses HashMap and it identifies keys by hashCode, while AbstractSet's hashCode is dynamic and depends upon the mutable properties of itself.

Have you tried something like

boolean removed = allResults.remove(oldData)
if (!removed) // COMPLAIN BITTERLY!

In other words, remove the object from the Set and break the loop. That won't cause the Iterator to complain. I don't think this is a long term solution but would probably give you some information about the hashCode , equals and equalsData methods

It's almost certainly the case the hashcodes don't match for the old and new data that are "equals()". I've run into this kind of thing before and you essentially end up spewing hashcodes for every object and the string representation and trying to figure out why the mismatch is happening.

If you're comparing items pre/post database, sometimes it loses the nanoseconds (depending on your DB column type) which can cause hashcodes to change.

The Java HashSet has an issue in "remove()" method. Check the link below. I switched to TreeSet and it works fine. But I need the O(1) time complexity.

https://bugs.openjdk.java.net/browse/JDK-8154740

If there are two entries with the same data, only one of them is replaced... have you accounted for that? And just in case, have you tried another collection data structure that doesn't use a hashcode, say a List?

I'm not up to speed on my Java, but I know that you can't remove an item from a collection when you are iterating over that collection in .NET, although .NET will throw an exception if it catches this. Could this be the problem?

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM