
Java HashSet vs HashMap

I understand that HashSet is based on the HashMap implementation but is used when you need a unique set of elements. So why, in the following code, when putting the same objects into the map and the set, is the size of both collections equal to 1? Shouldn't the map size be 2? If the size of both collections is equal, I don't see any difference between using these two collections.

    Set<SimpleObject> testSet = new HashSet<>();
    Map<Integer, SimpleObject> testMap = new HashMap<>();

    SimpleObject simpleObject1 = new SimpleObject("Igor", 1);
    SimpleObject simplObject2 = new SimpleObject("Igor", 1);
    testSet.add(simpleObject1);
    testSet.add(simplObject2);


    Integer key = new Integer(10);

    testMap.put(key, simpleObject1);
    testMap.put(key, simplObject2);

    System.out.println(testSet.size());
    System.out.println(testMap.size());

The output is 1 and 1.

SimpleObject code

public class SimpleObject {

    private String dataField1;
    private int dataField2;

    public SimpleObject() {}

    public SimpleObject(String data1, int data2) {
        this.dataField1 = data1;
        this.dataField2 = data2;
    }

    public String getDataField1() {
        return dataField1;
    }

    public int getDataField2() {
        return dataField2;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result
                + ((dataField1 == null) ? 0 : dataField1.hashCode());
        result = prime * result + dataField2;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        SimpleObject other = (SimpleObject) obj;
        if (dataField1 == null) {
            if (other.dataField1 != null)
                return false;
        } else if (!dataField1.equals(other.dataField1))
            return false;
        if (dataField2 != other.dataField2)
            return false;
        return true;
    }
}

The map holds unique keys. When you invoke put with a key that already exists in the map, the object under that key is replaced with the new object. Hence the size of 1.

The difference between the two should be obvious:

  • in a Map you store key-value pairs
  • in a Set you store only the keys

In fact, a HashSet has a HashMap field, and whenever add(obj) is invoked, put is invoked on the underlying map as map.put(obj, DUMMY), where the dummy object is a private static final Object DUMMY = new Object() (the OpenJDK source names this constant PRESENT). So the map is populated with your object as the key and a value that is of no interest.
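To make the delegation described above concrete, here is a minimal sketch of a set backed by a HashMap. The class name MapBackedSet is hypothetical and the code is a simplified illustration, not the JDK source; it only assumes that Map.put returns the previous value (or null if the key was absent).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for HashSet; NOT the actual JDK implementation.
class MapBackedSet<E> {
    // Shared placeholder value stored under every key.
    private static final Object DUMMY = new Object();

    private final Map<E, Object> map = new HashMap<>();

    // Returns true only if the element was not already present,
    // because put() returns null when the key is new.
    boolean add(E element) {
        return map.put(element, DUMMY) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        System.out.println(set.add("Igor")); // true
        System.out.println(set.add("Igor")); // false
        System.out.println(set.size());      // 1
    }
}
```

The elements live only in the key position of the map, which is why uniqueness of set elements follows directly from uniqueness of map keys.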

A key in a Map can only map to a single value. So the second time you put into the map with the same key, it overwrites the first entry.
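The overwrite can be observed through the return value of put, which hands back the value previously stored under the key. The following is a small sketch; the class and method names (PutOverwriteDemo, secondPutResult) and the String values are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class PutOverwriteDemo {
    // Puts two values under the same key and returns what the second put() reported.
    static String secondPutResult(Map<Integer, String> map) {
        map.put(10, "first");          // key 10 is new, so this put() returns null
        return map.put(10, "second");  // key 10 exists: value is replaced, old value returned
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        String previous = secondPutResult(map);
        System.out.println(previous);     // first
        System.out.println(map.get(10));  // second
        System.out.println(map.size());   // 1
    }
}
```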

In the case of a HashSet, adding the same object is more or less a no-op. In the case of a HashMap, putting a new key-value pair with an existing key overwrites the existing value for that key. Below I've added equals() checks to your code:

SimpleObject simpleObject1 = new SimpleObject("Igor", 1);
SimpleObject simplObject2 = new SimpleObject("Igor", 1);
// If the line below prints true, the 2nd add will not add anything.
System.out.println("Are the objects equal? " + simpleObject1.equals(simplObject2));
testSet.add(simpleObject1);
testSet.add(simplObject2);


Integer key = new Integer(10);
// This is a no-brainer as you have the exact same key, but let's keep it consistent.
// If this prints true, the 2nd put will overwrite the value of the 1st key-value pair.
System.out.println("Are the keys equal? " + key.equals(key));
testMap.put(key, simpleObject1);
testMap.put(key, simplObject2);
System.out.println(testSet.size());
System.out.println(testMap.size());

I just wanted to add to these great answers an answer to your last dilemma. You wanted to know what the difference between these two collections is if they return the same size after your insertions. Well, you can't really see the difference here, because you are inserting two values into the map with the same key, and hence replacing the first value with the second. You would see the real difference (among others) if you had inserted the same value into the map under different keys. Then you would see that you can have duplicate values in a map but not duplicate keys, and that in a set you can't have duplicate values at all. This is the main difference here.
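A short sketch of that experiment: the same value is stored under two different keys in a map and added twice to a set. The class and method names (DuplicateValuesDemo and its helpers) are made up for illustration, and a plain String stands in for SimpleObject.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DuplicateValuesDemo {
    // Stores the same value under two different keys.
    static int mapSizeWithTwoKeys(String value) {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, value);
        map.put(2, value);
        return map.size();
    }

    // Adds the same value to a set twice.
    static int setSizeAfterTwoAdds(String value) {
        Set<String> set = new HashSet<>();
        set.add(value);
        set.add(value);
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(mapSizeWithTwoKeys("Igor"));  // 2: duplicate values are fine in a map
        System.out.println(setSizeAfterTwoAdds("Igor")); // 1: a set keeps one copy
    }
}
```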

The answer is simple: it is the nature of HashSet. HashSet internally uses a HashMap with a dummy object named PRESENT as the value, and the key of this HashMap is your object.

hash(simpleObject1) and hash(simplObject2) will return the same int. So?

When you add simpleObject1 to the HashSet, it is put into the internal HashMap with simpleObject1 as the key. Then, when you call add(simplObject2), you get false because an equal object is already present in the internal HashMap as a key.
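The return value of add can be checked directly. In this sketch two distinct but equal String instances stand in for simpleObject1 and simplObject2; the class and method names (AddReturnValueDemo, addTwice) are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class AddReturnValueDemo {
    // Adds both objects to a fresh set and reports what each add() returned.
    static boolean[] addTwice(String a, String b) {
        Set<String> set = new HashSet<>();
        boolean firstAdded = set.add(a);
        boolean secondAdded = set.add(b);
        return new boolean[] { firstAdded, secondAdded };
    }

    public static void main(String[] args) {
        // new String(...) forces two separate instances with equal content.
        String first = new String("Igor");
        String second = new String("Igor");

        System.out.println(first == second);                       // false: different instances
        System.out.println(first.hashCode() == second.hashCode()); // true: same hash
        System.out.println(first.equals(second));                  // true: equal content

        boolean[] results = addTwice(first, second);
        System.out.println(results[0]); // true
        System.out.println(results[1]); // false
    }
}
```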

As a little extra info, HashSet relies on the equals() and hashCode() contract of its elements to provide O(1) performance for the basic operations. Note that HashSet does permit a null element, even though hashCode() cannot be called on null: the backing HashMap handles a null key as a special case and assigns it hash 0.

I think the major difference is that HashSet is stable in the sense that it does not replace a duplicate element: once the first unique element has been inserted, all later duplicates are discarded. HashMap, by contrast, replaces the old value with the new one when the key already exists. So a HashMap does the extra work of overwriting the stored value, although in practice that is a single reference assignment.
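One way to observe this is to compare object identity after inserting two equal but distinct instances. This is a sketch with hypothetical names (ReplacementDemo and its helpers), using String instances in place of SimpleObject.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ReplacementDemo {
    // Adds two equal objects to a set and returns the instance the set actually holds.
    static String elementKeptBySet(String first, String duplicate) {
        Set<String> set = new HashSet<>();
        set.add(first);
        set.add(duplicate);
        return set.iterator().next();
    }

    // Puts two values under the same key and returns the value the map ends up with.
    static String valueKeptByMap(String first, String second) {
        Map<Integer, String> map = new HashMap<>();
        map.put(10, first);
        map.put(10, second);
        return map.get(10);
    }

    public static void main(String[] args) {
        String a = new String("Igor");
        String b = new String("Igor"); // equal to a, but a different instance

        System.out.println(elementKeptBySet(a, b) == a); // true: the set keeps the first instance
        System.out.println(valueKeptByMap(a, b) == b);   // true: the map keeps the last value
    }
}
```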

public class HashSet<E> extends AbstractSet<E> implements Set<E>, Cloneable, Serializable
This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.

This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

Note that this implementation is not synchronized. If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set.
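As an illustration of two points from the documentation quoted above, the sketch below wraps a HashSet with Collections.synchronizedSet at creation time and also checks that a plain HashSet accepts null. The class and method names (SynchronizedSetDemo, concurrentAdds, acceptsNull) and the two-thread setup are assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class SynchronizedSetDemo {
    // Two threads add disjoint ranges of integers to one wrapped set.
    static int concurrentAdds(int perThread) {
        // Wrap at creation time so no unsynchronized reference to the HashSet escapes.
        Set<Integer> set = Collections.synchronizedSet(new HashSet<Integer>());

        Thread t1 = new Thread(() -> {
            for (int i = 0; i < perThread; i++) set.add(i);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < perThread; i++) set.add(perThread + i);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return set.size();
    }

    // A plain HashSet permits a null element.
    static boolean acceptsNull() {
        Set<String> set = new HashSet<>();
        return set.add(null) && set.contains(null);
    }

    public static void main(String[] args) {
        System.out.println(concurrentAdds(1000)); // 2000
        System.out.println(acceptsNull());        // true
    }
}
```

Note that iterating over a set returned by Collections.synchronizedSet still requires manually synchronizing on the returned set for the duration of the iteration.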
