HashSet look-up complexity?

A look-up (contains) operation for a single element can be O(n) in the worst case, right? So looking up n elements in a HashSet would be O(n^2)?

Yes, but that really is the worst case: it happens only if all the elements in the HashSet have the same hash code (or hash codes that lead to the same bucket). With a correctly written hashCode and keys whose hash codes are spread evenly across the buckets, a lookup is O(1).
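To make the worst case concrete, here is a minimal sketch. CollidingKey and WorstCaseDemo are hypothetical names invented for this illustration, not anything from the question; the key deliberately returns a constant hash code so that every element ends up in the same bucket. Lookups remain correct, they just have to search within that one bucket.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key type (an assumption for illustration): every instance
// reports the same hash code, so all instances land in the same bucket.
final class CollidingKey {
    private final int id;

    CollidingKey(int id) { this.id = id; }

    @Override
    public int hashCode() { return 42; } // deliberately terrible

    @Override
    public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }
}

class WorstCaseDemo {
    // Builds a set of n keys that all collide with each other.
    static Set<CollidingKey> build(int n) {
        Set<CollidingKey> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new CollidingKey(i));
        }
        return set;
    }

    public static void main(String[] args) {
        Set<CollidingKey> set = build(1_000);
        System.out.println(set.size());                            // 1000
        System.out.println(set.contains(new CollidingKey(999)));   // true
        System.out.println(set.contains(new CollidingKey(1_000))); // false
    }
}
```

The results are the same as with a good hashCode; only the cost per lookup changes, because equals() has to be tried against other elements of the shared bucket.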

Yes, but the whole reason we have HashSets is that we hit this worst case with very, very low probability, and n lookups are usually much faster than the guaranteed O(n log n) for a (self-balancing) TreeSet, or the guaranteed O(n^2) for an unsorted list.

As already noted in the earlier answers, the average lookup time complexity is O(1). To convince yourself that this is true, just look at the source code for contains():

...

private transient HashMap<E,Object> map;

...

public boolean contains(Object o) {
    return map.containsKey(o);
}

...

As you can see, it uses a HashMap object internally to check whether your object exists.
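The delegation can be sketched in a few lines. MapBackedSet is a hypothetical, heavily simplified class written for this illustration (it is not the JDK source): each element is stored as a map key with a shared dummy value, so contains() is just containsKey().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplified set (an assumption for illustration, not the JDK
// class): elements are stored as keys of a HashMap with a shared dummy value.
class MapBackedSet<E> {
    private static final Object PRESENT = new Object();
    private final Map<E, Object> map = new HashMap<>();

    // put() returns the previous value, so null means the key was new.
    boolean add(E e) { return map.put(e, PRESENT) == null; }

    // Membership is nothing more than a key lookup in the backing map.
    boolean contains(Object o) { return map.containsKey(o); }

    int size() { return map.size(); }

    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        System.out.println(set.add("a"));      // true
        System.out.println(set.add("a"));      // false
        System.out.println(set.contains("a")); // true
        System.out.println(set.contains("b")); // false
    }
}
```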

Then, if we look at the implementation of containsKey() in HashMap, we see the following code:

public boolean containsKey(Object key) {
    return getNode(hash(key), key) != null;
}

getNode() searches for a node based on the key's hash value and the key itself. Please note that hash(key) has O(1) time complexity, assuming the key's own hashCode() does.
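As a rough sketch of how a key is turned into a bucket position: the spread() method below mirrors the hash-spreading step found in OpenJDK 8's HashMap (other versions may differ), and bucketIndex() applies the same (n - 1) & hash mask that getNode() uses. BucketIndexDemo and its method names are hypothetical, chosen for this example only.

```java
// Hypothetical demo class (names are assumptions for illustration).
class BucketIndexDemo {
    // Spreads the high 16 bits of hashCode() into the low 16 bits, so that
    // small tables, which only look at the low bits, still see them.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // The table length must be a power of two for the mask to behave
    // like a modulo operation.
    static int bucketIndex(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex(17, 16));   // 1
        System.out.println(bucketIndex(33, 16));   // 1 (collides with 17)
        System.out.println(bucketIndex(null, 16)); // 0
    }
}
```

Both steps are a handful of bit operations, which is why the cost of reaching the bucket is constant once hashCode() has been computed.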

And finally, for getNode():

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; 
    Node<K,V> first, e; 
    int n; 
    K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

The most important part is basically the first inner if block:

...
        if (first.hash == hash &&
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
...

If the hash of your key object equals the hash of the first element in the bucket (first), and the objects themselves are equal, then first is the object we're looking for, and finding it is O(1).

As you can see, it all depends on the implementation of the hash function: if it's good, it will mostly assign different buckets to different key objects. If not, several key objects may reside in the same bucket, and we will need to search within the bucket itself to find the right key, as seen here:

...
        if (first instanceof TreeNode)
            return ((TreeNode<K,V>)first).getTreeNode(hash, key);
        do {
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
...
        } while ((e = e.next) != null); 

However, even in this case, if the bucket's first node is a TreeNode, the lookup is O(log(k)) (k being the number of elements in the bucket), because the bucket is then a balanced binary search tree. If not (the do-while loop over the linked list), it's O(k). But again, this happens rarely (or maybe even never for some types of objects), so the average time complexity of one call to contains remains O(1). Obviously, if you perform n calls, the total time complexity will be linear on average.
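The O(1) versus O(k) difference can be observed with a toy separate-chaining set that counts comparisons. ToyChainedSet is a hypothetical class written for this illustration; it has a fixed number of buckets, never resizes, and never converts a bucket into a tree, so it only models the linked-list case described above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy separate-chaining set (hypothetical, for illustration only):
// fixed bucket count, no resizing, no tree bins.
class ToyChainedSet {
    private final List<List<Object>> buckets = new ArrayList<>();
    int probes; // equals() comparisons made by the most recent contains()

    ToyChainedSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    private List<Object> bucketFor(Object key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    void add(Object key) {
        List<Object> bucket = bucketFor(key);
        if (!bucket.contains(key)) {
            bucket.add(key);
        }
    }

    // Walks the chain of the key's bucket, counting each comparison.
    boolean contains(Object key) {
        probes = 0;
        for (Object k : bucketFor(key)) {
            probes++;
            if (k.equals(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyChainedSet spreadOut = new ToyChainedSet(16);
        for (int i = 0; i < 16; i++) spreadOut.add(i);       // one key per bucket
        spreadOut.contains(15);
        System.out.println(spreadOut.probes);                // 1

        ToyChainedSet colliding = new ToyChainedSet(16);
        for (int i = 0; i < 8; i++) colliding.add(i * 16);   // all in bucket 0
        colliding.contains(112);
        System.out.println(colliding.probes);                // 8
    }
}
```

With one key per bucket a lookup needs a single comparison; with k keys in one bucket, finding the last one needs k comparisons.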

A lookup takes O(c), where c is a constant.
