
What is the time complexity of java.util.HashMap class' keySet() method?

I am trying to implement a plane sweep algorithm and for this I need to know the time complexity of java.util.HashMap class' keySet() method. I suspect that it is O(n log n). Am I correct?

Point of clarification: I am talking about the time complexity of the keySet() method; iterating through the returned Set will obviously take O(n) time.

Getting the key set is O(1) and cheap. This is because HashMap.keySet() returns the actual KeySet object associated with the HashMap.

The returned Set is not a copy of the keys, but a wrapper for the actual HashMap's state. Indeed, if you update the set you can actually change the HashMap's state; e.g. calling clear() on the set will clear the HashMap!
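A minimal sketch of that view behaviour (the class and variable names here are illustrative, not part of any API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class KeySetViewDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // keySet() returns a live view of the map's keys; nothing is copied here.
        Set<String> keys = map.keySet();

        // Removing a key from the view removes the mapping from the map.
        keys.remove("a");
        System.out.println(map);           // prints {b=2}

        // Clearing the view clears the backing map, as the answer notes.
        keys.clear();
        System.out.println(map.isEmpty()); // prints true
    }
}
```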


Tangential answer:

... iterating through the returned Set will obviously take O(n) time.

Actually that is not always true:

  • It is true for a HashMap created using new HashMap<>(). The worst case is to have all N keys land in the same hash chain. However, if the map has grown naturally, there will still be N entries and O(N) slots in the hash array. Thus iterating the entry set will involve O(N) operations.

  • It is false if the HashMap is created with new HashMap<>(capacity) and a singularly bad (too large) capacity estimate. Then it will take O(Cap) + O(N) operations to iterate the entry set. If we treat Cap as a variable, that is O(max(Cap, N)), which could be worse than O(N); the sketch after this answer illustrates the effect.

There is an escape clause though. Since capacity is an int in the current HashMap API, the upper bound for Cap is 2^31. So for really large values of Cap and N, the complexity is O(N).

On the other hand, N is limited by the amount of memory available, and in practice you need a heap on the order of 2^38 bytes (256 GB) for N to exceed the largest possible Cap value. For a map that size, you would be better off using a hashtable implementation tuned for huge maps. Or not using an excessively large capacity estimate!
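A rough sketch of the effect (illustrative timing only, not a rigorous benchmark; a real measurement would use a harness such as JMH, and the 1 << 24 capacity is just an assumed "bad estimate"):

```java
import java.util.HashMap;
import java.util.Map;

public class OversizedCapacitySketch {
    // Times one pass over the key set and returns the elapsed nanoseconds.
    static long timeIteration(String label, Map<Integer, Integer> map) {
        long start = System.nanoTime();
        long sum = 0;
        for (Integer k : map.keySet()) {
            sum += k; // touch each key so the loop does real work
        }
        long elapsed = System.nanoTime() - start;
        System.out.println(label + ": sum=" + sum + ", " + elapsed + " ns");
        return elapsed;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> grown = new HashMap<>();            // grows naturally: O(N) slots
        Map<Integer, Integer> oversized = new HashMap<>(1 << 24); // ~16M buckets for 1000 entries
        for (int i = 0; i < 1000; i++) {
            grown.put(i, i);
            oversized.put(i, i);
        }
        // Iterating 'oversized' has to scan all ~16M buckets: O(Cap) + O(N).
        timeIteration("grown", grown);
        timeIteration("oversized", oversized);
    }
}
```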

Surely it would be O(1). All it is doing is returning a wrapper object on the HashMap.

If you are talking about walking over the key set, then this is O(n), since each next() call is O(1), and this needs to be performed n times.
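A small sketch separating the two costs (assuming a map whose capacity is proportional to its size, so each next() is amortized O(1)):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class WalkKeys {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("x", 1);
        map.put("y", 2);

        // Obtaining the iterator over the key-set view is O(1).
        Iterator<String> it = map.keySet().iterator();

        // Walking it performs n next() calls.
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```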

This should be doable in O(n) time... A hash map is usually implemented as a large bucket array, whose size is (usually) directly proportional to the size of the hash map. In order to retrieve the key set, the bucket array must be iterated through, and for each stored item, the key must be retrieved (either through an intermediate collection or an iterator with direct access to the buckets)...

**EDIT: As others have pointed out, the actual keySet() method will run in O(1) time; however, iterating over the key set or transferring it to a dedicated collection will be an O(n) operation. Not quite sure which one you are looking for.**
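To make that cost model concrete, here is a toy chained hash table; this is not HashMap's actual implementation, only a simplified model of the bucket-array layout the answer describes:

```java
import java.util.ArrayList;
import java.util.List;

// Toy chained hash table: one array of buckets, each a singly linked chain.
public class ToyHashTable<K> {
    private static final class Node<K> {
        final K key;
        final Node<K> next;
        Node(K key, Node<K> next) { this.key = key; this.next = next; }
    }

    private final Node<K>[] buckets;

    @SuppressWarnings("unchecked")
    public ToyHashTable(int capacity) {
        buckets = (Node<K>[]) new Node[capacity];
    }

    public void add(K key) {
        int i = Math.floorMod(key.hashCode(), buckets.length);
        buckets[i] = new Node<>(key, buckets[i]); // prepend to the chain
    }

    // Collecting the keys visits every bucket (Cap of them) and every
    // stored node (N of them), so the cost is O(Cap + N).
    public List<K> keys() {
        List<K> out = new ArrayList<>();
        for (Node<K> head : buckets) {                      // Cap iterations
            for (Node<K> n = head; n != null; n = n.next) { // N nodes in total
                out.add(n.key);
            }
        }
        return out;
    }
}
```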

Java collections use a lot of space and thus don't take much time. That method is, I believe, O(1). The collection is just sitting there.

To address the "iterating through the returned Set will obviously take O(n) time" comment, this is not actually correct per the doc comments of HashMap:

Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

So in other words, iterating over the returned Set will take O(n + c), where n is the size of the map and c is its capacity, not O(n). If an inappropriately large initial capacity or too-low load factor were chosen, the value of c could outweigh the actual size of the map in terms of iteration time.
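If iteration performance matters, one common hedge is to derive the initial capacity from the expected size and the load factor, so that c stays proportional to n. The helper below is illustrative, not a JDK method:

```java
import java.util.HashMap;
import java.util.Map;

public class SizedMaps {
    // Sizes the table so 'expectedSize' entries fit without resizing,
    // while keeping the capacity proportional to the size.
    static <K, V> Map<K, V> newMapFor(int expectedSize) {
        // HashMap's default load factor is 0.75.
        int capacity = (int) (expectedSize / 0.75f) + 1;
        return new HashMap<>(capacity);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = newMapFor(10_000);
        map.put("example", 1);
        // Iterating map.keySet() is O(n + c) with c ~ n/0.75, i.e. effectively O(n).
        System.out.println(map.keySet());
    }
}
```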

