
Map/ArrayList: which one is faster to search for an element

Original: 2011-12-09 20:48:46 · 7 3 · java

I have a gigantic data set that I have to store in a collection, and I need to find out whether there are any duplicates in it.

The data size could be more than 1 million. I know I can store more elements in an ArrayList than in a Map.
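For reference, the duplicate check itself can be sketched in a single pass. This is a minimal sketch, assuming the data fits in memory and its elements implement `equals`/`hashCode` consistently. It uses a `HashSet` (which the JDK backs with a `HashMap` instance) instead of an explicit `Map`, and relies on `Set.add` returning `false` when the element is already present. The class and method names are illustrative, and `List.of` needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative helper, not taken from the question
public class DuplicateFinder {
    // Returns each value that occurs more than once, reported once,
    // in the order in which its second occurrence is encountered.
    static <T> List<T> findDuplicates(Iterable<T> data) {
        Set<T> seen = new HashSet<>();      // every value seen so far
        Set<T> reported = new HashSet<>();  // duplicates already added to the result
        List<T> duplicates = new ArrayList<>();
        for (T item : data) {
            // Set.add returns false if the item was already in 'seen'
            if (!seen.add(item) && reported.add(item)) {
                duplicates.add(item);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(5, 3, 9, 3, 7, 5, 3);
        System.out.println(findDuplicates(data)); // [3, 5]
    }
}
```

Each element costs one expected O(1) hash lookup, so the whole scan is O(n) on average; the price is the per-entry memory overhead of the hash set.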

My questions are:

1) Is searching for a key in a Map faster than searching in a sorted ArrayList?
2) Is searching for a key in a HashMap faster than in a TreeMap?
3) Purely in terms of the space required to store n elements, which is more efficient: a TreeMap or a HashMap implementation?

3 Answers

1) Yes. Searching an ArrayList with a linear scan is O(n) on average (a sorted ArrayList can instead be binary-searched in O(log n)). The performance of key lookups in a Map depends on the specific implementation. You could write a Map implementation that is O(n) or worse if you really wanted to, but all the implementations in the standard library are faster than O(n).
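To make the O(n) versus O(log n) distinction concrete, here is a minimal sketch contrasting `ArrayList.contains` (a linear scan using `equals`) with `Collections.binarySearch`, which is only correct once the list is sorted. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative helper, not from the original answer
public class ListSearch {
    // O(n): ArrayList.contains compares elements one by one with equals()
    static boolean linearContains(List<Integer> list, int key) {
        return list.contains(key);
    }

    // O(log n): valid only if sortedList is already in ascending order
    static boolean binaryContains(List<Integer> sortedList, int key) {
        // binarySearch returns the index if found, otherwise
        // (-(insertion point) - 1), which is always negative
        return Collections.binarySearch(sortedList, key) >= 0;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(List.of(42, 7, 19, 3, 88));
        System.out.println(linearContains(list, 19)); // true
        Collections.sort(list);                        // [3, 7, 19, 42, 88]
        System.out.println(binaryContains(list, 19));  // true
        System.out.println(binaryContains(list, 20));  // false
    }
}
```

Even with binary search, a sorted list is O(log n) per lookup, the same order as a TreeMap and slower on average than a HashMap's expected O(1).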

2) Yes. HashMap is O(1) on average for simple key lookups. TreeMap is O(log(n)).

Class HashMap<K,V>

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.

Class TreeMap<K,V>

This implementation provides guaranteed log(n) time cost for the containsKey, get, put and remove operations. Algorithms are adaptations of those in Cormen, Leiserson, and Rivest's Introduction to Algorithms.
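The two classes expose the same `Map` interface, so the choice mostly comes down to the complexity quoted above and to whether sorted keys are needed. A minimal sketch, with an illustrative class and helper name: `HashMap` relies on `hashCode`/`equals`, while `TreeMap` (a red-black tree) needs `Comparable` keys or a `Comparator`, and in exchange keeps its keys in order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative comparison, not from the original answer
public class MapLookup {
    // Puts each key with its position as the value and returns the same map
    static Map<String, Integer> fill(Map<String, Integer> map, String... keys) {
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], i);
        }
        return map;
    }

    public static void main(String[] args) {
        String[] keys = {"pear", "apple", "fig", "banana"};
        // HashMap: hash buckets, O(1) average get/containsKey
        Map<String, Integer> hash = fill(new HashMap<>(), keys);
        // TreeMap: red-black tree ordered by key, O(log n) get/containsKey
        Map<String, Integer> tree = fill(new TreeMap<>(), keys);

        // Both answer the same lookups...
        System.out.println(hash.containsKey("fig") + " " + tree.containsKey("fig")); // true true
        System.out.println(hash.get("kiwi") + " " + tree.get("kiwi"));               // null null

        // ...but only TreeMap iterates its keys in sorted order
        System.out.println(tree.keySet()); // [apple, banana, fig, pear]
    }
}
```

If the data set only needs membership tests (as in the duplicate check from the question), the ordering TreeMap provides is unused, and HashMap's average O(1) is the better fit.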

3) The space requirements will be O(n) in both cases. I'd guess the TreeMap requires slightly more space, but only by a constant factor.

It depends on the type of Map you're using.
A HashMap has constant-time average lookup (O(1)), while a TreeMap's lookup time depends on the depth of the tree (O(log(n))), so a HashMap is faster.
The difference is probably moot. Both data structures carry some constant per-element space overhead by design (both have O(n) space complexity).

I just did some benchmark testing of lookup performance between a HashMap and a sorted ArrayList. The answer is that the HashMap is much faster as the size increases: I am talking about 10x, 20x, 30x faster. I ran a test with 1 million entries using a sorted ArrayList and a HashMap; the ArrayList get and add operations took seconds to complete, whereas the HashMap get and put took only around 50 ms.
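A rough way to reproduce the lookup half of such a comparison is sketched below. It is an assumption-laden sketch, not the answerer's benchmark: it times only lookups (the list is sorted once up front rather than maintained on every add), uses a fixed random seed, and takes single `System.nanoTime` measurements, which include JIT warm-up and are easily skewed. For trustworthy numbers, a harness such as JMH is the usual tool. Class and method names are illustrative, and no particular timings are implied.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Illustrative timing sketch; results vary by JVM and machine
public class LookupTiming {
    // Binary-searches every probe in a sorted list; returns the number of hits
    static int countHitsSortedList(List<Integer> sorted, int[] probes) {
        int hits = 0;
        for (int p : probes) {
            if (Collections.binarySearch(sorted, p) >= 0) hits++;
        }
        return hits;
    }

    // Looks up every probe in a HashMap; returns the number of hits
    static int countHitsHashMap(Map<Integer, Integer> map, int[] probes) {
        int hits = 0;
        for (int p : probes) {
            if (map.containsKey(p)) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        int n = 1_000_000;                                 // size mentioned in the question
        Random rnd = new Random(42);                       // fixed seed for repeatable data
        List<Integer> list = new ArrayList<>(n);
        Map<Integer, Integer> map = new HashMap<>(2 * n);  // presized so it never rehashes
        for (int i = 0; i < n; i++) {
            int v = rnd.nextInt();
            list.add(v);
            map.put(v, i);
        }
        Collections.sort(list);  // one-time O(n log n) cost; binary search needs sorted input

        // Half the probes are known to be present, half are random (almost always misses)
        int[] probes = new int[n];
        for (int i = 0; i < n; i++) {
            probes[i] = (i % 2 == 0) ? list.get(rnd.nextInt(n)) : rnd.nextInt();
        }

        long t0 = System.nanoTime();
        int listHits = countHitsSortedList(list, probes);
        long t1 = System.nanoTime();
        int mapHits = countHitsHashMap(map, probes);
        long t2 = System.nanoTime();

        System.out.printf("sorted ArrayList: %d hits in %d ms%n", listHits, (t1 - t0) / 1_000_000);
        System.out.printf("HashMap:          %d hits in %d ms%n", mapHits, (t2 - t1) / 1_000_000);
    }
}
```

Both searches see the same set of values, so the two hit counts should always match; only the elapsed times are expected to differ.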
Here is what I found or observed: to search a sorted ArrayList efficiently (with binarySearch, for example), you first have to sort it. In practice you rarely have a static list; the list changes through add and remove. With that in mind, you need to change the add and get methods to use a binary search (like binarySearch) to stay efficient. Even with binary search, both operations get slower and slower as the list grows, and add is hit hardest because inserting in the middle shifts every element after the insertion point.

A HashMap, on the other hand, shows little change in time for put and get. The problem with HashMap is memory overhead. If you can live with that, go with HashMap.
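The "binary" add and get described above can be sketched with the insertion point that `Collections.binarySearch` encodes in its negative return value. This is a minimal sketch using a hypothetical `SortedIntList` class that also rejects duplicates; it shows why lookups stay O(log n) while each insert is O(n), because `ArrayList.add(index, element)` shifts every later element one slot to the right.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: distinct Integers kept in a sorted ArrayList
public class SortedIntList {
    private final List<Integer> items = new ArrayList<>();

    // O(log n) to find the position, O(n) to insert (later elements shift right)
    public boolean add(int value) {
        int idx = Collections.binarySearch(items, value);
        if (idx >= 0) {
            return false;                  // already present: a duplicate
        }
        int insertionPoint = -idx - 1;     // decode binarySearch's "not found" result
        items.add(insertionPoint, value);  // keeps the list sorted
        return true;
    }

    // O(log n): binary search over the sorted backing list
    public boolean contains(int value) {
        return Collections.binarySearch(items, value) >= 0;
    }

    // Read-only view of the current contents, in ascending order
    public List<Integer> view() {
        return Collections.unmodifiableList(items);
    }

    public static void main(String[] args) {
        SortedIntList s = new SortedIntList();
        for (int v : new int[] {40, 10, 30, 10, 20}) {
            s.add(v);
        }
        System.out.println(s.view());        // [10, 20, 30, 40]
        System.out.println(s.contains(30));  // true
        System.out.println(s.contains(25));  // false
    }
}
```

Building a list of n elements this way costs O(n^2) element moves in the worst case, whereas n HashMap puts are O(n) on average, which is consistent with the gap the answer describes.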