Map/ArrayList: which one is faster to search for an element
I have a gigantic data set which I have to store in a collection, and I need to find whether there are any duplicates in it. The data size could be more than 1 million. I know I can store more elements in an ArrayList than in a Map.
My questions are:
1) Is searching for a key in a Map faster than searching in a sorted ArrayList?
2) Is searching for a key in a HashMap faster than in a TreeMap?
3) Only in terms of the space required to store n elements, which would be more efficient: a TreeMap or a HashMap implementation?
1) Yes. Searching an ArrayList with contains() is a linear scan, O(n) on average. (A sorted ArrayList can be searched with Collections.binarySearch in O(log n), but only if it is kept sorted.) The performance of key lookups in a Map depends on the specific implementation. You could write an implementation of Map that is O(n) or worse if you really wanted to, but all the implementations in the standard library are faster than O(n).
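As an illustration of the duplicate check from the question, here is a minimal sketch assuming the data is an int array; the class and method names (DuplicateCheck, hasDuplicateList, hasDuplicateMap) are hypothetical. The list version calls ArrayList.contains, a linear scan, once per element, so the whole check is O(n^2). The map version relies on HashMap.put returning the previous value (or null if the key was new).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateCheck {
    // ArrayList.contains scans element by element: O(n) per lookup,
    // so checking all n items this way is O(n^2) overall.
    static boolean hasDuplicateList(int[] data) {
        List<Integer> seen = new ArrayList<>();
        for (int x : data) {
            if (seen.contains(x)) {
                return true;
            }
            seen.add(x);
        }
        return false;
    }

    // HashMap.put is O(1) on average; it returns the previous value,
    // or null if the key was not present yet.
    static boolean hasDuplicateMap(int[] data) {
        Map<Integer, Boolean> seen = new HashMap<>();
        for (int x : data) {
            if (seen.put(x, Boolean.TRUE) != null) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] withDup = {5, 3, 9, 3, 1};
        int[] noDup = {5, 3, 9, 1};
        System.out.println(hasDuplicateList(withDup) + " " + hasDuplicateMap(withDup)); // true true
        System.out.println(hasDuplicateList(noDup) + " " + hasDuplicateMap(noDup));     // false false
    }
}
```

For a million elements the list version performs on the order of 10^12 comparisons in the worst case, which is why the map-based check is the practical choice here.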
2) Yes. HashMap is O(1) on average for simple key lookups. TreeMap is O(log(n)).
Class HashMap<K,V>
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.
Class TreeMap<K,V>
This implementation provides guaranteed log(n) time cost for the containsKey, get, put and remove operations. Algorithms are adaptations of those in Cormen, Leiserson, and Rivest's Introduction to Algorithms.
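A small sketch of the point that HashMap and TreeMap share the Map interface, so the lookup code is identical and only the cost differs; the class MapLookup and its lookup method are hypothetical names. The last line shows the one thing TreeMap buys you for its O(log(n)) cost: keys iterate in sorted order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapLookup {
    // Same code for both implementations: HashMap is O(1) on average,
    // TreeMap is O(log n) because it walks a red-black tree.
    static boolean lookup(Map<String, Integer> map, String key) {
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        Map<String, Integer> hash = new HashMap<>();
        Map<String, Integer> tree = new TreeMap<>();
        String[] keys = {"pear", "apple", "fig"};
        for (int i = 0; i < keys.length; i++) {
            hash.put(keys[i], i);
            tree.put(keys[i], i);
        }
        System.out.println(lookup(hash, "fig") + " " + lookup(tree, "fig"));   // true true
        System.out.println(lookup(hash, "kiwi") + " " + lookup(tree, "kiwi")); // false false
        // TreeMap keeps keys sorted; HashMap makes no ordering guarantee.
        System.out.println(tree.keySet()); // [apple, fig, pear]
    }
}
```

If sorted iteration or range queries are not needed, nothing in the question calls for TreeMap over HashMap.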
3) The space requirements will be O(n) in both cases. I'd guess the TreeMap requires slightly more space, but only by a constant factor.
It depends on the type of Map you're using. HashMap has a constant-time average lookup (O(1)), while a TreeMap's average lookup time is based on the depth of the tree (O(log(n))), so a HashMap is faster.
I just did some benchmark testing of lookup performance between a HashMap and a sorted ArrayList. The answer is that the HashMap is much faster as the size increases: I am talking about 10x, 20x, 30x faster. I ran some tests with 1 million entries using a sorted ArrayList and a HashMap, and the ArrayList get and add operations took seconds to complete, whereas the HashMap get and put took only around 50 ms.
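The answer does not include its benchmark code, so what follows is not that test, only a rough sketch of how such a comparison could be set up, assuming random int keys, 1 million entries and a fixed seed; LookupBenchmark, countHitsList and countHitsMap are hypothetical names. It times lookups only (Collections.binarySearch on a sorted ArrayList versus HashMap.containsKey), not inserts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class LookupBenchmark {
    // Binary search: O(log n) per probe; the list must already be sorted.
    static int countHitsList(List<Integer> sorted, int[] probes) {
        int hits = 0;
        for (int p : probes) {
            if (Collections.binarySearch(sorted, p) >= 0) {
                hits++;
            }
        }
        return hits;
    }

    // Hash lookup: O(1) on average per probe.
    static int countHitsMap(Map<Integer, Integer> map, int[] probes) {
        int hits = 0;
        for (int p : probes) {
            if (map.containsKey(p)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Random rnd = new Random(42); // fixed seed so runs are repeatable

        List<Integer> sorted = new ArrayList<>(n);
        Map<Integer, Integer> map = new HashMap<>(2 * n);
        for (int i = 0; i < n; i++) {
            int v = rnd.nextInt();
            sorted.add(v);
            map.put(v, v);
        }
        Collections.sort(sorted); // binarySearch is only valid on a sorted list

        // Even-indexed probes come from the data (hits); odd ones are
        // random ints, almost all of which will be misses.
        int[] probes = new int[n];
        for (int i = 0; i < n; i++) {
            probes[i] = (i % 2 == 0) ? sorted.get(rnd.nextInt(n)) : rnd.nextInt();
        }

        long t0 = System.nanoTime();
        int listHits = countHitsList(sorted, probes);
        long t1 = System.nanoTime();
        int mapHits = countHitsMap(map, probes);
        long t2 = System.nanoTime();

        // Both structures hold the same values, so the hit counts should match.
        System.out.printf("sorted ArrayList: %d ms (%d hits)%n", (t1 - t0) / 1_000_000, listHits);
        System.out.printf("HashMap:          %d ms (%d hits)%n", (t2 - t1) / 1_000_000, mapHits);
    }
}
```

Single-shot System.nanoTime timings are skewed by JIT warm-up and garbage collection, so treat the numbers as indicative only; a harness such as JMH gives more trustworthy results.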
Here is what I found or observed: a sorted ArrayList has to be sorted first before you can search it efficiently (with binarySearch, for example). In practice the list is rarely static; it changes through add and remove. With that in mind, you need to change the add and get methods to use binary search (like binarySearch) to keep them efficient. Even so, add and get still slow down as the list grows: finding the position is O(log(n)), but inserting into the middle of an ArrayList shifts every later element, so each add is O(n). HashMap, on the other hand, shows little change in put and get time as it grows. The problem with HashMap is memory overhead. If you can live with that, go with HashMap.
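The "binary" add and get described above can be sketched roughly as follows; SortedListSet is a hypothetical wrapper, not a standard class. It uses the documented contract of Collections.binarySearch: when the key is absent it returns (-(insertion point) - 1), so -i - 1 recovers the slot that keeps the list sorted. The comments mark where the O(n) shift happens.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedListSet {
    private final List<Integer> items = new ArrayList<>();

    // Lookup: O(log n) binary search over the sorted backing list.
    boolean contains(int x) {
        return Collections.binarySearch(items, x) >= 0;
    }

    // Insert while keeping the list sorted. Finding the slot is O(log n),
    // but ArrayList.add(index, e) shifts every later element one place
    // right, so the insert itself is O(n).
    // Returns false if x was already present (a duplicate).
    boolean add(int x) {
        int i = Collections.binarySearch(items, x);
        if (i >= 0) {
            return false; // already present
        }
        items.add(-i - 1, x); // binarySearch returned (-(insertion point) - 1)
        return true;
    }

    List<Integer> view() {
        return Collections.unmodifiableList(items);
    }

    public static void main(String[] args) {
        SortedListSet s = new SortedListSet();
        for (int x : new int[]{5, 1, 4, 1, 3}) {
            if (!s.add(x)) {
                System.out.println("duplicate: " + x); // duplicate: 1
            }
        }
        System.out.println(s.view());      // [1, 3, 4, 5]
        System.out.println(s.contains(4)); // true
    }
}
```

With a million inserts in random order, those O(n) shifts add up, which matches the observation that the sorted-list add slows down as the list grows while HashMap.put stays roughly flat.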