
Why does HashMap.containsKey run considerably faster than Arrays.binarySearch?

I have two lists of phone numbers. The 1st list is a subset of the 2nd list. I ran the two algorithms below to determine which phone numbers appear in both lists.

  • Way 1:
    • Sort the 1st list: Arrays.sort(FirstList);
    • Loop over the 2nd list to find matching elements: if Arrays.binarySearch(FirstList, 'each of 2nd list') finds it, then OK
  • Way 2:
    • Convert the 1st list into a HashMap whose key/value pairs are ('each of 1st list', Boolean.TRUE)
    • Loop over the 2nd list to find matching elements: if FirstList.containsKey('each of 2nd list') then OK
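
A minimal sketch of the two approaches, assuming the numbers are stored as String (the element type is not stated here). The class name, method names, and sample numbers are illustrative only. Note that Arrays.binarySearch returns an int, so the "found" check is a comparison against 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PhoneListMatch {
    // Way 1: sort a copy of the first list, then binary-search it
    // once for every element of the second list.
    static List<String> matchWithBinarySearch(String[] firstList, String[] secondList) {
        String[] sorted = firstList.clone();
        Arrays.sort(sorted);
        List<String> matches = new ArrayList<>();
        for (String number : secondList) {
            // binarySearch returns an index >= 0 only when the key is present
            if (Arrays.binarySearch(sorted, number) >= 0) {
                matches.add(number);
            }
        }
        return matches;
    }

    // Way 2: load the first list into a HashMap, then call containsKey
    // once for every element of the second list.
    static List<String> matchWithHashMap(String[] firstList, String[] secondList) {
        Map<String, Boolean> lookup = new HashMap<>();
        for (String number : firstList) {
            lookup.put(number, Boolean.TRUE);
        }
        List<String> matches = new ArrayList<>();
        for (String number : secondList) {
            if (lookup.containsKey(number)) {
                matches.add(number);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample data; the first list is a subset of the second.
        String[] first = {"555-0103", "555-0101"};
        String[] second = {"555-0101", "555-0102", "555-0103", "555-0104"};
        System.out.println(matchWithBinarySearch(first, second)); // [555-0101, 555-0103]
        System.out.println(matchWithHashMap(first, second));      // [555-0101, 555-0103]
    }
}
```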

As a result, Way 2 finished in 5 seconds, considerably faster than Way 1 at 39 seconds. I can't understand why.

I'd appreciate any comments.

Because hashing is O(1), while binary search is O(log N).

HashMap relies on a very efficient technique called 'hashing', which has been in use for many years and is reliable and effective. Essentially, it works by splitting the items in the collection into much smaller groups, which can be accessed extremely quickly. Once the group is located, a less efficient search mechanism can be used to locate the specific item.

Identifying the group for an item occurs via an algorithm called a 'hashing function'. In Java the hashing method is Object.hashCode(), which returns an int from which the group is derived. As long as hashCode is well defined for your class, you should expect HashMap to be very efficient, which is exactly what you've found.
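
As a rough illustration of the grouping idea, here is a toy bucket structure. It is a sketch only, not how java.util.HashMap is actually implemented: the class BucketSketch and its methods are hypothetical, and the bucket index is simply hashCode() reduced modulo the bucket count.

```java
import java.util.ArrayList;
import java.util.List;

class BucketSketch {
    // Map a key to a group number in [0, bucketCount).
    // floorMod keeps the result non-negative even for negative hash codes.
    static int bucketIndex(Object key, int bucketCount) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    // Split the keys into bucketCount small groups.
    static List<List<String>> group(String[] keys, int bucketCount) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(bucketIndex(key, bucketCount)).add(key);
        }
        return buckets;
    }

    // Jump straight to the key's group, then scan only that small group.
    static boolean contains(List<List<String>> buckets, String key) {
        List<String> bucket = buckets.get(bucketIndex(key, buckets.size()));
        return bucket.contains(key); // linear scan, but over very few items
    }

    public static void main(String[] args) {
        // Made-up sample keys
        String[] keys = {"555-0101", "555-0102", "555-0103"};
        List<List<String>> buckets = group(keys, 16);
        System.out.println(contains(buckets, "555-0102")); // true
        System.out.println(contains(buckets, "555-0199")); // false
    }
}
```

With a well-spread hashCode each group holds only a handful of items, so the cost of a lookup stays roughly constant however large the collection grows.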

There's a very good discussion of the various types of Map and which to use at "Difference between HashMap, LinkedHashMap and TreeMap".

My shorthand rule of thumb is to always use HashMap unless you can't define an appropriate hashCode for your keys or the items need to be ordered (either natural or insertion order).

Look at the source code for HashMap: it creates and stores a hash for each added (key, value) pair. The containsKey() method then calculates a hash for the given key and uses a very fast operation to check whether it is already in the map. So most retrieval operations are very fast.

Way 1:

  • Sorting: around O(n log n)

  • Search: around O(log n) per lookup

Way 2:

  • Creating the hash table: O(n) for small density (no collisions)

  • Contains: O(1) per lookup
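
To put a rough number on the O(log n) term: a binary search over n sorted elements needs at most floor(log2 n) + 1 probes, while a hash lookup needs a roughly constant number on average. The sketch below computes that bound. The list size of 1,000,000 is an assumed figure, since the question doesn't state the sizes, and the class and method names are illustrative.

```java
class LookupCost {
    // Worst-case number of elements a binary search inspects in a sorted
    // array of n elements: floor(log2(n)) + 1 for n >= 1, and 0 for n == 0.
    static int maxBinarySearchProbes(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        // 32 minus the leading zero bits is the bit length of n
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        int assumedListSize = 1_000_000; // assumption, not from the question
        System.out.println(maxBinarySearchProbes(assumedListSize)); // 20
    }
}
```

So under that assumption each Way 1 lookup may compare up to about 20 strings, on top of the one-off sort, whereas each Way 2 lookup typically hashes the key once and checks a single small group.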
