What is the best way to use a HashMap in C++?

I know that STL has a HashMap API, but I cannot find any good and thorough documentation with good examples regarding this.

Any good examples will be appreciated.

The standard library includes the ordered and the unordered map (std::map and std::unordered_map) containers. In an ordered map the elements are sorted by key, and insertion and access take O(log n). The standard library usually implements ordered maps internally as red-black trees, but this is just an implementation detail. In an unordered map insertion and access take O(1) on average. It is just another name for a hash table.

An example with (ordered) std::map:

#include <map>
#include <string>
#include <iostream>
#include <cassert>

int main(int argc, char **argv)
{
  std::map<std::string, int> m;
  m["hello"] = 23;
  // check if key is present
  if (m.find("world") != m.end())
    std::cout << "map contains key world!\n";
  // retrieve
  std::cout << m["hello"] << '\n';
  std::map<std::string, int>::iterator i = m.find("hello");
  assert(i != m.end());
  std::cout << "Key: " << i->first << " Value: " << i->second << '\n';
  return 0;
}

Output:

23
Key: hello Value: 23

If you need ordering in your container and are fine with the O(log n) runtime, then just use std::map.

Otherwise, if you really need a hash table (O(1) insert/access), check out std::unordered_map, which has an API similar to std::map (e.g. in the above example you just have to search and replace map with unordered_map, including in the #include line).

The unordered_map container was introduced with the C++11 standard revision. Thus, depending on your compiler, you have to enable C++11 features (e.g. when using GCC 4.8 you have to add -std=c++11 to the CXXFLAGS).

Even before the C++11 release, GCC supported unordered_map in the namespace std::tr1. Thus, for old GCC compilers you can try to use it like this:

#include <tr1/unordered_map>

std::tr1::unordered_map<std::string, int> m;

It is also part of Boost, i.e. you can use the corresponding Boost header for better portability.

A hash_map is an older, unstandardized version of what for standardization purposes is called an unordered_map (originally in TR1, and included in the standard since C++11). As the name implies, it's different from std::map primarily in being unordered: if, for example, you iterate through a map from begin() to end(), you get items in order by key [1], but if you iterate through an unordered_map from begin() to end(), you get items in a more or less arbitrary order.

An unordered_map is normally expected to have constant complexity. That is, an insertion, lookup, etc., typically takes essentially a fixed amount of time, regardless of how many items are in the table. A std::map has complexity that's logarithmic in the number of items being stored, which means the time to insert or retrieve an item grows, but quite slowly, as the map grows larger. For example, a lookup among 1 million items takes about 20 comparison steps (log2 of 1 million is roughly 20), and each doubling of the size adds only about one more step: if that lookup takes 1 microsecond, you can expect roughly 1.05 microseconds for 2 million items, 1.1 microseconds for 4 million items, 1.15 microseconds for 8 million items, etc.

From a practical viewpoint, that's not really the whole story though. By nature, a simple hash table has a fixed size. Adapting it to the variable-size requirements of a general-purpose container is somewhat non-trivial. As a result, operations that (potentially) grow the table (e.g., insertion) are potentially relatively slow (that is, most are fairly fast, but periodically one will be much slower). Lookups, which cannot change the size of the table, are generally much faster. As a result, most hash-based tables tend to be at their best when you do a lot of lookups compared to the number of insertions. For situations where you insert a lot of data, then iterate through the table once to retrieve results (e.g., counting the number of unique words in a file), chances are that a std::map will be just as fast, and quite possibly even faster (but, again, the computational complexity is different, so that can also depend on the number of unique words in the file).


[1] Where the order is defined by the third template parameter when you create the map, std::less<T> by default.

Here's a more complete and flexible example that doesn't omit the includes needed to compile without errors:

#include <iostream>
#include <unordered_map>

class Hashtable {
    std::unordered_map<const void *, const void *> htmap;

public:
    void put(const void *key, const void *value) {
            htmap[key] = value;
    }

    const void *get(const void *key) {
            return htmap[key];
    }

};

int main() {
    Hashtable ht;
    ht.put("Bob", "Dylan");
    int one = 1;
    ht.put("one", &one);
    // Cast back to the pointed-to types without dropping const.
    std::cout << (const char *)ht.get("Bob") << "; " << *(const int *)ht.get("one") << '\n';
}

This is still not particularly useful for keys unless they are predefined as pointers, because the map compares pointer addresses, so a key with merely matching contents won't be found! (However, since I normally use strings for keys, substituting "string" for "const void *" in the declaration of the key should resolve this problem.)

Evidence that std::unordered_map uses a hash map in GCC libstdc++ 6.4

This was mentioned at https://stackoverflow.com/a/3578247/895245, but in my answer to "What data structure is inside std::map in C++?" I have given further evidence of this for the GCC libstdc++ 6.4 implementation by:

  • GDB step debugging into the class
  • performance characteristic analysis

Here is a preview of the performance characteristic graph described in that answer:

[image: performance characteristic graph]

How to use a custom class and hash function with unordered_map

This answer nails it: "C++ unordered_map using a custom class type as the key"

Excerpt: equality:

struct Key
{
  std::string first;
  std::string second;
  int         third;

  bool operator==(const Key &other) const
  { return (first == other.first
            && second == other.second
            && third == other.third);
  }
};

Hash function:

namespace std {

  template <>
  struct hash<Key>
  {
    std::size_t operator()(const Key& k) const
    {
      using std::size_t;
      using std::hash;
      using std::string;

      // Compute individual hash values for first,
      // second and third and combine them using XOR
      // and bit shifting:

      return ((hash<string>()(k.first)
               ^ (hash<string>()(k.second) << 1)) >> 1)
               ^ (hash<int>()(k.third) << 1);
    }
  };

}

For those of us trying to figure out how to hash our own classes whilst still using the standard template, there is a simple solution:

  1. In your class, define an overload of the equality operator ==. If you don't know how to do this, GeeksforGeeks has a great tutorial: https://www.geeksforgeeks.org/operator-overloading-c/

  2. Under the standard namespace, declare a specialization of the template struct hash with your class name as the type (see below). I found a great blog post that also shows an example of calculating hashes using XOR and bit shifting; that is outside the scope of this question, but the post also includes detailed instructions on using hash functions: https://prateekvjoshi.com/2014/06/05/using-hash-function-in-c-for-user-defined-classes/

namespace std {

  template<>
  struct hash<my_type> {
    size_t operator()(const my_type& k) const {
      // Do your hash function here
      ...
    }
  };

}

  3. To implement a hash table using your new hash function, create a std::unordered_map just as you normally would and use my_type as the key; the standard library will automatically use the hash function you defined in step 2 to hash your keys. (std::map does not use the hash function; it orders its keys with operator< instead.)

#include <unordered_map>

int main() {
  std::unordered_map<my_type, other_type> my_map;
}
