Same key, multiple entries for std::unordered_map?

I am inserting multiple values into a map under the same key, where the key is a C string.

I would expect to have a single entry with the specified key.

However, the map seems to take the key's address into consideration when uniquely identifying a key.

#include <cassert>
#include <iostream>
#include <string>
#include <unordered_map>

typedef char const* const MyKey;

/// @brief Hash function for StatementMap keys
///
/// Delegates to std::hash<std::string>.
struct MyMapHash {
public:
    size_t operator()(MyKey& key) const {
        return std::hash<std::string>{}(std::string(key));
    }
};

typedef std::unordered_map<MyKey, int, MyMapHash> MyMap;

int main()
{
    // Build std::strings to prevent optimizations on the addresses of
    // underlying C strings.
    std::string key1_s = "same";
    std::string key2_s = "same";
    MyKey key1 = key1_s.c_str();
    MyKey key2 = key2_s.c_str();

    // Make sure addresses are different.
    assert(key1 != key2);

    // Make sure hashes are identical.
    assert(MyMapHash{}(key1) == MyMapHash{}(key2));

    // Insert two values with the same key.
    MyMap map;
    map.insert({key1, 1});
    map.insert({key2, 2});

    // Make sure we find them in the map.
    auto it1 = map.find(key1);
    auto it2 = map.find(key2);
    assert(it1 != map.end());
    assert(it2 != map.end());

    // Get values.
    int value1 = it1->second;
    int value2 = it2->second;

    // The first one of any of these asserts fails. Why is there not only one
    // entry in the map?
    assert(value1 == value2);
    assert(map.size() == 1u);
}

Printing the map in the debugger shows that it contains two elements right after the insertions.

(gdb) p map
$4 = std::unordered_map with 2 elements = {
  [0x7fffffffda20 "same"] = 2,
  [0x7fffffffda00 "same"] = 1
}

Why does this happen if the hash function, which delegates to std::hash<std::string>, takes only the string's value into account (this is asserted in the code)?

Moreover, if this is the intended behaviour, how can I use a map with C strings as keys but with a 1:1 key-value mapping?

The reason is that hash maps (like std::unordered_map) do not rely only on the hash function to determine whether two keys are equal. The hash function is the first comparison layer; after that, the keys themselves are always compared for equality as well. This is necessary because even with a good hash function you can have collisions, where two different keys yield the same hash value, and you still need to be able to store both entries in the hash map. There are various strategies for handling that; searching for "collision resolution" for hash maps will turn up more information.
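To make the "first comparison layer" point concrete, here is a minimal sketch that forces every key to collide. ConstantHash and count_entries_with_colliding_hashes are hypothetical names introduced only for this illustration; the sketch assumes std::string keys so that the default equality compares contents.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Deliberately bad hash (hypothetical, for illustration only):
// every key produces the same hash value, so all keys collide.
struct ConstantHash {
    std::size_t operator()(const std::string&) const { return 0; }
};

// Inserts two different keys that hash identically and returns the map size.
std::size_t count_entries_with_colliding_hashes() {
    std::unordered_map<std::string, int, ConstantHash> map;
    map.emplace("apple", 1);
    map.emplace("banana", 2);
    // The hashes match, but the equality comparison says the keys differ,
    // so the map stores both entries.
    return map.size();
}
```

Calling count_entries_with_colliding_hashes() yields 2: an identical hash alone never makes two keys "the same" for the map.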

In your example both entries have the same hash value but different keys. The keys are compared by the default comparison function, which compares the char* pointers, and those are different. Therefore the equality comparison fails and you get two entries in the map. To solve your issue you also need to define a custom equality function for your hash map, which can be done by specifying the fourth template parameter, KeyEqual, of std::unordered_map.

This fails because unordered_map does not, and cannot, rely solely on the key's hash function to differentiate keys; it must also compare keys that have the same hash for equality. And comparing two char pointers compares the addresses they hold.

If you want to change the comparison, pass a KeyEqual parameter to the map in addition to the hash.

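#include <cstring>  // std::strcmp

// Compares the characters the keys point to, not the pointers themselves.
struct MyKeyEqual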
{
    bool operator()(MyKey const &lhs, MyKey const &rhs) const
    {
        return std::strcmp(lhs, rhs) == 0;
    }
};
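To show how the pieces fit together, here is a minimal sketch that passes both the hash and MyKeyEqual to the map type. The definitions of MyKey, MyMapHash and MyKeyEqual are repeated from the question and the snippet above so the block stands alone; the helper size_and_value_for_same_text is a hypothetical name introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

typedef char const* const MyKey;

// Hashes the characters of the C string (same idea as in the question).
struct MyMapHash {
    std::size_t operator()(MyKey& key) const {
        return std::hash<std::string>{}(std::string(key));
    }
};

// Compares the characters of the C strings instead of their addresses.
struct MyKeyEqual {
    bool operator()(MyKey const& lhs, MyKey const& rhs) const {
        return std::strcmp(lhs, rhs) == 0;
    }
};

// The fourth template parameter is KeyEqual.
typedef std::unordered_map<MyKey, int, MyMapHash, MyKeyEqual> MyMap;

// Inserts the text "same" through two distinct buffers, then returns the
// map size and the value found when looking up via the second buffer.
std::pair<std::size_t, int> size_and_value_for_same_text() {
    std::string key1_s = "same";
    std::string key2_s = "same";
    MyMap map;
    map.emplace(key1_s.c_str(), 1);
    map.emplace(key2_s.c_str(), 2);  // no effect: an equal key already exists
    return {map.size(), map.find(key2_s.c_str())->second};
}
```

With both functors in place the map holds a single entry, and a lookup through either pointer finds the value stored by the first insertion. Note that the map stores only the pointers, so the buffers they point to must outlive the map.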

unordered_map needs to be able to perform two operations on the key: checking equality and obtaining a hash code. Naturally, two unequal keys are allowed to have the same hash code. When this happens, the unordered map applies its hash collision resolution strategy to treat these unequal keys as distinct.

That is precisely what happens when you supply a character pointer as the key and provide a hash implementation for it: the default equality comparison for pointers kicks in, so two different pointers produce two different keys, even though the content of the corresponding C strings is the same.

You can fix it by providing a custom implementation of the KeyEqual template parameter that performs an actual comparison of the C strings, for example by calling strcmp:
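The difference between the two kinds of comparison can be seen in isolation. This is a small sketch, assuming the map's default KeyEqual for a pointer key is std::equal_to on that pointer type; equal_by_address and equal_by_content are hypothetical helper names.

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <string>

// What the map does by default for a `const char*` key:
// std::equal_to compares the pointer values, i.e. the addresses.
bool equal_by_address(const char* lhs, const char* rhs) {
    return std::equal_to<const char*>{}(lhs, rhs);
}

// What the question actually wants: compare the characters.
bool equal_by_content(const char* lhs, const char* rhs) {
    return std::strcmp(lhs, rhs) == 0;
}
```

For two separate std::string objects that both hold "same", passing their c_str() pointers to equal_by_address gives false, while equal_by_content gives true.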

return !strcmp(lhsKey, rhsKey);

You didn't define a map of keys but a map of pointers to keys.

typedef char const* const MyKey;

The compiler may merge two identical "same" literals and keep only one instance in the constant data segment, but that may or may not happen: whether identical string literals share storage is unspecified.

Your map should contain the key itself. Make the key a std::string or similar.
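As a rough sketch of that suggestion, the map below owns its keys as std::string, so the default hash and equality already work on the characters. The function name insert_same_text_as_std_string is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Inserts the text "same" from two distinct buffers into a map keyed by
// std::string and returns the resulting number of entries.
std::size_t insert_same_text_as_std_string() {
    std::string key1_s = "same";
    std::string key2_s = "same";
    std::unordered_map<std::string, int> map;
    // Each emplace builds a std::string key from the C string, so the map
    // stores its own copy of the characters.
    map.emplace(key1_s.c_str(), 1);
    map.emplace(key2_s.c_str(), 2);  // no effect: key "same" already present
    return map.size();
}
```

The function returns 1. The trade-off is that every key is copied into the map, whereas the pointer-keyed variant avoids the copy but needs custom hash and equality functors and careful lifetime management.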
