C++ unordered_map with char* as key

I feel exhausted trying to use the container unordered_map with char* as the key (on Windows, I am using VS 2010). I know that I have to define my own compare function for char*, which inherits from binary_function. The following is a sample program.

#include <unordered_map>
#include <iostream>
#include <string>
#include <cstring>  // strcmp
#include <cstdio>   // printf
using namespace std;

template <class _Tp>  
struct my_equal_to : public binary_function<_Tp, _Tp, bool>  
{  
    bool operator()(const _Tp& __x, const _Tp& __y) const  
    { return strcmp( __x, __y ) == 0; }  
};

typedef unordered_map<char*, unsigned int, ::std::tr1::hash<char*>,  my_equal_to<char*> > my_unordered_map;
//typedef unordered_map<string, unsigned int > my_unordered_map;

my_unordered_map location_map;

int main(){
    char a[10] = "ab";
    location_map.insert(my_unordered_map::value_type(a, 10));
    char b[10] = "abc";
    location_map.insert(my_unordered_map::value_type(b, 20));

    char c[10] = "abc";
    location_map.insert(my_unordered_map::value_type(c, 20));

    printf("map size: %u\n", (unsigned int)location_map.size());  // size() returns size_t, so cast to match %u
    my_unordered_map::iterator it;
    if ((it = location_map.find("abc")) != location_map.end())
    {
        printf("found!\n");
    }

    return 0;
} 

I insert the same C string abc twice and look it up. The second insertion should fail, leaving only one abc in the unordered_map. However, the reported size is 3. It seems that the compare function does not work properly here.

Moreover, I get another strange result from the find function: when I run the program many times, the result even changes! Sometimes the string abc is found, and other times it is not.

Could anyone help me with this? Your help is very much appreciated!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++

Edit: After defining my own hash function for char*, the program works properly. The full program code is listed below. Thank you all.

#include <unordered_map>
#include <iostream>
#include <cstring>  // strcmp
#include <cstdio>   // printf
using namespace std;

template <class _Tp>  
struct my_equal_to : public binary_function<_Tp, _Tp, bool>  
{  
    bool operator()(const _Tp& __x, const _Tp& __y) const  
    { return strcmp( __x, __y ) == 0; }  
};


struct Hash_Func{
    //BKDR hash algorithm
    size_t operator()(const char * str)const
    {
        unsigned int seed = 131;//31  131 1313 13131 131313 etc
        unsigned int hash = 0;//unsigned arithmetic wraps around; signed overflow would be undefined
        while(*str)
        {
            hash = (hash * seed) + (*str);
            str ++;
        }

        return hash & (0x7FFFFFFF);
    }
};

typedef unordered_map<char*, unsigned int, Hash_Func,  my_equal_to<char*> > my_unordered_map;


int main(){
    my_unordered_map location_map;

    char a[10] = "ab";
    location_map.insert(my_unordered_map::value_type(a, 10));
    char b[10] = "abc";
    location_map.insert(my_unordered_map::value_type(b, 20));

    char c[10] = "abc";
    location_map.insert(my_unordered_map::value_type(c, 20));

    printf("map size: %u\n", (unsigned int)location_map.size());  // size() returns size_t, so cast to match %u
    my_unordered_map::iterator it;
    if ((it = location_map.find("abc")) != location_map.end())
    {
        printf("found!\n");
    }

    return 0;
}

Note: Using char* as the key type for an unordered_map or other STL containers may be dangerous. A safe way (it seems to be the only way) is: in the main function, new or malloc a block (e.g. an array of C strings) on the heap and fill it with C strings. Insert these C strings into the unordered_map. The allocated block of memory is freed at the end of the main function (by delete or free).

Your comparator is fine (although passing a nullptr to it is undefined behavior and should probably be handled).

The hash, ::std::tr1::hash<char*>, hashes the pointer value rather than the characters it points to, so each "abc" (usually) goes in a different bucket.

You need to write your own hash function that guarantees that hash("abc") always gives the same answer.

For now, use a hash that always returns 0. Performance will be terrible, but you should see the second "abc" match the first.

As per the comments, using std::string simplifies memory management and provides a library-supported hash and comparator, so just std::unordered_map<std::string, X> will work. This also means that when the unordered map is destroyed, all strings are deallocated for you. You can even instantiate the std::strings from char arrays on the stack safely.

If you still want to use char*, you will still need your own comparator and hash, but you can use std::shared_ptr to manage the memory for you (do not use stack instances; do a new char[] and give the shared_ptr an array deleter). You will then have a std::unordered_map<std::shared_ptr<char>, X> and no complications later from memory leaks.

If you still want to use char* you are on the right track, but it is important that you use a memory leak tool like Purify or valgrind to make sure that you truly have all the memory management under control. (This is generally a good idea for any project.)

Finally, global variables should be avoided.

Using a char pointer as a key the way you do above is almost certainly not what you want to do.

STL containers deal with stored values. In the case of std::unordered_map<char *, unsigned int, ...>, the stored values are pointers to C strings, and those strings may no longer exist on subsequent insertion/removal checks.

Note that your map (location_map, of type my_unordered_map) is a global variable, but you are trying to insert the local char arrays a, b, and c. What do you expect your comparison function my_equal_to() to strcmp() when the inserted C strings fall out of scope? (You suddenly have keys pointing to random garbage that can be compared to newly inserted future values.)

It is important that STL map keys be copyable values whose meaning cannot be changed by external program behavior. You should almost certainly use std::string or similar for your key values, even if their construction seems wasteful to you at first glance.

The following will work exactly as you intended the code above to work, and is vastly safer:

#include <unordered_map>
#include <iostream>
#include <string>

using namespace std;

// STL containers use copy semantics, so don't use pointers for keys!!
typedef unordered_map<std::string, unsigned int> my_unordered_map;

my_unordered_map location_map;

int main() {
    char a[10] = "ab";
    location_map.insert(my_unordered_map::value_type(a, 10));

    char b[10] = "abc";
    location_map.insert(my_unordered_map::value_type(b, 20));

    char c[10] = "abc";
    location_map.insert(my_unordered_map::value_type(c, 20));

    cout << "map size: " << location_map.size() << endl;

    my_unordered_map::iterator it;
    if ((it = location_map.find("abc")) != location_map.end()) {
        cout << "found \"" << it->first << "\": " << it->second << endl;
    }

    return 0;
}

(Answer for modern C++, for people still stumbling upon this question)

These days, if you use C++17 or above, you can use std::string_view as a key in an unordered_map.

std::string_view only keeps a reference to the raw char* data instead of copying it, allowing you to avoid a copy when you're sure the raw char* data outlives the unordered_map.

However, unlike char*, std::string_view provides various methods and operators and has a std::hash specialization, making it useful in many more places.

std::unordered_map<std::string_view, unsigned int> my_map;
my_map["some literal"] = 123;
printf("%u\n", my_map["some literal"]);  // the mapped value is unsigned int, so use %u

In the above code, I only put string literals in the map, which is safe. Be careful when putting other things in a map with string_view keys: it's your responsibility to ensure they don't get destroyed before the map!

When you write a string literal such as "abc", it is converted to a const char*. Each occurrence of "abc" in your program may be stored at a different address. So:

const char* x = "abc";
const char* y = "abc";
return x==y;

May return false, because the compiler is not required to give identical literals the same address. Whether two occurrences of "abc" share storage is unspecified, so the pointer comparison cannot be relied on.


 