
Hash a std::array that is used as a key in std::unordered_map

I have a very strange problem with a self-defined hash function in std::unordered_map.

My key type is bigger than int64, so I use std::array to represent it. To get its hash value, I created a MyHash class:

class MyHash
{
public:
    std::size_t operator()(const std::array<char, 12>& oid) const
    {
        Convert t;
        std::memcpy(t.arr, oid.data(), 12);
        std::cout << t.a <<" "<<t.b << std::endl;
        return (std::hash<std::int32_t>()(t.a) ^ (std::hash<std::int64_t>()(t.b) << 1)) >> 1;
    }
    union Convert {
        struct {
            std::int32_t a;
            std::int64_t b;
        };
        char arr[12];
    };
};

First, test it:

std::array<char, 12> arr = {1,2,3,4,5,6,7,8,9,10,11,12};
MyHash o;
o(arr);
o(arr);

This works: both calls print the same t.a and t.b. Now use it with std::unordered_map:

std::unordered_map<std::array<char, 12>, int, MyHash> map;
std::array<char, 12> arr = {1,2,3,4,5,6,7,8,9,10,11,12};
map.insert(std::make_pair(arr, 1));
auto it = map.find(arr);
if(it == map.end())
    std::cout << "error";
else
    std::cout << it->second;

Now it prints error. The reason is that t.b during insert differs from t.b during find. This only happens in Visual Studio release mode (or with g++ -O2).

To avoid undefined behavior as well as packing and alignment issues, you can copy the bytes into individual integers:

#include <cstdint>
#include <cstring>
#include <array>

std::size_t array_hash(const std::array<char, 12>& array) {
    std::uint64_t u64;
    std::memcpy(&u64, array.data(), 8);
    std::uint32_t u32;
    std::memcpy(&u32, array.data() + 8, 4);
    // return (std::hash<std::uint32_t>()(u32) ^ (std::hash<std::uint64_t>()(u64) << 1)) >> 1;;
    return u64 + u32; // for simplicity
}

std::size_t uint_hash(std::uint64_t u64, std::uint32_t u32) {
    // return (std::hash<std::uint32_t>()(u32) ^ (std::hash<std::uint64_t>()(u64) << 1)) >> 1;;
    return u64 + u32; // for simplicity
}

With g++ 4.8.4 and g++ -S --std=c++11 -O3 you get:

_Z10array_hashRKSt5arrayIcLm24EE:
.LFB914:
        .cfi_startproc
        movl    8(%rdi), %eax
        addq    (%rdi), %rax
        ret
        .cfi_endproc

and

_Z9uint_hashmj:
.LFB915:
        .cfi_startproc
        movl    %esi, %eax
        addq    %rdi, %rax
        ret
        .cfi_endproc

... which is fairly optimal.
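As a minimal sketch of how the memcpy approach above could be plugged into the map from the question: the names Oid, OidHash and insert_and_find are hypothetical and not part of the original answer, and the combining step is simply the one the question already used.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <unordered_map>
#include <utility>

using Oid = std::array<char, 12>;  // hypothetical alias for the 12-byte key

// Hypothetical functor: copies the key into two separately declared
// integers, so no padding bytes are ever read.
struct OidHash {
    std::size_t operator()(const Oid& oid) const {
        std::uint64_t u64;
        std::memcpy(&u64, oid.data(), 8);      // bytes 0..7
        std::uint32_t u32;
        std::memcpy(&u32, oid.data() + 8, 4);  // bytes 8..11
        // Same combining step as in the question.
        return (std::hash<std::uint32_t>()(u32) ^
                (std::hash<std::uint64_t>()(u64) << 1)) >> 1;
    }
};

// Inserts one key and looks it up again; returns the stored value,
// or -1 if the lookup fails.
int insert_and_find() {
    std::unordered_map<Oid, int, OidHash> map;
    Oid key = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    map.insert(std::make_pair(key, 1));
    assert(map.size() == 1);
    auto it = map.find(key);
    return it == map.end() ? -1 : it->second;
}
```

Because every byte that feeds the hash is initialized by memcpy, insert and find compute the same value for the same key regardless of optimization level.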

See also: Type Punning, Strict Aliasing, and Optimization

Let's look at this:

union Convert {
    struct {
        std::int32_t a;
        std::int64_t b;
    };
    char arr[12];
};

The compiler may well insert padding bytes between a and b. On typical 64-bit platforms std::int64_t is 8-byte aligned, which puts b at offset 8 and makes the struct 16 bytes long, so a 12-byte copy fills only part of b. The char array therefore will not necessarily overlay the whole struct. Type punning through a union is also borderline undefined behaviour in C++, although I think you're OK in this particular instance.
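The layout can be inspected directly. The following is a small sketch, assuming a named stand-in struct (Parts is a hypothetical name; the anonymous struct in the question is a compiler extension and cannot be passed to offsetof). The actual numbers depend on the platform ABI, so none are stated here.

```cpp
#include <cstddef>
#include <cstdint>

// Named stand-in for the anonymous struct inside Convert.
struct Parts {
    std::int32_t a;
    std::int64_t b;
};

// Number of padding bytes between the end of `a` and the start of `b`.
constexpr std::size_t padding_after_a() {
    return offsetof(Parts, b) - sizeof(std::int32_t);
}

// True only when a 12-byte memcpy into the struct fills both members
// completely, i.e. when there is no padding after `a`.
constexpr bool twelve_bytes_cover_both() {
    return offsetof(Parts, b) + sizeof(std::int64_t) <= 12;
}
```

If twelve_bytes_cover_both() is false on your platform, the 12-byte memcpy in the question leaves some bytes of b uninitialized, which would explain why the hash differs between calls.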

It appears that the packing arrangements for the release build differ from those of the debug build.

Many compilers allow you to specify the packing arrangements (#pragma pack?), but I wouldn't rely on that if I were you, since it defeats the compiler's optimisation strategies and is also essentially non-standard C++.

This is a bit of a hack, but you could try it and see how it works:

struct MyHash {
    std::size_t operator()(const std::array<char, 12>& oid) const {
        auto d = reinterpret_cast<const std::uint32_t*>(oid.data());
        std::size_t prime = 31;
        std::size_t other_prime = 59;
        return d[2] + other_prime*(d[1] + prime*d[0]);
    }
};

Mind you, this only works because 12 is a multiple of sizeof(uint32_t). If the size changes, you'll have to adjust.
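A possible variation on the hack above, shown only as a sketch: it keeps the same two multipliers but copies the bytes with std::memcpy instead of using reinterpret_cast, so it does not depend on the alignment of the array's storage. WordHash and check_equal_keys_hash_equal are hypothetical names, and this is a swapped-in technique, not the original answer's code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct WordHash {  // hypothetical name
    std::size_t operator()(const std::array<char, 12>& oid) const {
        std::uint32_t d[3];
        // The key must split into exactly three 32-bit words.
        static_assert(sizeof(d) == 12, "key size must match three uint32_t");
        std::memcpy(d, oid.data(), sizeof(d));  // copy all 12 bytes
        const std::size_t prime = 31;
        const std::size_t other_prime = 59;
        // Same polynomial combination as the reinterpret_cast version.
        return d[2] + other_prime * (d[1] + prime * d[0]);
    }
};

// Hashes two equal keys and checks that the results agree.
void check_equal_keys_hash_equal() {
    std::array<char, 12> x = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    std::array<char, 12> y = x;
    assert(WordHash()(x) == WordHash()(y));
}
```

The static_assert makes the size dependency explicit: if the key length changes, the code stops compiling instead of silently reading the wrong number of bytes.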


 