
Hashing a std::array used as a key in std::unordered_map

I have a very strange problem using a self-defined hash function with std::unordered_map.

My key type is bigger than int64, so I use std::array to represent it. To get its hash value, I created a MyHash class:

class MyHash
{
public:
    std::size_t operator()(const std::array<char, 12>& oid) const
    {
        Convert t;
        std::memcpy(t.arr, oid.data(), 12);
        std::cout << t.a <<" "<<t.b << std::endl;
        return (std::hash<std::int32_t>()(t.a) ^ (std::hash<std::int64_t>()(t.b) << 1)) >> 1;
    }
    union Convert {
        struct {
            std::int32_t a;
            std::int64_t b;
        };
        char arr[12];
    };
};

First, test it:

std::array<char, 12> arr = {1,2,3,4,5,6,7,8,9,10,11,12};
MyHash o;
o(arr);
o(arr);

That works: both calls print the same t.a and t.b. Now use it with std::unordered_map:

std::unordered_map<std::array<char, 12>, int, MyHash> map;
std::array<char, 12> arr = {1,2,3,4,5,6,7,8,9,10,11,12};
map.insert(std::make_pair(arr, 1));
auto it = map.find(arr);
if(it == map.end())
    std::cout << "error";
else
    std::cout << it->second;

Now it prints error. The reason is that the t.b computed in insert differs from the one computed in find. This only happens in VS release mode (or with g++ -O2).

To avoid undefined behavior as well as packing and alignment issues, you can copy the bytes into individual integers:

#include <cstdint>
#include <cstring>
#include <array>

std::size_t array_hash(const std::array<char, 12>& array) {
    std::uint64_t u64;
    std::memcpy(&u64, array.data(), 8);
    std::uint32_t u32;
    std::memcpy(&u32, array.data() + 8, 4);
    // return (std::hash<std::uint32_t>()(u32) ^ (std::hash<std::uint64_t>()(u64) << 1)) >> 1;
    return u64 + u32; // for simplicity
}

std::size_t uint_hash(std::uint64_t u64, std::uint32_t u32) {
    // return (std::hash<std::uint32_t>()(u32) ^ (std::hash<std::uint64_t>()(u64) << 1)) >> 1;
    return u64 + u32; // for simplicity
}

Compiling with g++ -S --std=c++11 -O3 (g++ version 4.8.4) gives:

_Z10array_hashRKSt5arrayIcLm24EE:
.LFB914:
        .cfi_startproc
        movl    8(%rdi), %eax
        addq    (%rdi), %rax
        ret
        .cfi_endproc

and

_Z9uint_hashmj:
.LFB915:
        .cfi_startproc
        movl    %esi, %eax
        addq    %rdi, %rax
        ret
        .cfi_endproc

... which is fairly optimal.
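For completeness, here is a minimal sketch of how the memcpy approach could be plugged into std::unordered_map. The names ArrayHash and lookup_roundtrip are made up for illustration; the hash combination is the one from the question.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <unordered_map>
#include <utility>

// Hash functor for a 12-byte key (ArrayHash is an illustrative name).
struct ArrayHash {
    std::size_t operator()(const std::array<char, 12>& key) const {
        // Copy the bytes into separately declared integers, so no struct
        // padding and no uninitialized bytes are involved.
        std::uint64_t u64;
        std::memcpy(&u64, key.data(), 8);
        std::uint32_t u32;
        std::memcpy(&u32, key.data() + 8, 4);
        // Same combination as in the question.
        return (std::hash<std::uint32_t>()(u32) ^
                (std::hash<std::uint64_t>()(u64) << 1)) >> 1;
    }
};

// Inserts a key and looks it up again; returns true if the lookup succeeds.
bool lookup_roundtrip() {
    std::unordered_map<std::array<char, 12>, int, ArrayHash> map;
    std::array<char, 12> key = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    map.insert(std::make_pair(key, 1));
    auto it = map.find(key);
    return it != map.end() && it->second == 1;
}
```

Since every byte that feeds the hash comes from the key itself, insert and find should compute the same value regardless of the optimisation level.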

See also: Type Punning, Strict Aliasing, and Optimization

Let's look at this:

  union Convert {
        struct {
            std::int32_t a;
            std::int64_t b;
        };
        char arr[12];
    };

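The compiler may well insert padding bytes between a and b (typically four, so that b is 8-byte aligned, which makes the struct 16 bytes rather than 12). So type punning through the 12-byte char array will not necessarily overlay the whole struct: part of b can be left uninitialized. Type punning through a union is also formally undefined behaviour in C++, although most compilers tolerate it, so I think that part is OK in this particular instance.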
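One way to check the layout on a given compiler is to print the size and member offset directly. This is only a sketch: Pair is a named stand-in for the anonymous struct inside Convert, and padding_before_b and print_layout are illustrative helpers.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>

// Named stand-in for the anonymous struct inside Convert.
struct Pair {
    std::int32_t a;
    std::int64_t b;
};

// Number of padding bytes the compiler placed between a and b.
constexpr std::size_t padding_before_b() {
    return offsetof(Pair, b) - sizeof(std::int32_t);
}

// Prints the layout the current compiler chose for Pair.
void print_layout() {
    std::cout << "sizeof(Pair)      = " << sizeof(Pair) << '\n'
              << "offsetof(Pair, b) = " << offsetof(Pair, b) << '\n'
              << "padding before b  = " << padding_before_b() << '\n';
}
```

If the reported size is larger than 12, copying 12 bytes into arr cannot fill all of b, and whatever happens to be in the remaining bytes ends up in the hash.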

It appears that the packing arrangements for the release build differ from the debug build.

Many compilers allow you to specify the packing arrangements (#pragma pack?), but I wouldn't rely on that if I were you, since it defeats the compiler's optimisation strategies and is also essentially non-standard C++.

This is a bit of a hack but you could try it and see how it works:

struct MyHash {
    std::size_t operator()(const std::array<char, 12>& oid) const {
        auto d = reinterpret_cast<const std::uint32_t*>(oid.data());
        std::size_t prime = 31;
        std::size_t other_prime = 59;
        return d[2] + other_prime*(d[1] + prime*d[0]);
    }
};

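This only works because 12 is a multiple of sizeof(uint32_t), mind you. If the size changes you'll have to adjust. The reinterpret_cast also assumes that oid.data() is suitably aligned for std::uint32_t, and reading char storage through a std::uint32_t pointer is, strictly speaking, an aliasing violation.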
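If the key size is likely to change, a byte-wise hash avoids both the size restriction and the pointer cast. The sketch below swaps in FNV-1a (64-bit) in place of the polynomial above; ByteArrayHash is an illustrative name, not part of any library.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Byte-wise FNV-1a hash for a char array of any length N.
template <std::size_t N>
struct ByteArrayHash {
    std::size_t operator()(const std::array<char, N>& key) const {
        std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
        for (char c : key) {
            h ^= static_cast<unsigned char>(c);   // mix in one byte
            h *= 0x100000001b3ULL;                // FNV-1a 64-bit prime
        }
        // Truncates on platforms where std::size_t is narrower than 64 bits.
        return static_cast<std::size_t>(h);
    }
};
```

It could then be used as std::unordered_map<std::array<char, 12>, int, ByteArrayHash<12>>. It reads the key one byte at a time, so it may be slower than the word-based versions for large keys.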
