
Hash map optimised for lookup

I am looking for a map whose keys are fixed at initialization and that offers faster lookup. It does not need to support adding or updating elements later. Is there an algorithm that examines the list of keys and constructs a function so that later lookups are faster? In my case, the keys are strings.

Update:

The keys are not known at compile time, only during the application's initialization. There won't be any further insertions later, but there will be lots of lookups, so I want lookups to be optimized.

CMPH may be what you're looking for. Basically this is gperf without requiring the key set at compile time.

Though of course std::unordered_map, as of C++11, might do just as well, though possibly with a few collisions.
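As a rough illustration of the std::unordered_map route, here is a minimal sketch that builds the table once at start-up and only reads it afterwards. The helper names `build_lookup` and `find_value` and the `int` value type are assumptions made for the example; the load factor of 0.5 is an arbitrary choice that trades memory for fewer keys per bucket, not a tuned recommendation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: build the table once during initialization.
std::unordered_map<std::string, int>
build_lookup(const std::vector<std::pair<std::string, int>>& entries) {
    std::unordered_map<std::string, int> table;
    table.max_load_factor(0.5f);    // fewer keys per bucket, more memory
    table.reserve(entries.size());  // size the bucket array up front, no rehash later
    for (const auto& e : entries) {
        table.emplace(e.first, e.second);  // a duplicate key keeps its first value
    }
    assert(table.size() <= entries.size());
    return table;
}

// Hypothetical helper: returns a pointer to the value, or nullptr if absent.
const int* find_value(const std::unordered_map<std::string, int>& table,
                      const std::string& key) {
    auto it = table.find(key);
    return it == table.end() ? nullptr : &it->second;
}
```

Calling `build_lookup` once and passing the result around by const reference keeps every later access read-only, which matches the "no insertions after initialization" requirement.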

Since you are looking up strings, a trie (any of the different trie flavours, crit-bit or whatever funky names they have) may also be worth looking into, especially if you have many keys. There are a lot of free trie implementations available.
The advantage of tries is that they can index-compress strings (keys share their common prefixes), so they use less memory, which makes it more likely that the data is in cache. The access pattern is also less random, which is cache-friendly too. A hash table must store the value plus the hash, and it indexes more or less randomly (not truly randomly, but unpredictably) into memory. A trie or trie-like structure ideally needs only one extra bit per node to distinguish a key from its common prefix.
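To make the prefix-sharing idea concrete, here is a minimal sketch of a plain (uncompressed) trie. It is not a crit-bit tree and does not reach the one-bit-per-node ideal described above; it only shows how keys with a common prefix share nodes. The class name `Trie`, the `int` value type, and the flat node vector are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class Trie {
public:
    Trie() : nodes_(1) {}  // node 0 is the root

    // Intended to be called only during initialization.
    void insert(const std::string& key, int value) {
        std::size_t cur = 0;
        for (char c : key) {
            std::size_t next = child(cur, c);
            if (next == kNone) {
                next = nodes_.size();
                nodes_.emplace_back();  // may reallocate, so index again below
                nodes_[cur].children.emplace_back(c, next);
            }
            cur = next;
        }
        nodes_[cur].has_value = true;
        nodes_[cur].value = value;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* find(const std::string& key) const {
        std::size_t cur = 0;
        for (char c : key) {
            cur = child(cur, c);
            if (cur == kNone) return nullptr;
        }
        return nodes_[cur].has_value ? &nodes_[cur].value : nullptr;
    }

private:
    static constexpr std::size_t kNone = static_cast<std::size_t>(-1);

    struct Node {
        std::vector<std::pair<char, std::size_t>> children;  // edge label, node index
        bool has_value = false;
        int value = 0;
    };

    // Linear scan over the outgoing edges of one node.
    std::size_t child(std::size_t node, char c) const {
        assert(node < nodes_.size());
        for (const auto& edge : nodes_[node].children) {
            if (edge.first == c) return edge.second;
        }
        return kNone;
    }

    std::vector<Node> nodes_;  // all nodes live in one contiguous array
};
```

With "car" and "cart" inserted, the three nodes for "c", "a", "r" are stored once and shared; a lookup costs one step per character of the key, independent of how many keys are stored.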

(Note, by the way, that O(log(N)) may quite possibly be faster than O(1) in such a case, because big-O does not account for things like cache behaviour.)

Note that these are distinct things: do you need an upper bound on lookup time, do you need a fast typical rate, or do you need the fastest lookup ever, no questions asked? The last one will cost you; the first two may be conflicting goals.


You could attempt to create a perfect hash function based on the input (i.e. one that has no collisions on the input set). This is a more or less solved problem (e.g. this, this). However, such tools usually generate source code and may spend significant time generating the hash function.

A modification of this would be to use a generic hash function (e.g. shift-multiply-add) and do a brute-force search for suitable parameters.

This has to be traded off against the cost of a few string comparisons (which aren't terribly expensive if you don't have to collate).
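The brute-force parameter search could look roughly like the sketch below. It assumes a multiply-xor string hash with a single tunable multiplier, a table of at least twice as many slots as keys, and an input without duplicate keys (duplicates can never be separated, so the search would give up). The names `SeededTable` and `kEmptySlot` and the constants are illustrative choices, not taken from any particular library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::size_t kEmptySlot = static_cast<std::size_t>(-1);

class SeededTable {
public:
    // Tries multipliers until every key lands in its own slot.
    // Returns false if none is found within max_tries.
    bool build(const std::vector<std::string>& keys,
               std::uint64_t max_tries = 100000) {
        unsigned bits = 1;
        while ((std::size_t{1} << bits) < 2 * keys.size()) ++bits;
        assert(bits < 64);
        shift_ = 64 - bits;  // the slot index is taken from the top bits
        const std::size_t size = std::size_t{1} << bits;
        for (std::uint64_t t = 0; t < max_tries; ++t) {
            mult_ = 0x9E3779B97F4A7C15ULL * (2 * t + 1);  // odd times odd stays odd
            slots_.assign(size, kEmptySlot);
            if (place_all(keys)) {
                keys_ = keys;
                return true;
            }
        }
        slots_.clear();
        keys_.clear();
        return false;
    }

    // Returns the key's position in the original list, or -1 if absent.
    // Exactly one string comparison per lookup.
    std::ptrdiff_t find(const std::string& key) const {
        if (slots_.empty()) return -1;
        const std::size_t idx = slots_[slot(key)];
        if (idx == kEmptySlot || keys_[idx] != key) return -1;
        return static_cast<std::ptrdiff_t>(idx);
    }

private:
    std::size_t slot(const std::string& s) const {
        std::uint64_t h = 0xCBF29CE484222325ULL;  // arbitrary nonzero start value
        for (unsigned char c : s) h = (h ^ c) * mult_;
        return static_cast<std::size_t>(h >> shift_);
    }

    // Fails as soon as two keys map to the same slot.
    bool place_all(const std::vector<std::string>& keys) {
        for (std::size_t i = 0; i < keys.size(); ++i) {
            std::size_t& cell = slots_[slot(keys[i])];
            if (cell != kEmptySlot) return false;
            cell = i;
        }
        return true;
    }

    std::vector<std::size_t> slots_;
    std::vector<std::string> keys_;
    std::uint64_t mult_ = 1;
    unsigned shift_ = 63;
};
```

The search cost is paid once at initialization. Keeping the table only half full makes a collision-free multiplier likely to turn up quickly for small key sets, but the expected number of tries grows rapidly with the number of keys, which is where dedicated perfect-hash tools come in.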

Another option is to use two distinct hash functions. This increases the cost of a single lookup but makes degradation slightly less likely than aliens stealing your clock cycles. It is rather unlikely that this would be a problem with typical strings and a decent hash function.

Try google-sparsehash: http://code.google.com/p/google-sparsehash/

An extremely memory-efficient hash_map implementation. 2 bits/entry overhead! 
The SparseHash library contains several hash-map implementations, including 
implementations that optimize for space or speed.

On a similar topic ((number of) items known at compile time), I produced this one: Lookups on known set of integer keys. Low overhead, no need for a perfect hash. Fortunately, it is in C ;-)
