

Hashing a dictionary in C++

Hi, I want to use a hash map from the words in a dictionary to the indices of those words in the dictionary.

What would be the fastest hash algorithm for this?

Thanks!

At the bottom of this page there is a section, "A Note on Hash Functions", with some information you might find useful.

For convenience, I'll just replicate some links here:

There are many different hashing algorithms, of varying efficiency, but the most important property is that the algorithm scatters the items fairly uniformly across the hash buckets.
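To make "scatter the items fairly uniformly" concrete, here is a minimal sketch using 64-bit FNV-1a, a well-known simple string hash chosen only as an example (the answer does not name a specific algorithm). The helper `bucket_loads` is a hypothetical name: it counts how many words land in each bucket, and a good hash keeps every count close to `words.size() / buckets`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// 64-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime.
std::uint64_t fnv1a(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV 64-bit prime
    }
    return h;
}

// Count how many words fall into each of `buckets` buckets.
// Uniform scattering means the counts are roughly equal.
std::vector<std::size_t> bucket_loads(const std::vector<std::string>& words,
                                      std::size_t buckets) {
    assert(buckets > 0);
    std::vector<std::size_t> loads(buckets, 0);
    for (const auto& w : words) {
        ++loads[fnv1a(w) % buckets];
    }
    return loads;
}
```

Printing `bucket_loads` for your real word list is a quick way to spot a hash that clusters keys.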

However, you may as well assume that the Microsoft/library engineers have done a decent job of writing an efficient and effective hash algorithm, and just use the built-in libraries/classes.
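As a sketch of relying on the built-in hash, the standard library's `std::hash<std::string>` can be used directly; `bucket_for` is a hypothetical helper that reduces the hash to a bucket index. Which algorithm `std::hash` uses is implementation-defined, so results differ between compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Map a word to one of `bucket_count` buckets using the library's string hash.
std::size_t bucket_for(const std::string& word, std::size_t bucket_count) {
    assert(bucket_count > 0);
    return std::hash<std::string>{}(word) % bucket_count;
}
```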

The fastest hash function will be

#include <cstddef>

// Constant-time, but every key lands in the same bucket.
template <class T>
std::size_t hash(const T&) {
    return 0;
}

However, though the hashing will be mighty fast, you will suffer performance elsewhere: with every key in the same bucket, each lookup degenerates into a linear scan. What you want is to try several hashing algorithms on your actual data and see which one gives you the best aggregate performance, and only if hashing or lookup turns out to be a performance bottleneck at all. Until then, go with something handy. MD5 is pretty widely available.
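One way to compare hash functions on real data, sketched under the assumption that the dictionary lives in a `std::vector<std::string>`: build a `std::unordered_map` with each candidate hash and report the largest bucket, since a lookup must scan its whole bucket. `ZeroHash` mirrors the constant hash above; `ZeroHash` and `largest_bucket` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// The "fastest" hash: every key gets the same value.
struct ZeroHash {
    std::size_t operator()(const std::string&) const { return 0; }
};

// Insert each word with its index, then report the size of the fullest bucket.
template <class Hash>
std::size_t largest_bucket(const std::vector<std::string>& words) {
    std::unordered_map<std::string, std::size_t, Hash> index;
    for (std::size_t i = 0; i < words.size(); ++i) {
        index.emplace(words[i], i);  // duplicates keep their first index
    }
    std::size_t largest = 0;
    for (std::size_t b = 0; b < index.bucket_count(); ++b) {
        largest = std::max(largest, index.bucket_size(b));
    }
    assert(largest <= index.size());  // no bucket can hold more than every entry
    return largest;
}
```

With `ZeroHash`, the largest bucket holds every distinct word, so each lookup compares against all of them. Timing actual lookups (for example with `std::chrono::steady_clock`) on your real word list is the fuller test the answer recommends.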

Have you tried just using the STL hash_map and seeing if it serves your needs before rolling anything more complex?

http://www.sgi.com/tech/stl/hash_map.html
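`hash_map` was an SGI extension; since C++11 the standard equivalent is `std::unordered_map`. A minimal sketch of the word-to-index map from the question, assuming the dictionary is a `std::vector<std::string>` (`build_index` and `index_of` are hypothetical helper names; `std::optional` needs C++17):

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Build a word -> position map for a dictionary stored as a vector.
std::unordered_map<std::string, std::size_t>
build_index(const std::vector<std::string>& dictionary) {
    std::unordered_map<std::string, std::size_t> index;
    index.reserve(dictionary.size());  // avoid rehashing while inserting
    for (std::size_t i = 0; i < dictionary.size(); ++i) {
        index.emplace(dictionary[i], i);  // a repeated word keeps its first index
    }
    return index;
}

// Look up a word's index; std::nullopt if the word is absent.
std::optional<std::size_t> index_of(
    const std::unordered_map<std::string, std::size_t>& index,
    const std::string& word) {
    auto it = index.find(word);
    if (it == index.end()) return std::nullopt;
    return it->second;
}
```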

Boost has a hash function (boost::hash) that you can reuse for your own data; it is predefined for common types. That would probably work well and be fast enough if your needs aren't special.
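Boost also offers `boost::hash_combine` for building hashes of your own types out of their fields. To avoid a Boost dependency, this sketch hand-writes a combiner with the classic formula from older Boost releases (recent releases may use a different mixing step, so treat the constant and shifts as illustrative). `WordKey`, `WordKeyHash`, and `WordIndex` are hypothetical types for a key made of a word plus its language.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical composite key: a word plus the language it belongs to.
struct WordKey {
    std::string word;
    std::string language;
    bool operator==(const WordKey& other) const {
        return word == other.word && language == other.language;
    }
};

// Mix one value's hash into a running seed (classic Boost-style formula).
template <class T>
void hash_combine(std::size_t& seed, const T& value) {
    seed ^= std::hash<T>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hash functor that combines the hashes of both fields.
struct WordKeyHash {
    std::size_t operator()(const WordKey& k) const {
        std::size_t seed = 0;
        hash_combine(seed, k.word);
        hash_combine(seed, k.language);
        return seed;
    }
};

using WordIndex = std::unordered_map<WordKey, std::size_t, WordKeyHash>;
```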

What is your use case? A radix search tree (trie) might be more suitable than a hash if you're mapping from strings to integers. Tries have the advantage of reducing key comparisons for variable-length keys (e.g., strings).
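A minimal trie sketch mapping words to integer indices, assuming one child map per node keyed by character (other child layouts, such as fixed 26-entry arrays, are common too). A lookup walks one node per character and never compares whole keys; the `Trie` class is a hypothetical illustration, not a library type.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>

// Minimal trie mapping words to dictionary indices.
class Trie {
public:
    void insert(const std::string& word, int value) {
        Node* node = &root_;
        for (char c : word) {
            auto& child = node->children[c];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->index = value;  // mark where the word ends
    }

    // One step per character; std::nullopt if the word is not stored.
    std::optional<int> find(const std::string& word) const {
        const Node* node = &root_;
        for (char c : word) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return std::nullopt;
            node = it->second.get();
        }
        return node->index;  // empty if `word` is only a prefix
    }

private:
    struct Node {
        std::map<char, std::unique_ptr<Node>> children;
        std::optional<int> index;  // set only at the end of a stored word
    };
    Node root_;
};
```

Shared prefixes ("ca", "car", "cat") share nodes, which is where a trie saves space and comparisons on dictionary-like data.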

Even a binary search tree (e.g., the STL's map) might be superior to a hash-based container in terms of memory use and number of key comparisons. A hash is more efficient only if you have very few collisions.
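For comparison, the same word-to-index map built on `std::map`, which is an ordered tree container (typically a red-black tree): lookups cost O(log n) string comparisons and there is no bucket array to size. `build_tree_index` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Ordered word -> position map; iteration visits words alphabetically.
std::map<std::string, std::size_t>
build_tree_index(const std::vector<std::string>& dictionary) {
    std::map<std::string, std::size_t> index;
    for (std::size_t i = 0; i < dictionary.size(); ++i) {
        index.emplace(dictionary[i], i);  // a repeated word keeps its first index
    }
    return index;
}
```

Swapping this in for the hash-based version and timing both on the real dictionary is a cheap way to check the memory and comparison trade-off described above.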

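Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.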