How can I hash strings into a specific number of buckets?
I'm trying to come up with an algorithm to hash a string into a specific number of buckets, but I haven't had any luck working out how to do it.
I have a list of strings like this:
a.jpg
b.htm
c.gif
d.jpg
e.swf
and I would like to run a function that returns a number between 1 and 4 based on the string.
e.g. a.jpg would be 3
b.htm would be 2
c.gif would be 1
etc.
It needs to be consistent, so if I run the function on a.jpg it always returns 3.
This algorithm would be used for splitting resources between servers:
e.g. a.jpg would be accessed from server3.mydomain.com
b.htm would be accessed from server2.mydomain.com
etc.
Does anyone know how I would go about doing this?
Any advice would be much appreciated!
Cheers
Tim
You may find the following blog post useful. The proposed algorithm is:
int bucketIndex = (int)((uint)"d.jpg".GetHashCode() % (uint)buckets.Length);
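To show that one-liner in context, here is a minimal sketch that uses it to pick a host for a file name. The class name `ServerPicker`, the method `PickServer`, and the four host names are assumptions made for illustration; the question only mentions the serverN.mydomain.com pattern. The `unchecked` wrapper is there so the cast also works in a project compiled with overflow checking enabled.

```csharp
using System;

public static class ServerPicker
{
    // Hypothetical host list; only the serverN.mydomain.com pattern comes from the question.
    private static readonly string[] buckets =
    {
        "server1.mydomain.com",
        "server2.mydomain.com",
        "server3.mydomain.com",
        "server4.mydomain.com",
    };

    public static string PickServer(string fileName)
    {
        // Reinterpret the signed hash as unsigned so the remainder is never negative.
        uint hash = unchecked((uint)fileName.GetHashCode());
        int bucketIndex = (int)(hash % (uint)buckets.Length);
        return buckets[bucketIndex];
    }
}
```

Within a single running process, `ServerPicker.PickServer("d.jpg")` returns the same host every time it is called.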
int bucket = (int)(unchecked((uint)s.GetHashCode()) % 4 + 1);
(where s is the string)
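One caveat worth keeping in mind for the server-splitting use case: the .NET documentation does not guarantee that `String.GetHashCode()` returns the same value across processes, platforms, or runtime versions, and on .NET Core it is randomized per process by default. If the bucket has to be the same on every server and after every restart, a hand-written hash is one option. The sketch below swaps in 32-bit FNV-1a, which is not what the answers above use; the names `StableBuckets`, `Fnv1a`, and `GetBucket` are hypothetical. It will not necessarily reproduce the exact numbers in the question (a.jpg giving 3 and so on), only a fixed number between 1 and `bucketCount` for each string.

```csharp
using System;
using System.Text;

public static class StableBuckets
{
    // 32-bit FNV-1a over the UTF-8 bytes of the string.
    public static uint Fnv1a(string s)
    {
        const uint offsetBasis = 2166136261;
        const uint prime = 16777619;
        uint hash = offsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            unchecked
            {
                hash ^= b;     // mix in the next byte
                hash *= prime; // multiplication wraps around modulo 2^32
            }
        }
        return hash;
    }

    // Returns a bucket number in the range 1..bucketCount.
    public static int GetBucket(string s, int bucketCount)
    {
        if (bucketCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(bucketCount));
        return (int)(Fnv1a(s) % (uint)bucketCount) + 1;
    }
}
```

Because the result depends only on the bytes of the string, `StableBuckets.GetBucket("a.jpg", 4)` gives the same answer on every machine. Changing the number of buckets reassigns most strings, which matters if servers are added later.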
Standard GetHashCode and % will work: Math.Abs("aaaa".GetHashCode()) % numberOfBuckets.
EDIT: Thanks to Thomas Levesque for the reminder that GetHashCode() can return a value less than 0. I added Math.Abs to make the code correct, but the versions in other answers are likely to work better.
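One edge case remains with `Math.Abs`: `Math.Abs(int.MinValue)` throws an `OverflowException`, because the positive value does not fit in an `int`. A minimal sketch of an alternative that avoids `Math.Abs` altogether is shown below; the names `NonNegativeModulo` and `ToBucket` are hypothetical, and `numberOfBuckets` is assumed to be positive. For negative hashes it assigns different buckets than the `Math.Abs` version, so the two should not be mixed.

```csharp
using System;

public static class NonNegativeModulo
{
    // Maps any int hash (including int.MinValue) to 0..numberOfBuckets-1.
    // Assumes numberOfBuckets > 0.
    public static int ToBucket(int hash, int numberOfBuckets)
    {
        // In C#, % keeps the sign of the left operand, so the remainder can be negative.
        int remainder = hash % numberOfBuckets;
        // Fold a negative remainder back into the valid range.
        return remainder < 0 ? remainder + numberOfBuckets : remainder;
    }
}
```

Usage would look like `NonNegativeModulo.ToBucket("aaaa".GetHashCode(), numberOfBuckets)`.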
Use a hash algorithm based on a shared machine key. This will create a unique identifier per string. If you require integers, use a dictionary object to map strings to ints: every time you add a new string, set its value to the current dictionary length. Finally, store the dictionary in a farm-based state object, such as a shared session, so that each site instance can reference it.
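The dictionary part of this suggestion can be sketched as follows. This is a minimal in-memory version under stated assumptions: the class name `StringIdRegistry` and its methods are hypothetical, the class is not thread-safe, and sharing the dictionary across a server farm is left out entirely. Turning the id into a bucket with a modulo is also an assumption, since the answer stops at mapping strings to integers.

```csharp
using System.Collections.Generic;

public class StringIdRegistry
{
    private readonly Dictionary<string, int> ids = new Dictionary<string, int>();

    // Returns the existing id for s, or assigns the next one (the current dictionary length).
    public int GetOrAddId(string s)
    {
        if (!ids.TryGetValue(s, out int id))
        {
            id = ids.Count;
            ids[s] = id;
        }
        return id;
    }

    // Hypothetical helper: turns the id into a bucket number in 1..bucketCount.
    public int GetBucket(string s, int bucketCount)
    {
        return GetOrAddId(s) % bucketCount + 1;
    }
}
```

With this scheme the bucket depends on the order in which strings are first seen, so every server has to read from the same shared dictionary to get consistent answers.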