Java String to short hash code

I want to "implement" a hash function from Strings to shorts, using the standard hashCode() method of the Java String class. I came up with the following simple implementation:

static short shortHashCode(String str)
{
   int strHashCode = str.hashCode();
   short shorterHashCode = (short) (strHashCode % Short.MAX_VALUE);
   return shorterHashCode;
}

  1. Is my shortHashCode function a good hash function? Meaning is the chance of collisions small (chance that two different Strings will have the same hash code close to 1/Short.MAX_VALUE)?
  2. Is there a better way to implement a hash function from Strings to shorts?

(short) (strHashCode % Short.MAX_VALUE);

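is losing information unnecessarily: the remainder always lies between -32766 and 32766, so the values 32767, -32767 and -32768 of the short range are never produced.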

 (short) (strHashCode % ((Short.MAX_VALUE + 1) << 1));

would not, but it is equivalent to

 (short) strHashCode

since casting an integral type to a smaller integral type simply discards the most significant bits.
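The equivalence above can be checked directly. The following is a minimal sketch; the class name `CastVsModulo` and its method names are made up for illustration and do not come from the original post.

```java
// Hypothetical demo class; all names here are illustrative only.
class CastVsModulo {
    // The question's version: the remainder lies in [-32766, 32766].
    static short moduloMaxValue(int h) {
        return (short) (h % Short.MAX_VALUE);
    }

    // Remainder modulo 65536, then a narrowing cast.
    static short modulo65536(int h) {
        return (short) (h % ((Short.MAX_VALUE + 1) << 1));
    }

    // Plain narrowing cast: keeps only the low 16 bits.
    static short plainCast(int h) {
        return (short) h;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 32767, 65535, 65536, -65537,
                         Integer.MAX_VALUE, Integer.MIN_VALUE, "hello".hashCode()};
        for (int h : samples) {
            System.out.println(h + " -> " + moduloMaxValue(h)
                    + " / " + modulo65536(h) + " / " + plainCast(h));
        }
    }
}
```

For every sample the second and third columns should agree, while the first column maps both 0 and 32767 to 0.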


The plain cast also assumes that all bits carry the same entropy, which may not be true. You could try to spread the entropy around:

 (short) (strHashCode ^ (strHashCode >>> 16))

which XORs the high 16 bits with the low 16 bits.
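Wrapped into a complete method, the XOR fold might look like the sketch below. The class name `FoldedShortHash` and the helper `fold` are assumptions introduced for this example.

```java
// Hypothetical class; the names are illustrative, not from the original post.
class FoldedShortHash {
    // XOR the high 16 bits into the low 16 bits, then keep the low 16 bits.
    static short fold(int h) {
        return (short) (h ^ (h >>> 16));
    }

    // 16-bit hash of a String, based on the standard String.hashCode().
    static short shortHashCode(String str) {
        return fold(str.hashCode());
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "a", "hello", "Hello, world"}) {
            System.out.println("\"" + s + "\" -> " + shortHashCode(s));
        }
    }
}
```

Unlike the plain cast, the fold lets the upper half of the int hash influence the result: for instance, `fold(0x00010000)` is 1, whereas `(short) 0x00010000` is 0.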


Meaning is the chance of collisions small (chance that two different Strings will have the same hash code close to 1/Short.MAX_VALUE)?

java.lang.String.hashCode is not a cryptographically strong hash function, so it only has that property if an attacker can't control one or both inputs to force a collision.

If you expose it to strings from an untrusted source, you might see a much higher rate of hash collisions, possibly allowing an attacker to deny service.
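To illustrate how easily such inputs can be built: "Aa" and "BB" are a well-known pair with the same String.hashCode() value, and concatenating equal numbers of such blocks yields more colliding strings. The sketch below uses a hypothetical class `CollisionDemo` to generate them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class CollisionDemo {
    // Builds all 2^n strings made of n blocks, each block being "Aa" or "BB".
    // All of them share the same String.hashCode() value.
    static List<String> collidingStrings(int n) {
        List<String> out = new ArrayList<>();
        out.add("");
        for (int i = 0; i < n; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : out) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            out = next;
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : collidingStrings(3)) {
            System.out.println(s + " -> " + s.hashCode());
        }
    }
}
```

Because these strings already collide on the full 32-bit hash, any 16-bit value derived from hashCode() collides for them as well, whichever reduction is used.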

Also, it is designed to trade off a small increase in collision rate for better performance and cross-version stability. There are better string hashing functions out there.
