How to implement the exact C# hash code in Java

I have a piece of code that generates a signature in C#. For convenience I used the built-in hash codes, and that worked fine.

However, my boss told me the signature now has to be generated on the Java side too. This is driving me crazy, so I dug into the .NET source code.

Currently I only need the hash codes of int, double, string and bool. int and bool are easy; the ones I can't find an easy way to port are double and string. My environment will always be 64-bit. The source is as follows.

For string:

        public override int GetHashCode() {

#if FEATURE_RANDOMIZED_STRING_HASHING
            if(HashHelpers.s_UseRandomizedStringHashing)
            {
                return InternalMarvin32HashString(this, this.Length, 0);
            }
#endif // FEATURE_RANDOMIZED_STRING_HASHING

            unsafe {
                fixed (char *src = this) {
                    Contract.Assert(src[this.Length] == '\0', "src[this.Length] == '\\0'");
                    Contract.Assert( ((int)src)%4 == 0, "Managed string should start at 4 bytes boundary");

#if WIN32
                    int hash1 = (5381<<16) + 5381;
#else
                    int hash1 = 5381;
#endif
                    int hash2 = hash1;

#if WIN32
                    // 32 bit machines.
                    int* pint = (int *)src;
                    int len = this.Length;
                    while (len > 2)
                    {
                        hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
                        hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ pint[1];
                        pint += 2;
                        len  -= 4;
                    }

                    if (len > 0)
                    {
                        hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
                    }
#else
                    int     c;
                    char *s = src;
                    while ((c = s[0]) != 0) {
                        hash1 = ((hash1 << 5) + hash1) ^ c;
                        c = s[1];
                        if (c == 0)
                            break;
                        hash2 = ((hash2 << 5) + hash2) ^ c;
                        s += 2;
                    }
#endif
#if DEBUG
                    // We want to ensure we can change our hash function daily.
                    // This is perfectly fine as long as you don't persist the
                    // value from GetHashCode to disk or count on String A 
                    // hashing before string B.  Those are bugs in your code.
                    hash1 ^= ThisAssembly.DailyBuildNumber;
#endif
                    return hash1 + (hash2 * 1566083941);
                }
            }
        }

I am not sure about FEATURE_RANDOMIZED_STRING_HASHING (I guess it's not set, though), and the pointer cast here:

int* pint = (int *)src;

doesn't look straightforward in Java.

For double:

public unsafe override int GetHashCode() {
    double d = m_value;
    if (d == 0) {
        // Ensure that 0 and -0 have the same hash code
        return 0;
    }
    long value = *(long*)(&d);
    return unchecked((int)value) ^ ((int)(value >> 32));
}

The same issue: there is a pointer cast, an address-of and a dereference.

How can I do that in Java (no native code)?

I wonder if you aren't making it more complicated than it needs to be with the whole unsafe section and pointers. Why don't you start with a solution in Java and then port it back to C#?

I bet there are plenty of hash implementations for Java on the net, and the port from Java to C# should be trivial.

Edit: In fact, I looked it up for you: Good Hash Function for Strings

Please don't assume that pointers are necessary for performance either. Using pointers probably blocks compiler optimizations, making your code slower than if you had just used arrays/strings like the Java solutions above.

In response to a comment: if you want the same function in C# and Java, you will need a solution that doesn't use pointers. That solution will probably perform as well or better anyway (because the compiler has more freedom when optimizing it) and will certainly be more readable. So if you want to keep this algorithm, first recode it in C# without pointers, then use that version in both C# and Java.

If you can't recode it in your primary language, C#, you certainly won't be able to do it in Java.

Maintain compatibility with good unit test coverage. If you don't have enough unit tests now, write them before making any changes. If you test against existing hash codes (you appear to be persisting them somewhere), you might be able to write some C# tests that check both the C# and Java hash codes, which would also prove that your current effort is successful.
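As one illustration of a pointer-free port, here is a minimal Java sketch of the Double.GetHashCode listing quoted in the question. It assumes that `Double.doubleToRawLongBits` is an acceptable stand-in for the `*(long*)(&d)` reinterpretation (it returns the raw IEEE 754 bits without collapsing NaN values). The class and method names are made up for this example, and the int and bool rules (the value itself, and 1 or 0) come from my reading of the .NET Framework source rather than from the question, so check them against your runtime.

```java
public final class NetPrimitiveHash {
    private NetPrimitiveHash() {}

    /** Mirrors the quoted Double.GetHashCode: fold the 64 raw bits into 32. */
    public static int hashDouble(double d) {
        if (d == 0) {
            // True for both 0.0 and -0.0, so they share a hash code.
            return 0;
        }
        long value = Double.doubleToRawLongBits(d);
        // Low 32 bits XOR high 32 bits; the narrowing casts drop the rest.
        return (int) value ^ (int) (value >> 32);
    }

    /** Assumption: Int32.GetHashCode returns the value itself. */
    public static int hashInt(int i) {
        return i;
    }

    /** Assumption: Boolean.GetHashCode returns 1 for true and 0 for false. */
    public static int hashBool(boolean b) {
        return b ? 1 : 0;
    }
}
```

Note that Java's own `Double.hashCode` and `Boolean.hashCode` are not drop-in replacements: the former does not special-case -0.0, and the latter returns 1231 or 1237.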

I needed to implement the .NET String GetHashCode in Java because we were porting some code whose data depended on it. The solution below is probably naive and definitely not optimized, but I didn't need it to be; it's called rarely. I tested it with the empty string, strings of 1, 2, 3, 4 and 5 characters, and non-ASCII strings. It works for my use cases, but I make no guarantees.

import java.nio.charset.StandardCharsets;

public class NetHashCode {
    // Follows the WIN32 (32-bit) branch of the .NET String.GetHashCode listing.
    public static int getHashCode(String s) {
        int hash1 = (5381 << 16) + 5381;
        int hash2 = hash1;
        // 2 bytes per character, little endian.
        byte[] bytes = s.getBytes(StandardCharsets.UTF_16LE);
        int numCharsRemaining = s.length();
        for (int j = 0; j < bytes.length; j += 4) {
            // Java bytes are signed, so mask each one with 0xFF before combining;
            // otherwise any byte >= 0x80 sign-extends and corrupts the value.
            int holdsUpToTwoChars = (bytes[j] & 0xFF) | ((bytes[j + 1] & 0xFF) << 8);
            if (numCharsRemaining > 1) {
                holdsUpToTwoChars |= ((bytes[j + 2] & 0xFF) << 16) | ((bytes[j + 3] & 0xFF) << 24);
                numCharsRemaining -= 2;
            } else {
                // Odd trailing character: the upper half stays zero, like the terminator.
                numCharsRemaining -= 1;
            }
            // Even-numbered ints feed hash1, odd-numbered ints feed hash2.
            if (j % 8 < 4) {
                hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ holdsUpToTwoChars;
            } else {
                hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ holdsUpToTwoChars;
            }
        }
        return hash1 + (hash2 * 1566083941);
    }
}
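
The class above follows the WIN32 branch of the quoted source. The question states a 64-bit environment, where the other branch may be the one that applies. Below is a minimal sketch of that branch, assuming FEATURE_RANDOMIZED_STRING_HASHING and DEBUG are both inactive. The class name `NetStringHash64` and the helper `charOrZero` are made up for this example.

```java
public final class NetStringHash64 {
    private NetStringHash64() {}

    /** Mirrors the non-WIN32 branch of the quoted String.GetHashCode. */
    public static int hash(String s) {
        int hash1 = 5381;
        int hash2 = hash1;
        int i = 0;
        int c;
        // Like the C# loop, stop at the first '\0', including an embedded one.
        while ((c = charOrZero(s, i)) != 0) {
            hash1 = ((hash1 << 5) + hash1) ^ c;
            c = charOrZero(s, i + 1);
            if (c == 0) {
                break;
            }
            hash2 = ((hash2 << 5) + hash2) ^ c;
            i += 2;
        }
        // int arithmetic wraps in Java just as in unchecked C#.
        return hash1 + (hash2 * 1566083941);
    }

    /** Emulates the '\0' terminator that follows a .NET string in memory. */
    private static int charOrZero(String s, int index) {
        return index < s.length() ? s.charAt(index) : 0;
    }
}
```

This sketch returns 371857150 for the empty string and 372029373 for "a". Comparing a handful of such values against real C# output is probably the quickest way to confirm which branch your runtime actually uses.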
