Hash code not working

I found this hash function written in Java and, with some help from Stack Overflow, converted it to C. The problem is that it gives a different hash value each time it runs on the same word.

Here's the original function:

long sfold(String s, int M) 
{
  int intLength = s.length() / 4;
  long sum = 0;
  for (int j = 0; j < intLength; j++) 
  {
     char c[] = s.substring(j * 4, (j * 4) + 4).toCharArray();
     long mult = 1;
     for (int k = 0; k < c.length; k++) 
     {
        sum += c[k] * mult;
        mult *= 256;
     }
  }

  char c[] = s.substring(intLength * 4).toCharArray();
  long mult = 1;
  for (int k = 0; k < c.length; k++) 
  {
     sum += c[k] * mult;
     mult *= 256;
  }

  return(Math.abs(sum) % M);
 }

And here's how we rewrote it:

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>

long sfold(char * s, int M);

int main(void)
{
    char *  s = "test";
    int M;
    long x;
    M = 525;
    x = sfold(s,M);

    printf("%ld\n",x);
}   
long sfold(char * s, int M)
{
    int intLength = strlen(s) / 4;
    long sum = 0;

    for (int j = 0; j < intLength; j++) 
    {
       char c[4];
       memcpy(c, s + 4 * j, 4);

       //char c[] = s.substring(j * 4, (j * 4) + 4).toCharArray();
       long mult = 1;

       for (int k = 0; k < strlen(c); k++) 
       {
           sum += c[k] * mult;
           mult *= 256;
       }
    }

    char c[intLength];
    memcpy(c,s,intLength);
    //char c[] = s.substring(intLength * 4).toCharArray();
    long mult = 1;

    for (int k = 0; k < strlen(c); k++) 
    {
       sum += c[k] * mult;
       mult *= 256;
    }

    return(abs(sum) % M);
}

Shouldn't this give the same value each time we run the program? Does anyone see what's wrong?

All that string copying is really silly. What's the point of copying if all you need is the character value?

Here's how it might look in C:

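#include <stdint.h>   /* uint8_t */

long sfold(const char* s, unsigned long M) {
   unsigned long mult = 1, sum = 0;
   while (*s) {
      sum += (uint8_t)(*s++) * mult;
      mult *= 256;
      /* mult wraps to 0 when it overflows unsigned long (after 4 bytes if
       * unsigned long is 32 bits, after 8 if it is 64), so start over */
      if (!mult) mult = 1;
   }
   return sum % M;
}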

But it's a terrible hash algorithm. You'd be better off with a simple modular hash (which is also not great, but it's not as bad):

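#include <stdint.h>   /* uint8_t */

/* This could be any small prime */
static const unsigned long mult = 31;
long sfold(const char* s, unsigned long M) {
   /* Avoid having the hash of the empty string be 0 */
   unsigned long sum = 0xBEA00D1FUL;
   /* Multiply the running value before adding each byte, so that the
    * order of the characters affects the result */
   while (*s)
      sum = sum * mult + (uint8_t)(*s++);
   return sum % M;
}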
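
One way to see why the byte-folding hash is weak: within each 4-byte group every multiplier except the first is a multiple of 256, so when M is 256 only the first byte of each group can influence the result. The sketch below is an illustration under assumptions, not code from the question: the name fold4 is hypothetical, and fixed-width types are used so that the group size is 4 bytes on any platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: folds the string in 4-byte groups, treating each
 * group as a little-endian base-256 number, then reduces modulo M. */
unsigned long fold4(const char *s, unsigned long M) {
    uint64_t sum = 0;
    uint64_t mult = 1;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (i % 4 == 0)
            mult = 1;                   /* start of a new 4-byte group */
        sum += (uint8_t)s[i] * mult;
        mult *= 256;
    }
    return (unsigned long)(sum % M);
}
```

With M = 256, fold4("abcd", 256) and fold4("aXYZ", 256) both return 97, the character code of 'a': the other three bytes of the group are multiplied by 256 or more and vanish in the modulo.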

I think I took care of most of the bugs for you. I made it C99 compliant, mainly out of habit. The major problem was using strlen(c): c is a character array, not a string (which is a character array terminated with the null character '\0'). Without a terminator, strlen() keeps reading past the end of the array until it happens to hit a zero byte, which is why the result changed from run to run. You'll need to rewrite your function so that if calloc() / malloc() fails, the function terminates with an error. Or you can go back to variable length arrays like you were using before, if your compiler supports them. There are likely better hash functions in other posts on Stack Overflow, but this at least helps you get yours working in a deterministic manner without invoking undefined behavior.
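
To illustrate the strlen(c) point in isolation, here is a minimal sketch. The helper name fold_chunk and its parameters are hypothetical and not part of the original code. It copies a chunk with memcpy(), which adds no terminator, and then loops over the byte count supplied by the caller instead of asking strlen() for it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: folds up to 4 bytes starting at p as a
 * little-endian base-256 number. The byte count n comes from the caller,
 * so the buffer never needs a '\0' and strlen() is never called on it. */
unsigned long fold_chunk(const char *p, size_t n) {
    char c[4];
    unsigned long value = 0, mult = 1;

    if (n > sizeof c)
        n = sizeof c;                 /* never copy more than the buffer holds */
    memcpy(c, p, n);                  /* copies raw bytes only, no terminator */

    for (size_t k = 0; k < n; k++) {
        value += (unsigned char)c[k] * mult;
        mult *= 256;
    }
    return value;
}
```

For example, fold_chunk("test", 4) gives 1953719668 and fold_chunk("ing", 3) gives 6778473, the same on every run, because the loop bound no longer depends on whatever bytes happen to follow the buffer in memory.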

Code Listing


/******************************************************************************/
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>

#define BUF_SIZE  (4)

/******************************************************************************/
long sfold(const char* s, int M);

/******************************************************************************/
int main(void) {
   const char*  s = "test string";
   int M;
   long x;
   M = 525;
   x = sfold(s,M);

   printf("String:%s - Hash:%ld\n", s, x);
}

/******************************************************************************/
long sfold(const char* s, int M) {
   size_t len = strlen(s);
   int intLength = len / BUF_SIZE;              /* number of full 4-byte chunks */
   size_t tailLength = len % BUF_SIZE;          /* 0 to 3 leftover bytes */
   char* c = calloc(BUF_SIZE, sizeof(char));    /* Warning, test if c==NULL, this
                                                 * call can fail.
                                                 */
   long long sum = 0;                           /* 64 bits, like Java's long */
   long long mult;
   int j;
   size_t k;

   /* Fold each full chunk as a little-endian base-256 number */
   for (j=0; j<intLength; j++) {
      memcpy(c, s + BUF_SIZE * j, BUF_SIZE);
      //char c[] = s.substring(j * 4, (j * 4) + 4).toCharArray();
      mult = 1;

      for (k=0; k<BUF_SIZE; k++) {
         sum += (unsigned char)c[k] * mult;
         mult *= 256;
      }
   }

   /* Fold the leftover bytes; only tailLength bytes of c are read */
   memcpy(c, s + BUF_SIZE * intLength, tailLength);
   //char c[] = s.substring(intLength * 4).toCharArray();
   mult = 1;

   for (k=0; k<tailLength; k++) {
      sum += (unsigned char)c[k] * mult;
      mult *= 256;
   }

   free(c);
   return (long)(llabs(sum) % M);               /* llabs, since sum is long long */
}

Sample Output


for i in $(seq 1 5); do echo $i; ./a.out; done
1
String:test string - Hash:138
2
String:test string - Hash:138
3
String:test string - Hash:138
4
String:test string - Hash:138
5
String:test string - Hash:138
