How would you interpret the behaviour of my C hash function (a type of Fowler–Noll–Vo hash function)?

I don't understand why the integer value "hash" is getting lower in/after the third loop iteration.

I would guess this happens because the uint limit is 2,147,483,647. But when I step through it, the value is 2146134658. I'm not that good at math, but that should be lower than the limit.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FNV_PRIME_32 16777619
#define FNV_OFFSET_32 2166136261U

unsigned int hash_function(const char *string, unsigned int size)
{
    unsigned int str_len = strlen(string);

    if (str_len == 0) exit(0);

    unsigned int hash = FNV_OFFSET_32;

    for (unsigned int i = 0; i < str_len; i++)
    {
        // Read the byte as unsigned char so bytes above 127 are not
        // sign-extended before the XOR
        hash = hash ^ (unsigned char)string[i];
        // Multiply by prime number found to work well
        hash = hash * FNV_PRIME_32;
        if (hash > 765010506)
            printf("YO!\n");
        else
            printf("NOO!\n");
    }
    return hash % size;
}

In case you are wondering, this if statement is only there for me to check the value:

if (hash > 765010506)
    printf("YO!\n");
else
    printf("NOO!\n");

765010506 is the value of hash after the next pass through the loop.

"I don't understand why the integer value 'hash' is getting lower in/after the third loop."

All unsigned integer arithmetic in C is modular arithmetic. For unsigned int, it is modulo UINT_MAX + 1; for unsigned long, modulo ULONG_MAX + 1, and so on.

(a modulo m means the remainder of a divided by m; in C, that is a % m if both a and m are unsigned integer types.)

On many current architectures, unsigned int is a 32-bit unsigned integer type, with UINT_MAX == 4294967295.
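A minimal sketch of that rule, assuming a 32-bit unsigned int: it compares the product as C's 32-bit unsigned arithmetic computes it with the exact 64-bit product reduced by hand modulo 2^32. The helper names are hypothetical, chosen only for this illustration.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: UINT_MAX + 1 wraps around to 0, because unsigned
   arithmetic is performed modulo UINT_MAX + 1. */
unsigned int uint_max_plus_one(void)
{
    return UINT_MAX + 1u;
}

/* Hypothetical helper: the product as 32-bit unsigned arithmetic computes
   it (assumes int is 32 bits, so uint32_t is not promoted to int). */
uint32_t mul_wrapped(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Hypothetical helper: the exact product in 64 bits, then reduced
   modulo 2^32 explicitly.  (2^32 - 1)^2 < 2^64, so nothing is lost. */
uint32_t mul_then_mod(uint32_t a, uint32_t b)
{
    uint64_t exact = (uint64_t)a * b;
    return (uint32_t)(exact % ((uint64_t)1 << 32));
}
```

Both multiplication helpers return 50327552 for 4292870400u times 65520u, one of the values in the 65520 sequence in this answer, and both return 1 for 0xFFFFFFFFu times 0xFFFFFFFFu, since (2^32 - 1)^2 = 2^64 - 2^33 + 1.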

Let's look at what this means in practice, for multiplication (by 65520, which happens to be an interesting value; 2^16 - 16):

#include <stdio.h>

int main(void)
{
    unsigned int  x = 1;
    int           i;
    for (i = 0; i < 10; i++) {
        printf("%u\n", x);
        x = x * 65520;
    }
    return 0;
}

The output is

1
65520
4292870400
50327552
3221291008
4293918720
16777216
4026531840
0
0

What? How? How come the result ends up zero? That cannot happen!

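Sure it can. In fact, you can show mathematically that it eventually happens whenever the multiplier is even and the modulus is a power of two (2^32, here): each multiplication by an even number adds at least one more factor of two to x, so after at most 32 multiplications x is a multiple of 2^32, which is 0 modulo 2^32. Since 65520 = 2^4 * 4095, every multiplication adds four factors of two, and x reaches 0 after only eight multiplications.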
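A small sketch of that argument, assuming a 32-bit unsigned int; steps_until_zero is a hypothetical helper that keeps multiplying, starting from 1, until the value wraps to exactly 0.

```c
#include <stdint.h>

/* Hypothetical helper: starting from x = 1, multiply by `multiplier`
   using 32-bit unsigned arithmetic (i.e. modulo 2^32) until x becomes 0.
   Returns the number of multiplications needed, or -1 if x is still
   nonzero after `limit` multiplications. */
int steps_until_zero(uint32_t multiplier, int limit)
{
    uint32_t x = 1;
    for (int step = 1; step <= limit; step++) {
        x *= multiplier;          /* wraps modulo 2^32 */
        if (x == 0)
            return step;
    }
    return -1;
}
```

Under these assumptions, steps_until_zero(65520u, 100) returns 8 (four factors of two per step), steps_until_zero(2u, 100) returns 32, and an odd multiplier such as 16777619u never reaches 0, so it returns -1.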

Your particular multiplier is odd, however, so it does not suffer from the above. It still wraps around due to the modulo operation, though. If we retry the same with your multiplier, 16777619, and a slightly longer sequence,

#include <stdio.h>

int main(void)
{
    unsigned int  x = 1;
    int           i;
    for (i = 0; i < 20; i++) {
        printf("%u\n", x);
        x = x * 16777619;
    }
    return 0;
}

we get

1
16777619
637696617
1055306571
1345077009
1185368003
4233492473
878009595
1566662433
558416115
1485291145
3870355883
3549196337
924097827
3631439385
3600621915
878412353
2903379027
3223152297
390634507

In fact, it turns out that this sequence is 1,073,741,824 (2^30) iterations long before it repeats itself, and, starting from 1, it will never yield, for example, 0, 2, 4, 5, 6, 7, 8, 10, 12, 13, 14, or 15. (Because 16777619 is congruent to 3 modulo 8, every power of it is congruent to 1 or 3 modulo 8, so even numbers and numbers congruent to 5 or 7 modulo 8 never appear.) It even takes 380 iterations to get a result smaller than 16,777,619 (namely 16,689,137).
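The period and residue claims can be checked without running a billion iterations. This sketch (hypothetical helper names, 32-bit unsigned arithmetic assumed) relies on two standard facts: the multiplicative order of an odd number modulo 2^32 is a power of two, and squaring a value k times raises it to the power 2^k.

```c
#include <stdint.h>

/* Hypothetical helper: p^(2^k) modulo 2^32, by squaring k times
   (each squaring doubles the exponent; the multiplication wraps). */
uint32_t pow2k_mod32(uint32_t p, int k)
{
    for (int i = 0; i < k; i++)
        p *= p;
    return p;
}

/* Hypothetical check: the order of odd p modulo 2^32 is a power of two,
   so it is exactly 2^30 iff p^(2^30) == 1 and p^(2^29) != 1. */
int has_order_2_pow_30(uint32_t p)
{
    return pow2k_mod32(p, 30) == 1u && pow2k_mod32(p, 29) != 1u;
}

/* Hypothetical check: walk p^0 .. p^(n-1) modulo 2^32 and return 1 if
   every one of them is congruent to 1 or 3 modulo 8, else 0. */
int powers_stay_1_or_3_mod_8(uint32_t p, long n)
{
    uint32_t x = 1;
    for (long k = 0; k < n; k++) {
        uint32_t r = x % 8u;
        if (r != 1u && r != 3u)
            return 0;
        x *= p;
    }
    return 1;
}
```

For the FNV prime, has_order_2_pow_30(16777619u) returns 1 and powers_stay_1_or_3_mod_8(16777619u, 100000) returns 1, consistent with the values the sequence never reaches.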

For a hash function, that is okay. Each new nonzero input changes the state, so the sequence is not "locked". But there is no reason to expect the hash value to increase monotonically as the length of the hashed data increases; it is much better to assume it is "roughly random" instead: not really random, as it depends only on the input, but also not obviously regular-looking.
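To see this directly, here is a sketch of 32-bit FNV-1a (the XOR-then-multiply order used in the question) that can print the intermediate value after each byte and whether it went down. The function name and the verbose flag are hypothetical; reading each char through unsigned char is this sketch's choice for mapping characters to the octets FNV is defined on.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define FNV_PRIME_32  16777619u
#define FNV_OFFSET_32 2166136261u

/* 32-bit FNV-1a.  When `verbose` is nonzero, prints the intermediate
   hash after every byte and whether it is lower than the previous one. */
uint32_t fnv1a_32_trace(const char *s, int verbose)
{
    uint32_t hash = FNV_OFFSET_32;
    for (size_t i = 0; s[i] != '\0'; i++) {
        uint32_t previous = hash;
        hash ^= (unsigned char)s[i];   /* no sign extension of bytes > 127 */
        hash *= FNV_PRIME_32;          /* wraps modulo 2^32 */
        if (verbose)
            printf("after byte %zu: %10u (%s)\n", i + 1, (unsigned)hash,
                   hash < previous ? "lower than before" : "not lower");
    }
    return hash;
}
```

fnv1a_32_trace("a", 0) returns 0xe40c292c, and the empty string returns the offset basis 2166136261; with verbose set to 1, the printed lines typically show the value going up and down rather than growing steadily.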

"I would guess this happens because the uint limit is 2,147,483,647."

The maximum value of a 32-bit unsigned integer is roughly 4 billion (2^32 - 1 = 4,294,967,295). The number you're thinking of, 2,147,483,647, is the maximum value of a signed 32-bit integer (2^31 - 1).

2,146,134,658 is slightly less than 2^31 (so it would fit even in a signed 32-bit integer), but it's still very close to that limit. Multiplying it by FNV_PRIME_32, which is roughly 2^24, gives a result of roughly 2^55, far beyond 2^32, so the result wraps around modulo 2^32.
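A sketch that makes both points concrete, assuming a 32-bit int and unsigned int: it prints the two limits from <limits.h>, checks whether a value fits in a signed int, and compares the exact 64-bit product with what 32-bit unsigned arithmetic keeps. The helper names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

#define FNV_PRIME_32 16777619u

/* Print the two limits that are easy to mix up.  With a 32-bit int these
   are 4294967295 (2^32 - 1) and 2147483647 (2^31 - 1). */
void print_int_limits(void)
{
    printf("UINT_MAX = %u\n", UINT_MAX);
    printf("INT_MAX  = %d\n", INT_MAX);
}

/* 1 if `value` fits in a signed int, 0 otherwise. */
int fits_in_signed_int(long long value)
{
    return value >= INT_MIN && value <= INT_MAX;
}

/* Exact product of a 32-bit value and the FNV prime, computed in 64 bits. */
uint64_t product_exact(uint32_t hash)
{
    return (uint64_t)hash * FNV_PRIME_32;
}

/* The product as 32-bit unsigned arithmetic computes it: only the low
   32 bits of the exact product survive (the exact product modulo 2^32). */
uint32_t product_wrapped(uint32_t hash)
{
    return hash * FNV_PRIME_32;
}
```

For 2146134658, fits_in_signed_int returns 1, product_exact gives a value between 2^54 and 2^55, and product_wrapped returns the low 32 bits of that value. In the real loop the XOR with the next character comes before the multiplication, so this only illustrates the size of the step.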

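Notice: the technical posts on this site are published under the CC BY-SA 4.0 license; if you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.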

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM