
Bitwise shift operator under different platforms (Windows, Mac OS, Android)

I am debugging a function, hashKey. The problem is that it generates different results for the same input on different platforms: Windows/Win CE, Mac OS, and Android. Here is the code:

unsigned long hashKey(const char *name,size_t len)
{
    unsigned long h=(unsigned long)len;
    size_t step = (len>>5)+1;
    for(size_t i=len; i>=step; i-=step)
        h = h ^ ((h<<5)+(h>>2)+(unsigned long)name[i-1]);
    return h;
}

Here is the test program I use:

#include <stdio.h>
#include <string.h>

int main()
{
    char word[] = { 0xE6, 0xBE, 0xB3, 0xE9, 0x96, 0x80, 0xE7, 0x89, 0xB9, 0xE5, 
        0x88, 0xA5, 0xE8, 0xA1, 0x8C, 0xE6, 0x94, 0xBF, 0xE5, 0x8D, 
        0x80, 0x2E, 0x70, 0x6E, 0x67, 0x00};
    // for those who are interested, these bytes are the UTF-8 encoding of
    // "澳門特別行政區.png"

    unsigned long val = hashKey(word, strlen(word));
    printf("hash key for [%s] is [%u].\n", word, (unsigned int)val);
    return 0;
}

The length is 25 and the input bytes are identical, yet the return values differ:

On Android it is 648. On Win CE it is 96, which is the expected value.

I couldn't figure out why. Any help is appreciated. Thanks in advance!

More information:

  1. the values start to diverge after several iterations of the loop, caused by h>>2 . So in the beginning, the values are the same.

  2. it seems inputs consisting only of ASCII characters don't have this issue.

Solved (Thanks to Yojimbo's advice) on May 3, 2013.

unsigned long hashKey(const char *name,size_t len)
{
    unsigned long h=(unsigned long)len;
    size_t step = (len>>5)+1;
    for(size_t i=len; i>=step; i-=step)
    {
        unsigned long charVal = (unsigned long)name[i-1];
        // force the sign extension a signed 32-bit char conversion would
        // produce, so all platforms see the same value for bytes >= 0x80
        if (charVal >= 0x00000080)
            charVal = charVal | 0xffffff80;
        // mask the shifts to their 32-bit results so 64-bit longs
        // compute the same hash as 32-bit ones
        h = h ^ ((h<<5 & 0xffffffe0)+(h>>2 & 0x3fffffff) + charVal);
    }
    return h;
}

Maybe some of the compilers treat "char" as signed, and others don't? Try changing

h = h ^ ((h<<5)+(h>>2)+(unsigned long)name[i-1]);

to

h = h ^ ((h<<5)+(h>>2)+(unsigned long)(name[i-1] & 0xff));

Also, bitwise right shift (your h>>2) may extend the sign bit or not, depending on the whims of the compiler and the machine instruction set. (Note that since h is unsigned long, h>>2 itself is always a logical shift; in your case the sign extension happens when the char is converted.)

You're using bitwise shift operators. Are you certain the byte ordering is the same on the processors in question? x86 uses little endian, ARM can be big or little endian.

Also, the sizes of int and long can differ. The only rule in C++ is that char <= short <= int <= long <= long long. The exact sizes aren't defined and can change. A 64-bit processor will normally have bigger ints and longs than a 32-bit one.

You are assuming the size of ints and longs are fixed, but they are not: they vary wildly by platform. https://en.wikipedia.org/wiki/Long_integer#Long_integer

I got a big negative number when I ran that code on a 64-bit box. Try including stdint.h and using explicitly sized types like "uint32_t" everywhere it matters. (I.e. a loop that iterates over your array can use "int", but bit manipulation should use a fixed-size type.)
