
Issues concerning converting a byte array to a long long (64-bit) array vs. a long (32-bit) array

I have a byte array that holds hex values, and I initially put those values in an unsigned long. I am using a 32-bit processor via Ubuntu at the moment, but I might have to port this program to a 64-bit processor.

Now, I am aware of the strtoul function, but since I was able to convert without any issues via a direct assignment, I did not bother with that function. The reason I put the values in an unsigned long was that I was thinking about little/big endian issues, and I assumed that using a register like unsigned long would just take care of that problem for me regardless of processor. Now, however, I have been thinking about how my program would work on a 64-bit processor.

Since I am on a 32-bit processor, it might only recognize a 32-bit long, whereas a 64-bit processor might only recognize a 64-bit long, which would put my unsigned long array in jeopardy. So, to fix this issue, I just changed that array to long long. Would that address my concerns, or do I need to do something else?

Some help and explanation would be appreciated. All my code is in C++.

Instead of using long or long long, you should use a typedef like uint32_t, or something similar, so that the type is 32 bits wide on all platforms, unless this isn't what you want?
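To illustrate the suggestion above, here is a minimal sketch, assuming a C++11 compiler that provides <cstdint>. The helper bits_in_long() is a hypothetical name used only for this illustration; it reports how wide long happens to be on the machine compiling the code, while the fixed-width typedefs stay the same everywhere they exist.

```cpp
#include <climits>
#include <cstdint>

// std::uint32_t and std::uint64_t have the same width on every platform
// that defines them, unlike long, whose width depends on the data model.
static_assert(sizeof(std::uint32_t) * CHAR_BIT == 32, "uint32_t must be 32 bits wide");
static_assert(sizeof(std::uint64_t) * CHAR_BIT == 64, "uint64_t must be 64 bits wide");

// Hypothetical helper: number of bits in this platform's long
// (commonly 32 on 32-bit Linux, 64 on 64-bit Linux).
constexpr unsigned bits_in_long() {
    return static_cast<unsigned>(sizeof(long) * CHAR_BIT);
}
```

Printing bits_in_long() on both the 32-bit and the 64-bit machine should show whether long changes width between them; std::uint32_t will not.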

It seems you do have a potential problem with endianness though, if you are simply doing:

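// needs <cstdint>, <cstring> and <iostream>
unsigned char bytes[4] = {0x12, 0x23, 0xff, 0xed};
std::uint32_t value;                      // fixed 32-bit width; a long may be 8 bytes on a 64-bit platform
std::memcpy(&value, bytes, sizeof value); // reads the same bytes a pointer cast would, without alignment or aliasing problems

std::cout << std::hex << value << std::endl; // prints edff2312 on a little endian platform, 1223ffed on a big endian one.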

since the actual value of the bytes when interpreted as an integer will change depending on endianness. There is a good answer on converting endianness here.
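As a rough sketch of what converting endianness means for a 32-bit value, the function below reverses the byte order using shifts and masks. The name swap_bytes32 is hypothetical and is not taken from the linked answer; many compilers and platforms also ship their own byte-swap routines.

```cpp
#include <cstdint>

// Hypothetical helper: reverse the byte order of a 32-bit value.
constexpr std::uint32_t swap_bytes32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

Applied to the example above, swap_bytes32(0xedff2312) yields 0x1223ffed, which is the value a big endian platform would have read from the same four bytes.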

You might want to look at SO 2032744 for an example of big-endian vs little-endian issues.

I'm not sure what you mean by using a register would resolve your endian-ness issues. We'd need to see the code to know. However, if you need to transfer integer values over the wire between different machines, you need to be sure that you are handling the size and the byte order correctly. That means both ends must agree on how to handle it - even if they actually do things differently.

Copying a byte array into a 'long' on an Intel platform will produce different results from copying the same array into a 'long' on a SPARC platform. To go via a register, you'd have to use code similar to:

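#include <stdint.h>

typedef uint32_t Uint4;     /* assumed: any unsigned 32-bit type will do */

/* Store l into s[0..3] in big-endian order (most significant byte first). */
void        st_uint4(Uint4 l, char *s)
{
    s += sizeof(Uint4) - 1;     /* start at the last byte and work backwards */
    *s-- = l & 0xFF;            /* least significant byte goes last */
    l >>= 8;
    *s-- = l & 0xFF;
    l >>= 8;
    *s-- = l & 0xFF;
    l >>= 8;
    *s   = l & 0xFF;            /* most significant byte goes first */
}

/* Load a big-endian 4-byte value from s[0..3]. */
Uint4   ld_uint4(const char *s)
{
    int i;
    Uint4   j = 0;

    for (i = 0; i < 4; i++)
    {
        /* The mask strips any sign extension if char is signed. */
        j = (j << 8) | (*s++ & 0xFF);
    }
    return(j);
}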

There are multiple ways to write that code.


Addressing the comments:

When dealing with data across machines, you have to be very careful. The two functions shown are inverses of each other. The 'ld_uint4()' function takes a byte array and loads that into a 4-byte unsigned integer (assuming you have a typedef for Uint4 that maps to a 4-byte unsigned integer - uint32_t from inttypes.h or stdint.h is a good bet). The st_uint4() function does the reverse operation. This code uses a big-endian storage format (the MSB is first in the byte array), but the same code is used on both types of platform (no performance advantage to either - and no conditional compilation, which is probably more important). You could write the code to work with little-endian storage; you could write the code so that there is less penalty on one type of machine versus the other.
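For comparison, a little-endian storage format could be sketched as follows. The names st_uint4_le and ld_uint4_le are hypothetical, and the sketch assumes unsigned char buffers and std::uint32_t in place of the Uint4 typedef; it is one possible variant, not the code from the code-base mentioned above.

```cpp
#include <cstdint>

// Hypothetical little-endian counterpart of st_uint4():
// the least significant byte is stored first in the buffer.
void st_uint4_le(std::uint32_t value, unsigned char *s) {
    for (int i = 0; i < 4; i++) {
        s[i] = static_cast<unsigned char>(value & 0xFF);
        value >>= 8;
    }
}

// Hypothetical little-endian counterpart of ld_uint4():
// reads s[3] first so that s[0] ends up as the least significant byte.
std::uint32_t ld_uint4_le(const unsigned char *s) {
    std::uint32_t value = 0;
    for (int i = 3; i >= 0; i--) {
        value = (value << 8) | s[i];
    }
    return value;
}
```

As with the big-endian pair, the same source runs unchanged on either kind of machine, because only shifts and masks touch the value.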

Understanding data layouts on disk is crucial - defining them carefully and in a platform neutral way is also crucial. Handling (single-byte code set) strings is easy; handling wide character strings (UTF-16 or UTF-32) is like handling integers - and you can use code similar to the code above for Uint2 and Uint8 if you wish (I have such functions pre-packaged, for example - I just copied the Uint4 versions; I also have SintN functions - for the copying stuff, the difference is not crucial, but for memory comparisons, the comparison techniques for signed and unsigned values are different).
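An 8-byte version might look like the sketch below, which matters if the values really do need 64 bits after the port. The names st_uint8 and ld_uint8 are hypothetical, and the sketch assumes std::uint64_t and unsigned char buffers rather than a Uint8 typedef; the storage format is big-endian, as in the Uint4 functions.

```cpp
#include <cstdint>

// Hypothetical 8-byte analogue of st_uint4(): big-endian storage.
void st_uint8(std::uint64_t value, unsigned char *s) {
    for (int i = 7; i >= 0; i--) {
        s[i] = static_cast<unsigned char>(value & 0xFF);
        value >>= 8;
    }
}

// Hypothetical 8-byte analogue of ld_uint4(): big-endian storage.
std::uint64_t ld_uint8(const unsigned char *s) {
    std::uint64_t value = 0;
    for (int i = 0; i < 8; i++) {
        value = (value << 8) | s[i];
    }
    return value;
}
```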

Handling float and double is trickier still - though if you can safely assume IEEE 754 format, it is primarily a big-endian vs little-endian issue that you face (that and perhaps some skulduggery with a union). The code-base I work with leaves double/float platform dependent (a nuisance, but a decision dating back to the days before IEEE 754 was ubiquitous) so I don't have platform neutral code for that. Also beware of alignments; Intel chips allow misaligned access but other chips (SPARC, PowerPC) do not, or incur large overheads. That means if you copy a 4-byte value, the source and target addresses must be 4-byte aligned if you do a simple copy; the store/load functions above do not have that as a problem and can deal with arbitrary alignments. Again, be wary of over-optimization (premature optimization).
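If IEEE 754 can be assumed, one way to get a platform-neutral float is to copy its bit pattern into a 32-bit integer and then store that integer byte by byte. The sketch below does this with std::memcpy instead of a union; st_float4 and ld_float4 are hypothetical names, and the code further assumes that float and integer values share the same byte order on the platform, which holds on common hardware but is not guaranteed everywhere.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float is a 4-byte IEEE 754 binary32 value on this platform.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes IEEE 754 binary32 floats");

// Hypothetical helper: store f into s[0..3], most significant byte first.
void st_float4(float f, unsigned char *s) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // copy the bit pattern out of the float
    for (int i = 3; i >= 0; i--) {
        s[i] = static_cast<unsigned char>(bits & 0xFF);
        bits >>= 8;
    }
}

// Hypothetical helper: load a float stored by st_float4().
float ld_float4(const unsigned char *s) {
    std::uint32_t bits = 0;
    for (int i = 0; i < 4; i++) {
        bits = (bits << 8) | s[i];
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);     // copy the bit pattern back into a float
    return f;
}
```

Because the buffer is only ever accessed one byte at a time, the alignment concern described above does not arise here either.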

1) Signed vs unsigned does not make you immune to endian issues. The only endian-agnostic data type is a byte (char). With everything else, you need to swap the byte order when the two machines use different endianness.

2) A 64-bit machine will always provide you with some type of 32-bit integer which you can use to pull values out of your array. So that shouldn't be an issue, as long as you're sure that both machines are using a 32-bit int (and you properly handle the endianness of the data).
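Putting both points together, pulling 32-bit values out of a byte array could be sketched like this. The function name decode_be32 is hypothetical, and the sketch assumes the bytes are stored most significant byte first; if the data is little-endian, the shift order would need to be reversed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decode a buffer of big-endian 32-bit values.
// Any trailing bytes that do not form a full 4-byte group are ignored.
std::vector<std::uint32_t> decode_be32(const unsigned char *bytes, std::size_t len) {
    std::vector<std::uint32_t> out;
    out.reserve(len / 4);
    for (std::size_t i = 0; i + 4 <= len; i += 4) {
        std::uint32_t v = (static_cast<std::uint32_t>(bytes[i])     << 24) |
                          (static_cast<std::uint32_t>(bytes[i + 1]) << 16) |
                          (static_cast<std::uint32_t>(bytes[i + 2]) << 8)  |
                           static_cast<std::uint32_t>(bytes[i + 3]);
        out.push_back(v);
    }
    return out;
}
```

The result is the same on a 32-bit and a 64-bit machine, since neither the element width nor the byte order depends on the platform's long.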
