
How can I safely use a Java byte as an unsigned char?

I am porting some C code that uses a lot of bit manipulation into Java. The C code operates under the assumption that int is 32 bits wide and char is 8 bits wide. There are assertions in it that check whether those assumptions are valid.

I have already come to terms with the fact that I'll have to use long in place of unsigned int. But can I safely use byte as a replacement for unsigned char?

They merely represent bytes, but I have already run into this bizarre incident (data is an unsigned char * in C and a byte[] in Java):

/* C */
uInt32 c = (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3];

/* Java */
long a = ((data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3]) & 0xffffffff;
long b = ((data[0] & 0xff) << 24) | ((data[1] & 0xff) << 16) |
          ((data[2] & 0xff) << 8) | (data[3] & 0xff) & 0xffffffff;

You would think a left shift operation is safe. But due to the strange unary promotion rules in Java, a and b are not going to be the same if some of the bytes in data are "negative" (b gives the correct result).
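A minimal sketch that reproduces the difference, assuming a hypothetical four-byte input whose second byte has its high bit set. The class and method names are made up for illustration; the two expressions are copied from the Java snippet above.

```java
final class PromotionDemo {
    // Expression "a": each byte is widened to int with sign extension
    // before the shift, so a negative byte smears 1-bits over the
    // bytes above it.
    static long unmasked(byte[] data) {
        return ((data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3]) & 0xffffffff;
    }

    // Expression "b": each byte is masked to 0..255 before the shift.
    static long masked(byte[] data) {
        return ((data[0] & 0xff) << 24) | ((data[1] & 0xff) << 16) |
               ((data[2] & 0xff) << 8) | (data[3] & 0xff) & 0xffffffff;
    }

    public static void main(String[] args) {
        // Hypothetical input: only data[1] is "negative" as a Java byte.
        byte[] data = { 0x01, (byte) 0x80, 0x00, 0x00 };
        System.out.println(Long.toHexString(unmasked(data))); // ffffffffff800000
        System.out.println(Long.toHexString(masked(data)));   // 1800000
    }
}
```

In both expressions 0xffffffff is an int literal equal to -1, so the final AND changes nothing; the sketch keeps it only to match the original code.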

What other "gotchas" should I be aware of? I really don't want to use short here.

You can safely use a byte to represent a value between 0 and 255 if you make sure to bitwise-AND its value with 255 (or 0xFF) before using it in computations. This promotes it to an int, and ensures the promoted value is between 0 and 255.

Otherwise, integer promotion would result in an int value between -128 and 127, using sign extension. -127 as a byte (hex 0x81) would become -127 as an int (hex 0xFFFFFF81).
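As a small illustration of that sign extension, here is a sketch using the 0x81 example. The class and method names are hypothetical; Byte.toUnsignedInt is a standard library method available from Java 8 onward and does the same thing as the mask.

```java
final class SignExtensionDemo {
    // Plain widening: the byte's sign bit is copied into the upper 24 bits.
    static int widenSigned(byte b) {
        return b;
    }

    // Masking with 0xFF keeps only the low 8 bits, giving 0..255.
    static int widenUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0x81;
        System.out.println(Integer.toHexString(widenSigned(b))); // ffffff81
        System.out.println(widenUnsigned(b));                    // 129
        System.out.println(Byte.toUnsignedInt(b));               // 129
    }
}
```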

So you can do this:

long a = (((data[0] & 255) << 24) | ((data[1] & 255) << 16) | ((data[2] & 255) << 8) | (data[3] & 255)) & 0xffffffffL;

Note that the mask needs the L suffix: without it, 0xffffffff is the int -1, so the AND does nothing and a value with the top bit set is sign-extended when widened to long. Note also that the first & 255 is strictly unnecessary here, since the shift by 24 pushes the sign-extension bits out of the int and the final mask clears the upper half of the long. But it's probably simplest to just always include it.
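Wrapped up as a helper, this might look like the sketch below. The class name, method name, and offset parameter are assumptions made for illustration. The final mask is written as a long literal (0xFFFFFFFFL), because an int literal 0xffffffff equals -1 and would leave the sign-extended upper bits in place.

```java
final class BigEndianReader {
    // Reads 4 bytes starting at offset as an unsigned 32-bit
    // big-endian value, returned in a long.
    static long readUInt32(byte[] data, int offset) {
        int bits = ((data[offset] & 0xFF) << 24)
                 | ((data[offset + 1] & 0xFF) << 16)
                 | ((data[offset + 2] & 0xFF) << 8)
                 |  (data[offset + 3] & 0xFF);
        // Widening to long sign-extends; the long mask clears bits 32..63.
        return bits & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0x81, (byte) 0xFE, 0x03, (byte) 0x80 };
        System.out.println(Long.toHexString(readUInt32(data, 0))); // 81fe0380
    }
}
```

On Java 8 or later, Integer.toUnsignedLong(bits) is an equivalent way to do the last step.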

... can I safely use byte as a replacement for unsigned char?

As you've discovered, not really... No.

According to the Oracle Java documentation, byte is a signed integer type. Though it has 256 distinct values (the documentation states explicitly that "It has a minimum value of -128 and a maximum value of 127 (inclusive)"), there are values that an unsigned char in C can store that a byte in Java can't (and vice versa).

That explains the problem you've experienced. However, the extent of the problem hasn't been fully demonstrated on your 8-bit-byte implementation.


What other "gotchas" should I be aware of?

Whilst a byte in Java is required to support only the values between -128 and 127 (inclusive), C's unsigned char has a maximum value (UCHAR_MAX) that depends upon the number of bits used to represent it (CHAR_BIT; at least 8). So when CHAR_BIT is greater than 8, there will be extra values beyond 255 that unsigned char can store.


In summary, in the world of Java a byte should really be called an octet (a group of eight bits), whereas in C a byte (char, signed char, unsigned char) is a group of at least (possibly more than) eight bits.

No. They are not equivalent. I don't think you'll find an equivalent type in Java, either; they're all rather fixed-width. You could safely use byte in Java as an equivalent for int8_t in C, however (except that int8_t isn't required to exist in C unless CHAR_BIT == 8).


As for pitfalls, there are some in your C code too. Assuming data[0] is an unsigned char, data[0] << 24 is undefined behaviour on any system for which INT_MAX == 32767.

Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address.

 