How can I safely use a Java byte as an unsigned char?
I am porting some C code that uses a lot of bit manipulation into Java. The C code operates under the assumption that int is 32 bits wide and char is 8 bits wide. There are assertions in it that check whether those assumptions are valid.
I have already come to terms with the fact that I'll have to use long in place of unsigned int. But can I safely use byte as a replacement for unsigned char?
They merely represent bytes, but I have already run into this bizarre incident (data is an unsigned char * in C and a byte[] in Java):
/* C */
uInt32 c = (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3];
/* Java */
long a = ((data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3]) & 0xffffffffL;
long b = (((data[0] & 0xff) << 24) | ((data[1] & 0xff) << 16) |
          ((data[2] & 0xff) << 8) | (data[3] & 0xff)) & 0xffffffffL;
You would think a left shift operation is safe. But due to the strange unary promotion rules in Java, a and b are not going to be the same if some of the bytes in data are "negative" (b gives the correct result).
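To make the discrepancy concrete, here is a minimal sketch with made-up sample bytes. The class name ShiftDemo and the method names unmasked and masked are hypothetical, chosen only for illustration. Both methods mask with the long literal 0xffffffffL so that the final result is not sign-extended when it is widened to long.

```java
public class ShiftDemo {
    // Combines four bytes WITHOUT masking each one first.
    // A negative byte is sign-extended to int before the shift,
    // so its high 1-bits overwrite the bytes to its left.
    static long unmasked(byte[] data) {
        return ((data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3]) & 0xffffffffL;
    }

    // Masks every byte to the range 0..255 before shifting.
    static long masked(byte[] data) {
        return (((data[0] & 0xff) << 24) | ((data[1] & 0xff) << 16)
                | ((data[2] & 0xff) << 8) | (data[3] & 0xff)) & 0xffffffffL;
    }

    public static void main(String[] args) {
        // Sample data (an assumption): the third byte has its high bit set.
        byte[] data = {0x12, 0x34, (byte) 0x80, 0x56};
        System.out.printf("unmasked = 0x%08X%n", unmasked(data)); // 0xFFFF8056
        System.out.printf("masked   = 0x%08X%n", masked(data));   // 0x12348056
    }
}
```

Only one "negative" byte is enough to wipe out everything to its left in the unmasked version.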
What other "gotchas" should I be aware of? I really don't want to use short here.
You can safely use a byte to represent a value between 0 and 255 if you make sure to bitwise-AND its value with 255 (or 0xFF) before using it in computations. This promotes it to an int, and ensures the promoted value is between 0 and 255.
Otherwise, integer promotion would result in an int value between -128 and 127, using sign extension. -127 as a byte (hex 0x81) would become -127 as an int (hex 0xFFFFFF81).
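The effect of the mask can be seen in a short sketch. The class name PromotionDemo and the helper names promote and toUnsigned are hypothetical, used here only to put the two cases side by side.

```java
public class PromotionDemo {
    // Plain widening conversion: the sign bit is copied into the upper 24 bits.
    static int promote(byte b) {
        return b;
    }

    // Masking with 0xFF clears the upper 24 bits, leaving a value in 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0x81;
        System.out.println(Integer.toHexString(promote(b)));    // ffffff81
        System.out.println(Integer.toHexString(toUnsigned(b))); // 81
        System.out.println(promote(b) + " " + toUnsigned(b));   // -127 129
    }
}
```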
So you can do this:
long a = (((data[0] & 255) << 24) | ((data[1] & 255) << 16) | ((data[2] & 255) << 8) | (data[3] & 255)) & 0xffffffffL;
Note that the first & 255 is unnecessary here, since a later step masks off the extra bits anyway (& 0xffffffffL). But it's probably simplest to just always include it. The L suffix on the final mask matters: without it, 0xffffffff is an int equal to -1, so the AND changes nothing and a value with the top bit set would be sign-extended to a negative long.
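If you are on Java 8 or later, the standard library methods Byte.toUnsignedInt and Integer.toUnsignedLong express the same masking by name. The sketch below is one possible way to wrap the idea. The class name UInt32Reader and the method readUInt32BE are hypothetical, and big-endian byte order is assumed because that is what the shifts in the question imply.

```java
public class UInt32Reader {
    // Reads four bytes starting at offset as an unsigned 32-bit
    // big-endian value and returns it in a long (0 .. 4294967295).
    static long readUInt32BE(byte[] data, int offset) {
        int bits = (Byte.toUnsignedInt(data[offset]) << 24)
                 | (Byte.toUnsignedInt(data[offset + 1]) << 16)
                 | (Byte.toUnsignedInt(data[offset + 2]) << 8)
                 |  Byte.toUnsignedInt(data[offset + 3]);
        // Equivalent to (bits & 0xffffffffL): widen without sign extension.
        return Integer.toUnsignedLong(bits);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xDE, (byte) 0xAD, (byte) 0xBE, (byte) 0xEF};
        System.out.println(readUInt32BE(data, 0)); // 3735928559
    }
}
```

Keeping the conversion in one helper means the mask cannot be forgotten at individual call sites.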
... can I safely use byte as a replacement for unsigned char?
As you've discovered, not really... No.
According to the Oracle Java documentation, byte is a signed integer type, and though it has 256 distinct values (due to the explicit range specification "It has a minimum value of -128 and a maximum value of 127 (inclusive)" from the documentation), there are values that an unsigned char from C can store that a byte from Java can't (and vice versa).
That explains the problem you've experienced. However, the extent of the problem hasn't been fully demonstrated on your 8-bit-byte implementation.
What other "gotchas" should I be aware of?
Whilst a byte in Java is required to support only values between (and including) -128 and 127, C's unsigned char has a maximum value (UCHAR_MAX) that depends upon the number of bits used to represent it (CHAR_BIT; at least 8). So when CHAR_BIT is greater than 8, there will be extra values beyond 255 that unsigned char can store.
In summary, in the world of Java a byte should really be called an octet (a group of eight bits), whereas in C a byte (char, signed char, unsigned char) is a group of at least (possibly more than) eight bits.
No. They are not equivalent. I don't think you'll find an equivalent type in Java, either; they're all rather fixed-width. You could safely use byte in Java as an equivalent for int8_t in C, however (except that int8_t isn't required to exist in C unless CHAR_BIT == 8).
As for pitfalls, there are some in your C code too. Assuming data[0] is an unsigned char, data[0] << 24 is undefined behaviour on any system for which INT_MAX == 32767.