Char into byte? (Java)
How come this happens:
char a = '\uffff'; //Highest value that char can take - 65535
byte b = (byte)a; //Casting a 16-bit value into 8-bit data type...! Isn't data lost here?
char c = (char)b; //Let's get the value back
int d = (int)c;
System.out.println(d); //65535... how?
Basically, I saw that a char is 16-bit. Therefore, if you cast it into a byte, how come no data is lost? (The value is the same after casting into an int.)
Thanks in advance for answering this little ignorant question of mine. :P
EDIT: Woah, found out that my original output actually did as expected, but I just updated the code above. Basically, a character is cast into a byte and then cast back into a char, and its original 2-byte value is retained. How does this happen?
As trojanfoe states, your confusion about the results of your code is partly due to sign extension. I'll try to add a more detailed explanation that may help.
char a = '\uffff';
byte b = (byte)a; // b = 0xFF
As you noted, this DOES result in the loss of information. This is considered a narrowing conversion. Converting a char to a byte "simply discards all but the n lowest order bits", where n is the number of bits in the target type (8 for a byte).
The result is: 0xFFFF -> 0xFF
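To see the narrowing step in isolation, here is a minimal sketch. The class and method names (`NarrowingDemo`, `narrow`) are made up for illustration and are not part of the original question.

```java
class NarrowingDemo {
    // Hypothetical helper: a char -> byte cast keeps only the low 8 bits.
    static byte narrow(char c) {
        return (byte) c;
    }

    public static void main(String[] args) {
        System.out.println(narrow('\uffff')); // -1 (bit pattern 0xFF)
        System.out.println(narrow('\u1234')); // 52 (0x34, the high byte 0x12 is gone)
        System.out.println(narrow('A'));      // 65 (0x41 fits, so nothing is lost)
    }
}
```

The second call shows the data loss the question was asking about: only the low byte survives.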
char c = (char)b; // c = 0xFFFF
Converting a byte to a char is considered a special conversion. It actually performs TWO conversions. First, the byte is SIGN-extended (the new high-order bits are copied from the old sign bit) to an int (a normal widening conversion). Second, the int is converted to a char with a narrowing conversion.
The result is: 0xFF -> 0xFFFFFFFF -> 0xFFFF
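The two steps can be written out by hand to check that they match the direct cast. This is only a sketch; `ByteToCharDemo` and `widenThenNarrow` are names invented for this example.

```java
class ByteToCharDemo {
    // Hypothetical helper that spells out the two steps of a byte -> char cast.
    static char widenThenNarrow(byte b) {
        int widened = b;        // step 1: widening to int, sign-extended
        return (char) widened;  // step 2: narrowing, keeps the low 16 bits
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        System.out.println(Integer.toHexString(b));                   // ffffffff
        System.out.println(Integer.toHexString(widenThenNarrow(b)));  // ffff
        System.out.println(widenThenNarrow(b) == (char) b);           // true
        // A byte with the sign bit clear is padded with zeros instead.
        System.out.println(Integer.toHexString(widenThenNarrow((byte) 0x7F))); // 7f
    }
}
```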
int d = (int)c; // d = 0x0000FFFF
Converting a char to an int is considered a widening conversion. When a char is widened to a larger integral type, it is ZERO-extended (the new high-order bits are set to 0).
The result is: 0xFFFF -> 0x0000FFFF. When printed, this will give you 65535.
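For contrast, a short sketch comparing char with short, which is also 16 bits wide but signed. The class and method names here are hypothetical.

```java
class WideningDemo {
    // char is unsigned, so widening to int pads with zeros.
    static int fromChar(char c) {
        return c;
    }

    // short is signed, so widening to int copies the sign bit.
    static int fromShort(short s) {
        return s;
    }

    public static void main(String[] args) {
        char c = '\uffff';
        short s = (short) 0xFFFF; // same 16-bit pattern as c
        System.out.println(fromChar(c));  // 65535
        System.out.println(fromShort(s)); // -1
    }
}
```

Same bits, different result, purely because of which extension rule applies to the source type.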
The three links I provided are the official Java Language Specification details on primitive type conversions. I HIGHLY recommend you take a look. They are not terribly verbose (and in this case relatively straightforward), and they detail exactly what Java will do behind the scenes with type conversions. This is a common area of misunderstanding for many developers. Post a comment if you are still confused by any step.
It's sign extension. Try '\u1234' instead of '\uffff' and see what happens.
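A quick way to try that suggestion is sketched below. `RoundTripDemo` and `roundTrip` are illustrative names, and '\u1280' is an extra value I picked to show a low byte with the sign bit set.

```java
class RoundTripDemo {
    // Hypothetical helper: char -> byte -> char -> int, as in the question.
    static int roundTrip(char c) {
        byte b = (byte) c;     // keeps the low 8 bits
        char back = (char) b;  // sign-extends, then keeps the low 16 bits
        return back;           // zero-extends to int
    }

    public static void main(String[] args) {
        System.out.println(roundTrip('\uffff')); // 65535: only looks lossless
        System.out.println(roundTrip('\u1234')); // 52: high byte 0x12 is lost
        System.out.println(roundTrip('\u1280')); // 65408: 0x80 sign-extends to 0xFF80
    }
}
```

0xFFFF survives only because every bit is 1, so the sign extension happens to restore the bits that were discarded.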
Java's byte is signed, which is counterintuitive. In almost all situations where a byte is used, programmers would want an unsigned byte instead. It's extremely likely to be a bug if a byte is cast to int directly.
This does the intended conversion correctly in almost all programs:
int c = 0xff & b;
Empirically, the choice of signed byte is a mistake.
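As a small sketch of the difference between the direct cast and the masked one (the class and method names are made up for this example; `Byte.toUnsignedInt` is a standard library method available since Java 8):

```java
class UnsignedByteDemo {
    // Hypothetical helper: the mask clears the 24 sign-extended high bits.
    static int toUnsigned(byte b) {
        return 0xff & b;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        System.out.println((int) b);                // -1 (sign-extended)
        System.out.println(toUnsigned(b));          // 255
        System.out.println(Byte.toUnsignedInt(b));  // 255, same result via the JDK
    }
}
```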
Some rather strange stuff is going on on your machine. Take a look at the Java Language Specification, chapter 4.2.1:
The values of the integral types are integers in the following ranges:
For byte, from -128 to 127, inclusive
... snip others ...
If your JVM is standards compliant, then your output should be -1.