Char into byte? (Java)

How come this happens:

char a = '\uffff'; //Highest value that char can take - 65535
byte b = (byte)a; //Casting a 16-bit value into 8-bit data type...! Isn't data lost here?
char c = (char)b; //Let's get the value back
int d = (int)c;
System.out.println(d); //65535... how?

Basically, I saw that a char is 16-bit. Therefore, if you cast it into a byte, how come no data is lost? (The value is the same after casting into an int.)

Thanks in advance for answering this little ignorant question of mine. :P

EDIT: Woah, found out that my original output actually did as expected, but I just updated the code above. Basically, a character is cast into a byte and then cast back into a char, and its original, 2-byte value is retained. How does this happen?

As trojanfoe states, your confusion about the results of your code is partly due to sign extension. I'll try to add a more detailed explanation that may help.

char a = '\uffff';
byte b = (byte)a;  // b = 0xFF

As you noted, this DOES result in the loss of information. This is considered a narrowing conversion. Converting a char to a byte "simply discards all but the n lowest order bits", where n is the number of bits in the target type (8 for byte).
The result is: 0xFFFF -> 0xFF
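
To see the discarded bits directly, here is a minimal sketch of that narrowing step on its own. The class name NarrowingDemo and the helper narrow are made up for illustration; only the cast itself comes from the question.

```java
final class NarrowingDemo {
    // Hypothetical helper: the narrowing cast keeps only the low 8 bits of the 16-bit char.
    static byte narrow(char c) {
        return (byte) c;
    }

    public static void main(String[] args) {
        char a = '\uffff';
        byte b = narrow(a);
        // All eight remaining bits are set, which a signed byte reads as -1.
        System.out.println(b);                              // -1
        // Masking shows the raw bit pattern that survived the cast.
        System.out.println(Integer.toHexString(b & 0xFF));  // ff
        // With a different char, the high byte 0x12 is simply dropped.
        System.out.println(narrow('\u1234'));               // 52 (0x34)
    }
}
```

The byte holds 0xFF either way; whether you read that as -1 or 255 depends only on how it is widened again later.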

char c = (char)b;  // c = 0xFFFF

Converting a byte to a char is considered a special conversion. It actually performs TWO conversions. First, the byte is SIGN-extended (the new high order bits are copied from the old sign bit) to an int (a normal widening conversion). Second, the int is converted to a char with a narrowing conversion.
The result is: 0xFF -> 0xFFFFFFFF -> 0xFFFF
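
The two steps can be written out by hand to check that they match the direct cast. This is only a sketch; ByteToCharDemo and widenThenNarrow are illustrative names, not anything from the JLS.

```java
final class ByteToCharDemo {
    // Hypothetical helper that spells out the two conversions hidden inside (char) b.
    static char widenThenNarrow(byte b) {
        int widened = b;        // step 1: widening with sign extension, 0xFF -> 0xFFFFFFFF
        return (char) widened;  // step 2: narrowing keeps the low 16 bits, -> 0xFFFF
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        char direct = (char) b;
        char stepwise = widenThenNarrow(b);
        System.out.println(Integer.toHexString(b));       // ffffffff (the intermediate int)
        System.out.println(Integer.toHexString(direct));  // ffff
        System.out.println(direct == stepwise);           // true
    }
}
```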

int d = (int)c;  // d = 0x0000FFFF

Converting a char to an int is considered a widening conversion. When a char type is widened to an integral type, it is ZERO-extended (the new high order bits are set to 0).
The result is: 0xFFFF -> 0x0000FFFF. When printed, this will give you 65535.
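
For contrast, the same 16 bits stored in a short widen differently, because short is signed and char is not. A small sketch, with WideningDemo and its two helpers being hypothetical names:

```java
final class WideningDemo {
    // char is unsigned, so widening to int fills the new high bits with zeros.
    static int widenChar(char c) {
        return c;
    }

    // short is signed, so widening to int copies the sign bit into the new high bits.
    static int widenShort(short s) {
        return s;
    }

    public static void main(String[] args) {
        char c = '\uffff';
        short s = (short) 0xFFFF;           // same bit pattern as c
        System.out.println(widenChar(c));   // 65535
        System.out.println(widenShort(s));  // -1
    }
}
```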

The three links I provided are the official Java Language Specification details on primitive type conversions. I HIGHLY recommend you take a look. They are not terribly verbose (and in this case relatively straightforward). They detail exactly what Java will do behind the scenes with type conversions. This is a common area of misunderstanding for many developers. Post a comment if you are still confused about any step.

It's sign extension. Try '\u1234' instead of '\uffff' and see what happens.
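
Assuming the suggestion is to rerun the question's casts with the character U+1234, a quick sketch of that experiment could look like this (RoundTripDemo and roundTrip are hypothetical names):

```java
final class RoundTripDemo {
    // Repeats the question's cast chain: char -> byte -> char -> int.
    static int roundTrip(char c) {
        byte b = (byte) c;     // keeps only the low 8 bits
        char back = (char) b;  // sign-extends to int, then keeps the low 16 bits
        return back;           // zero-extends to int
    }

    public static void main(String[] args) {
        // Low byte 0xFF has its sign bit set, so sign extension happens to rebuild 0xFFFF.
        System.out.println(roundTrip('\uffff'));  // 65535
        // Low byte 0x34 is positive, so the lost high byte 0x12 is not recovered.
        System.out.println(roundTrip('\u1234'));  // 52
    }
}
```

The round trip only appears lossless for 0xFFFF because the lost bits happen to equal the bits that sign extension fills in.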

Java's byte is signed, which is counterintuitive. In almost all situations where a byte is used, programmers want an unsigned byte instead. If a byte is cast to int directly, it is very likely a bug.

This does the intended conversion correctly in almost all programs:

int c = 0xff & b ;
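
As a rough illustration of the difference, the sketch below compares a plain cast with the mask. UnsignedByteDemo and toUnsigned are made-up names; Byte.toUnsignedInt is a standard library method available since Java 8 and gives the same result as the mask.

```java
final class UnsignedByteDemo {
    // Hypothetical helper: masking with 0xff clears the sign-extended high bits.
    static int toUnsigned(byte b) {
        return 0xff & b;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0;
        System.out.println((int) b);                 // -16 (sign-extended)
        System.out.println(toUnsigned(b));           // 240
        System.out.println(Byte.toUnsignedInt(b));   // 240
    }
}
```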

Empirically, the choice of signed byte is a mistake.

Some rather strange stuff is going on on your machine. Take a look at the Java Language Specification, chapter 4.2.1:

The values of the integral types are integers in the following ranges:

For byte, from -128 to 127, inclusive

... snip others...

If your JVM is standards compliant, then your output should be -1.
