Java char/int conversion confusion

Question

Given this code in Java:

    FileOutputStream os = new FileOutputStream("/tmp/test.dat");
    os.write(0x14);
    os.write(0xfe);
    os.write(0xae);

    os.write(String.valueOf((char) 0x14).getBytes("UTF-8"));
    os.write(String.valueOf((char) 0xfe).getBytes("UTF-8"));
    os.write(String.valueOf((char) 0xae).getBytes("UTF-8"));

    os.write("\u0014".getBytes("UTF-8"));
    os.write("\u00fe".getBytes("UTF-8"));
    os.write("\u00ae".getBytes("UTF-8"));

    os.close();

Can somebody explain to me why the first 3 bytes in test.dat are

14 fe ae

while the output from the last 6 os.write() calls is

14 c3 be c2

Basically, I want to write the literal bytes 14 fe ae. I was storing these values as String constants and writing the values of these constants to a UTF-8 file, but 14 c3 be c2 was output instead. There's obviously a gap in my understanding of how these byte sequences are converted in Java.

Thanks!

Answer 1

It gives:

0x 14 fe ae 14 c3 be c2 ae 14 c3 be c2 ae

The first three bytes are straightforward: os.write(int) writes the low-order byte of its argument unchanged. For the next three, remember that a char in Java represents a UTF-16 code unit, not a byte. So you're first creating the Unicode code units U+0014, U+00FE and U+00AE, then converting each one to UTF-8. U+0014 is 0x14 in UTF-8 (since it's also ASCII), but U+00FE is 0xC3 0xBE and U+00AE is 0xC2 0xAE.

You're creating the same characters again in the next three lines.

The bottom line is that if you want to store literal bytes, just use a byte array.
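As a minimal sketch of the byte-array approach, the example below writes the three values from the question without going through a String. The class name RawBytesDemo and the helpers writeToMemory and hex are made up for illustration, and a ByteArrayOutputStream stands in for the FileOutputStream so the result can be inspected directly.

```java
import java.io.ByteArrayOutputStream;

public class RawBytesDemo {
    // The three byte values from the question. The casts are needed because
    // 0xfe and 0xae fall outside Java's signed byte range (-128..127).
    static final byte[] RAW = { 0x14, (byte) 0xfe, (byte) 0xae };

    // Writes RAW to an in-memory stream and returns what was written.
    // Assumption: a FileOutputStream would receive the same bytes via os.write(RAW).
    static byte[] writeToMemory() {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        os.write(RAW, 0, RAW.length);
        return os.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex, e.g. "14 fe ae".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xff so negative byte values print as two hex digits.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(writeToMemory())); // prints: 14 fe ae
    }
}
```

No charset is involved at any point here, so nothing can expand a value above 0x7f into two bytes.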

Answer 2

"\u00fe" is not the byte 0xfe but the Unicode code point U+00FE, which can become a multi-byte value when encoded as UTF-8 (as shown above).

Answer 3

You missed a byte: you should be getting 14 c3 be c2 ae.

For your last six os.write calls, internally Java is storing each character in a one-character Unicode string. When you call getBytes this gives you the UTF-8 representation of these characters. For U+00FE (þ) this is c3 be, while for U+00AE (®) it's c2 ae.
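The encoding step described above can be checked with a short sketch. The class name EncodingDemo and its helper methods are hypothetical. The ISO-8859-1 line is an addition that none of the answers suggest: that charset maps each code point from U+0000 to U+00FF to the single byte with the same value, so it may be an option if the values have to stay in a String constant.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // The same three characters the question builds one at a time.
    static final String TEXT = "\u0014\u00fe\u00ae";

    // Hex dump of the string encoded as UTF-8.
    static String utf8Hex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    // Hex dump of the string encoded as ISO-8859-1 (Latin-1).
    static String latin1Hex(String s) {
        return hex(s.getBytes(StandardCharsets.ISO_8859_1));
    }

    // Formats bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-8 needs two bytes for each code point from U+0080 to U+07FF.
        System.out.println(utf8Hex(TEXT));   // prints: 14 c3 be c2 ae
        // ISO-8859-1 writes one byte per character in this range.
        System.out.println(latin1Hex(TEXT)); // prints: 14 fe ae
    }
}
```

This only round-trips characters up to U+00FF; for arbitrary binary data a byte array remains the simpler choice.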

3 answers

solution1
4 ACCEPTED 2010-07-09 21:46:31

solution2
1 2010-07-09 21:51:43

solution3
0 2010-07-09 21:56:53