Java Japanese characters string size in bytes

Question

I'm trying to calculate the length of a String of Japanese characters, '漢字仮名交じり文':

    String testStr = "漢字仮名交じり文";
    try {
        System.out.println("Length : " + testStr.getBytes("UTF-16").length);
    } catch (Exception ex) {
        .....
    }

There are 8 characters in the string, yet this excerpt prints 18. Why is it 18?

Answer 1

It is 18 because you have 8 characters, each encoded in UTF-16 as 2 bytes. That gives 8*2=16 bytes, plus the 2-byte BOM that is inserted at the beginning of the byte array.

This is your byte sequence (fe ff is the so-called BOM, or Byte Order Mark, which lets a reader detect whether the byte sequence uses little-endian or big-endian byte order):

fe ff 6f 22 5b 57 4e ee 54 0d 4e a4 30 58 30 8a 65 87

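This is how I printed the byte sequence (crude code, meant only for testing):

    import java.nio.charset.StandardCharsets;

    final String text = "漢字仮名交じり文";
    // Passing a Charset constant avoids the checked UnsupportedEncodingException
    // that getBytes("UTF-16") declares.
    byte[] bytes = text.getBytes(StandardCharsets.UTF_16);
    for (byte b : bytes) {
        System.out.printf("%02x ", b);  // each byte as two hex digits
    }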
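To check the 16 + 2 arithmetic, here is a minimal sketch that compares the plain "UTF-16" charset with the explicit-endianness variants, which write no BOM. The class name BomDemo and the helper byteLength are made up for this illustration; the charset constants are the standard ones from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Hypothetical helper: number of bytes produced when encoding s with charset.
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "漢字仮名交じり文";
        // "UTF-16" encodes big-endian and prepends the 2-byte BOM (fe ff).
        System.out.println(byteLength(text, StandardCharsets.UTF_16));   // prints 18
        // The explicit-endianness variants write no BOM.
        System.out.println(byteLength(text, StandardCharsets.UTF_16BE)); // prints 16
        System.out.println(byteLength(text, StandardCharsets.UTF_16LE)); // prints 16
    }
}
```

If you only need the payload size, encoding with UTF_16BE or UTF_16LE is probably the simplest way to leave the BOM out.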

Answer 2

You are getting the byte count, which is not the character count. Depending on the encoding, a character can take from 1 to 4 bytes; in UTF-16, which you used, each character takes 2 or 4 bytes.
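As a rough illustration of how the byte count depends on the encoding, the sketch below encodes four sample characters in UTF-8 and in UTF-16BE (chosen here so that no BOM is counted). The class name EncodedSizes, the helper encodedLength, and the sample characters are assumptions picked for this example, not part of the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedSizes {
    // Hypothetical helper: number of bytes produced when encoding s with charset.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Samples: U+0041 (A), U+00E9 (e with acute), U+6F22 (漢),
        // and U+1F600 (an emoji, written as a surrogate pair).
        String[] samples = { "A", "\u00E9", "\u6F22", "\uD83D\uDE00" };
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %d  UTF-16BE: %d%n",
                    s.codePointAt(0),
                    encodedLength(s, StandardCharsets.UTF_8),
                    encodedLength(s, StandardCharsets.UTF_16BE));
        }
    }
}
```

The UTF-8 sizes come out as 1, 2, 3 and 4 bytes, while the UTF-16BE sizes are 2, 2, 2 and 4 bytes. The eight characters of the question's string are all in the 3-byte UTF-8 range, so the whole string takes 24 bytes in UTF-8.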

Answer 3

If you actually want the character count of a given string, an easy (though not optimal) way is:

    String testStr = "漢字仮名交じり文";
    System.out.println(testStr.toCharArray().length);

This prints 8, the number of Java chars (UTF-16 code units) in the string.
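One caveat worth sketching: a Java char is a UTF-16 code unit, so both toCharArray().length and length() count code units, not Unicode characters. The two agree for the string in the question because all of its characters are in the Basic Multilingual Plane. The example below uses a second string, picked purely for illustration, whose first character (U+20BB7) needs a surrogate pair; the class and method names are hypothetical.

```java
public class CharCount {
    // Number of UTF-16 code units (Java chars); same value as toCharArray().length.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmpOnly = "漢字仮名交じり文";
        System.out.println(codeUnits(bmpOnly));   // prints 8
        System.out.println(codePoints(bmpOnly));  // prints 8

        // U+20BB7 (as a surrogate pair) followed by U+91CE and U+5BB6.
        String withSurrogates = "\uD842\uDFB7\u91CE\u5BB6";
        System.out.println(codeUnits(withSurrogates));   // prints 4
        System.out.println(codePoints(withSurrogates));  // prints 3
    }
}
```

Calling length() directly also avoids the array copy that toCharArray() makes, which may be what "not optimal" refers to.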

3 answers

solution1
7 2013-06-23 18:48:15

solution2
4 2013-06-23 17:09:21

solution3
1 2013-06-23 17:29:44
