String to binary string: why are some characters multi-byte?

Question

This code is supposed to convert character strings to binary strings, but for some strings it returns a string with 16 binary digits, not the 8 I expected.

public class aaa {
    public static void main(String[] argv) {
        String nux = "ª";
        String nux2 = "Ø";
        String nux3 = "(";
        byte[] bites = nux.getBytes();
        byte[] bites2 = nux2.getBytes();
        byte[] bites3 = nux3.getBytes();
        System.out.println(AsciiToBinary(nux));
        System.out.println(AsciiToBinary(nux2));
        System.out.println(AsciiToBinary(nux3));
        System.out.println("number of bytes :" + bites.length);
        System.out.println("number of bytes :" + bites2.length);
        System.out.println("number of bytes :" + bites3.length);
    }

    public static String AsciiToBinary(String asciiString) {
        byte[] bytes = asciiString.getBytes();
        StringBuilder binary = new StringBuilder();
        for (byte b : bytes) {
            int val = b;
            // Emit the 8 bits of this byte, most significant bit first
            for (int i = 0; i < 8; i++) {
                binary.append((val & 128) == 0 ? 0 : 1);
                val <<= 1;
            }
            binary.append(' ');
        }
        return binary.toString();
    }
}

For the first two strings, I don't understand why getBytes() returns 2 bytes, since they are single-character strings.

Compiled here: https://ideone.com/AbxBZ9

This returns:

11000010 10101010 
11000011 10011000 
00101000 
number of bytes :2
number of bytes :2
number of bytes :1

I am using the code from this question: Convert A String (like testing123) To Binary In Java

NetBeans IDE 8.1

Answer 1

A character is not always 1 byte long. Think about it: many languages, such as Chinese or Japanese, have thousands of characters. How would you map all of those characters to bytes?

You are using UTF-8 (one of the many, many ways of mapping characters to bytes). Looking up a character table for UTF-8 and searching for the sequence 11000010 10101010, I arrive at

U+00AA  ª   11000010 10101010
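The table entry above can be reproduced by hand. The following is a minimal sketch, assuming the standard two-byte UTF-8 layout (110xxxxx 10xxxxxx) that covers code points U+0080 through U+07FF. The class name Utf8TwoByteSketch and its helper methods are hypothetical names chosen for illustration, and the character is written as the escape \u00AA so the result does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8TwoByteSketch {

    // Encodes a code point in the range U+0080..U+07FF using the
    // two-byte UTF-8 layout 110xxxxx 10xxxxxx.
    public static byte[] encodeTwoByte(int codePoint) {
        if (codePoint < 0x80 || codePoint > 0x7FF) {
            throw new IllegalArgumentException("not a two-byte code point");
        }
        byte lead = (byte) (0xC0 | (codePoint >> 6));           // 110 + top 5 bits
        byte continuation = (byte) (0x80 | (codePoint & 0x3F)); // 10 + low 6 bits
        return new byte[] { lead, continuation };
    }

    // Formats one byte as exactly eight binary digits.
    public static String toBits(byte b) {
        String s = Integer.toBinaryString(b & 0xFF);
        return "00000000".substring(s.length()) + s;
    }

    public static void main(String[] args) {
        byte[] manual = encodeTwoByte(0x00AA); // U+00AA, the first character in the question
        byte[] library = "\u00AA".getBytes(StandardCharsets.UTF_8);
        System.out.println(toBits(manual[0]) + " " + toBits(manual[1])); // 11000010 10101010
        System.out.println(Arrays.equals(manual, library));              // true
    }
}
```

The value 0xAA is 10101010 in binary: its top two bits (10) go into the lead byte after the 110 prefix, and its low six bits (101010) go into the continuation byte after the 10 prefix.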

Which is the UTF-8 encoding for ª. UTF-8 is often the default character encoding (charset) for Java, but you cannot rely on this. That is why you should always specify a charset when converting strings to bytes or vice versa.
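As a rough illustration of specifying the charset explicitly, the sketch below encodes the three strings from the question with two different charsets from java.nio.charset.StandardCharsets. The class name ExplicitCharsetSketch and its helper methods are made up for this example, and the characters are written as Unicode escapes so the source file's own encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetSketch {

    // Number of bytes the text occupies when encoded as UTF-8.
    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies when encoded as ISO-8859-1 (Latin-1).
    public static int latin1Length(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Encodes and decodes with the same explicit charset, so the text survives unchanged.
    public static String roundTripUtf8(String text) {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        return new String(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] samples = { "\u00AA", "\u00D8", "(" }; // the three strings from the question
        for (String s : samples) {
            System.out.println(utf8Length(s) + " byte(s) in UTF-8, "
                    + latin1Length(s) + " byte(s) in ISO-8859-1, round trip ok: "
                    + roundTripUtf8(s).equals(s));
        }
    }
}
```

With an explicit charset, the byte count is the same on every machine: the first two strings take 2 bytes in UTF-8 and 1 byte in ISO-8859-1, while "(" takes 1 byte in both.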

Answer 2

You can see why some characters take two bytes by running this simple code:

    // Each comment shows: decimal value - binary form (as a 32-bit int for negatives)
    System.out.println(Byte.MIN_VALUE);             
    // -128 - 0b11111111111111111111111110000000

    System.out.println(Byte.MAX_VALUE);             
    // 127 - 0b1111111

    System.out.println((int) Character.MIN_VALUE);  
    // 0   - 0b0

    System.out.println((int) Character.MAX_VALUE);  
    // 65535 - 0b1111111111111111

As you can see, Byte.MAX_VALUE can be represented with just 7 bits, so it fits in 1 byte (01111111).

If you cast Character.MIN_VALUE to an integer, it is 0.
Its binary form needs only one bit, so it fits in 1 byte (00000000).

But what about Character.MAX_VALUE?

In binary it is 1111111111111111, which is 65535 in decimal,
and it needs 2 bytes (11111111 11111111).

So characters whose numeric value is between 0 and 65535 can be represented with 1 or 2 bytes.
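The same reasoning can be applied to the characters from the question. This is only a sketch: the class name CharValueSketch and the helper bytesForValue are hypothetical, and the helper counts the bytes needed to hold a char's raw numeric value, which is not necessarily the byte count that getBytes() produces under a given encoding (the question's output shows ª taking 2 bytes in UTF-8 even though its value, 170, fits in 8 bits).

```java
public class CharValueSketch {

    // Minimum number of whole bytes needed to hold the numeric value of a char.
    public static int bytesForValue(char c) {
        return c <= 0xFF ? 1 : 2;
    }

    public static void main(String[] args) {
        // U+00AA, U+00D8 and '(' are the characters from the question.
        char[] samples = { '\u00AA', '\u00D8', '(', Character.MAX_VALUE };
        for (char c : samples) {
            // Prints: decimal value - binary form - bytes needed for that value
            System.out.println((int) c + " - 0b" + Integer.toBinaryString(c)
                    + " - " + bytesForValue(c) + " byte(s)");
        }
    }
}
```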

Hope you understand.

2 answers

Answer 1
6 2016-01-30 22:47:18

Answer 2
-1 2016-01-31 23:43:15
