What is the difference between getBytes("UTF-8"), getBytes("windows-1252") and getBytes()?

Question

I have the following code, which produces confusing output.

    import java.io.UnsupportedEncodingException;
    import java.nio.charset.Charset;

    public class Main {

        String testString = "Moage test String";

        public static void main(String[] args) {
            new Main();
        }

        public Main(){

            System.out.println("Default charset: "+Charset.defaultCharset());
            System.out.println("Teststring: "+testString);
            System.out.println();
            System.out.println("get the byteStreeam of the test String...");
            System.out.println();
            System.out.println("Bytestream with default encoding: ");
            for(int i = 0; i < testString.getBytes().length; i++){
                System.out.print(testString.getBytes()[i]);
            }
            System.out.println();
            System.out.println();
            System.out.println("Bytestream with encoding UTF-8: ");
            try {
                for(int i = 0; i < testString.getBytes("UTF-8").length; i++){
                    System.out.print(testString.getBytes("UTF-8")[i]);
                }
                System.out.println();
                System.out.println();
                System.out.println("Bytestream with encoding windows-1252 (default): ");

                for(int i = 0; i < testString.getBytes("windows-1252").length; i++){
                    System.out.print(testString.getBytes("windows-1252")[i]);
                }

                System.out.println();
                System.out.println();
                System.out.println("Bytestream with encoding UTF-16: ");

                for(int i = 0; i < testString.getBytes("UTF-16").length; i++){
                    System.out.print(testString.getBytes("UTF-16")[i]);
                }

            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            }
        }
    }

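I wanted to see the difference between the UTF-8 encoding and windows-1252, but when I look at the output there seems to be none. I only see a difference when I compare windows-1252 with UTF-16.

Output: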

> Default charset: windows-1252
> Teststring: Moage test String
> 
> get the byteStreeam of the test String...
> 
> Bytestream with default encoding: 
> 7711197103101321161011151163283116114105110103
> 
> Bytestream with encoding UTF-8: 
> 7711197103101321161011151163283116114105110103
> 
> Bytestream with encoding windows-1252 (default): 
> 7711197103101321161011151163283116114105110103
> 
> Bytestream with encoding UTF-16: 
> -2-1077011109701030101032011601010115011603208301160114010501100103

Can anyone explain why UTF-8 and windows-1252 look the same?

Cheers, Alex

Answer 1

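That is because your test String contains only ASCII characters (in this case "Moage test String"), and UTF-8 and windows-1252 both encode every ASCII character as the same single byte. Try it with special characters, for example "éèà", and you will see different results.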
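To make this concrete, here is a minimal sketch that encodes both the original ASCII string and "éèà" with each charset and prints the bytes as unsigned hex. The class name `EncodingDiff` and the helper `hex` are made up for illustration, and the sketch assumes the JVM provides the windows-1252 charset (the questioner's does, since it is the default there). The accented string is written with Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDiff {

    // Hypothetical helper: encode s with the given charset and
    // render each byte as two unsigned hex digits.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: this JVM ships the windows-1252 charset.
        Charset cp1252 = Charset.forName("windows-1252");

        String ascii = "Moage test String";
        String accented = "\u00E9\u00E8\u00E0"; // "éèà"

        // ASCII-only text: both encodings give identical bytes.
        System.out.println(hex(ascii, StandardCharsets.UTF_8).equals(hex(ascii, cp1252))); // true

        // Non-ASCII text: two bytes per character in UTF-8, one in windows-1252.
        System.out.println("UTF-8:        " + hex(accented, StandardCharsets.UTF_8)); // C3 A9 C3 A8 C3 A0
        System.out.println("windows-1252: " + hex(accented, cp1252));                 // E9 E8 E0
    }
}
```

Printing the bytes as hex with separators also avoids the run-together decimal digits in the original output, where it is hard to tell where one byte ends and the next begins.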

Answer 2

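Here, the characters in your string all fall within the ASCII range. If your string contains any special characters, or text in a language that uses them, the byte output will change.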

UTF-8 is an accepted standard and can be used anywhere. The windows-* encodings, however, are Windows-specific and are not guaranteed to be available on every machine.
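As a rough illustration of that portability point, the sketch below checks whether a charset is available before using it and falls back to UTF-8 otherwise. It also shows that the no-argument `getBytes()` simply uses the platform default charset, which is why its output matched windows-1252 on the questioner's machine. The class name `CharsetChoice`, the helper `charsetOrUtf8`, and the charset name `x-no-such-charset` are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetChoice {

    // Hypothetical helper: return the named charset if this JVM
    // supports it, otherwise fall back to UTF-8.
    static Charset charsetOrUtf8(String name) {
        return Charset.isSupported(name) ? Charset.forName(name) : StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        String s = "Moage test String";

        // getBytes() with no argument encodes with the platform default,
        // so the result can differ from one machine to another.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println(Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset())));

        // windows-1252 may or may not be present on a given JVM.
        System.out.println(charsetOrUtf8("windows-1252"));

        // A made-up name that no JVM should support: falls back to UTF-8.
        System.out.println(charsetOrUtf8("x-no-such-charset"));
    }
}
```

Passing a `Charset` object such as `StandardCharsets.UTF_8` instead of a name string has a side benefit: `getBytes(Charset)` does not throw the checked `UnsupportedEncodingException`, so the try/catch from the question is no longer needed.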

Answer 1
3 2016-04-28 08:32:53

Answer 2
0 2016-04-28 08:44:14
