
Unicode base 64 encoding with Java

I am trying to encode a UTF-8 string to Base64 and decode it again. In theory this is not a problem, but when decoding I never seem to get the correct characters back, only ?.


        String original = "خهعسيبنتا";
        B64encoder benco = new B64encoder();
        String enc = benco.encode(original);
        try
        {
            String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
            PrintStream out = new PrintStream(System.out, true, "UTF-8");
            out.println("Original: " + original);
            prtHx("ara", original.getBytes());
            out.println("Encoded: " + enc);
            prtHx("enc", enc.getBytes());
            out.println("Decoded: " + dec);
            prtHx("dec", dec.getBytes());
        } catch (UnsupportedEncodingException e)
        {
            e.printStackTrace();
        }

The output to the console is as follows:

Original: خهعسيبنتا
ara = 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F
Encoded: Pz8/Pz8/Pz8/
enc = 50, 7A, 38, 2F, 50, 7A, 38, 2F, 50, 7A, 38, 2F
Decoded: ?????????
dec = 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F, 3F

prtHx simply writes the hex values of the bytes to the output. Am I doing something obviously wrong here?


Andreas pointed to the correct solution by highlighting that the getBytes() method uses the platform default encoding (Cp1252) even though the source file itself is UTF-8. By using getBytes("UTF-8") I was able to see that the bytes being encoded and the bytes being decoded were actually different. Further investigation showed that the encode method itself used getBytes(). Changing this did the trick nicely.


        try
        {
            String enc = benco.encode(original);
            String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
            PrintStream out = new PrintStream(System.out, true, "UTF-8");
            out.println("Original: " + original);
            prtHx("ori", original.getBytes("UTF-8"));
            out.println("Encoded: " + enc);
            prtHx("enc", enc.getBytes("UTF-8"));
            out.println("Decoded: " + dec);
            prtHx("dec", dec.getBytes("UTF-8"));

        } catch (UnsupportedEncodingException e)
        {
            e.printStackTrace();
        }

System encoding Cp1252
Original: خهعسيبنتا
ori = D8, AE, D9, 87, D8, B9, D8, B3, D9, 8A, D8, A8, D9, 86, D8, AA, D8, A7
Encoded: 2K7Zh9i52LPZitio2YbYqtin
enc = 32, 4B, 37, 5A, 68, 39, 69, 35, 32, 4C, 50, 5A, 69, 74, 69, 6F, 32, 59, 62, 59, 71, 74, 69, 6E
Decoded: خهعسيبنتا
dec = D8, AE, D9, 87, D8, B9, D8, B3, D9, 8A, D8, A8, D9, 86, D8, AA, D8, A7

Thanks.
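For readers who do not have the B64encoder class used above, the same fix can be sketched with the JDK's own java.util.Base64 (Java 8+). This is only an illustration under assumptions: the class name Base64Utf8Sketch and its encode/decode helpers are made up for this example and are not the original B64encoder. The string is written with Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class, not the B64encoder from the question.
public class Base64Utf8Sketch {

    // Encode the text's UTF-8 bytes, never the platform-default bytes.
    static String encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Decode Base64 back to bytes and interpret them as UTF-8.
    static String decode(String base64) {
        byte[] utf8 = Base64.getDecoder().decode(base64);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Arabic string from the question, as Unicode escapes.
        String original = "\u062E\u0647\u0639\u0633\u064A\u0628\u0646\u062A\u0627";
        String enc = encode(original);
        String dec = decode(enc);
        System.out.println("Encoded: " + enc);
        System.out.println("Round trip ok: " + original.equals(dec));
    }
}
```

Passing an explicit Charset on both sides keeps the round trip independent of the platform default, which was the root cause here.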

String#getBytes() encodes the characters using the platform's default charset. The actual encoding of the String literal "خهعسيبنتا" is "defined" in the Java source file (you choose a character encoding when you create or save the file).

This could be the reason why ara is encoded to 0x3F bytes.
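A small sketch of that effect, under assumptions: it uses ISO-8859-1 as a stand-in for Cp1252 because every JVM is guaranteed to provide it, and like Cp1252 it has no Arabic letters. The class name UnmappableCharsSketch and the toHex helper are made up for this illustration and only mimic what prtHx appears to do. getBytes(Charset) replaces each character the charset cannot represent with that charset's replacement byte, which for this charset is '?' (0x3F).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; toHex only imitates the question's prtHx.
public class UnmappableCharsSketch {

    // Format bytes as comma-separated two-digit uppercase hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Arabic string from the question, as Unicode escapes.
        String original = "\u062E\u0647\u0639\u0633\u064A\u0628\u0646\u062A\u0627";

        // Single-byte Western charset: each Arabic letter is unmappable.
        byte[] lossy = original.getBytes(StandardCharsets.ISO_8859_1);
        // UTF-8: each of these letters becomes two bytes.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        System.out.println("lossy = " + toHex(lossy));
        System.out.println("utf8  = " + toHex(utf8));
    }
}
```

Once the characters have been replaced by 0x3F, the information is gone: Base64-encoding and decoding those bytes can only ever give back question marks.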

Give this a try:

out.println("Original: " + original);
prtHx("ara", original.getBytes("UTF-8"));
out.println("Encoded: " + enc);
prtHx("enc", enc.getBytes("UTF-8"));
out.println("Decoded: " + dec);
prtHx("dec", dec.getBytes("UTF-8"));
