
Java Charset that supports all 8-bit symbols, with each character in the range [0-255]

I'm trying to pass a byte array containing arbitrary data, with each element ranging from 0 to 255.

I have to pass it to JavaScript, so I convert it into a String, but some characters get lost and are replaced with 0x3F (question mark).

What is the proper Charset that supports all 8-bit symbols for transferring to JavaScript?

public String base64Decode(String s) {
  //... lots of stuff transforming String into byte array.

  //Some example bytes shown here.
  byte[] destArray = {(byte)0xf3, (byte)0xc3, 00, 01, 00, 00, 00, 00, (byte)0xc3, (byte)0x63, (byte)0x2d, 00, 00, 00, 00, 00, (byte)0xe0, (byte)0x9d, (byte)0xea};
  System.out.println(new String(destArray, Charset.forName("UTF-8")));
  return new String(new String(destArray, Charset.forName("UTF-8")));
}

I redirect the System.out.println output into a file using a batch script:

java Test > out.bin

Then I compare byte by byte to see what is lost.
To sum it up, 0x9D becomes 0x3D, which is wrong.
There are probably others too, but I didn't check the whole file; it's over 2 MB in size.

The default new String(destArray); does a better job but still misses a few characters.

You can use ISO-8859-1, which maps each byte value 0-255 to the Unicode code point with the same number.

However, it's an ugly hack that should only be used if something really prevents you from using the correct data type (i.e. byte[] for binary data).
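As a rough check of the ISO-8859-1 suggestion above, the following sketch pushes all 256 byte values through a byte[] -> String -> byte[] round trip. The class name Latin1RoundTrip and the helper roundTripsAllBytes are made up for illustration; only the standard java.nio.charset classes are assumed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {

    // Hypothetical helper: returns true if every byte value 0x00..0xFF
    // survives a byte[] -> String -> byte[] round trip in the given charset.
    static boolean roundTripsAllBytes(Charset cs) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String asText = new String(all, cs);   // decode bytes to UTF-16 chars
        return Arrays.equals(all, asText.getBytes(cs)); // encode back and compare
    }

    public static void main(String[] args) {
        // ISO-8859-1 maps each byte to exactly one char, so nothing is lost.
        System.out.println("ISO-8859-1: " + roundTripsAllBytes(StandardCharsets.ISO_8859_1));
        // UTF-8 treats most bytes >= 0x80 as parts of multi-byte sequences,
        // so arbitrary binary data is replaced and does not round-trip.
        System.out.println("UTF-8:      " + roundTripsAllBytes(StandardCharsets.UTF_8));
    }
}
```

Note that printing such a String with System.out.println still re-encodes it in the platform's default charset, so the bytes that end up in a redirected file may differ from the original array even when the String itself is lossless.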

By common sense, base64 is a way to represent binary data as ASCII strings, therefore base64Decode() should take a String and return a byte[].
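A minimal sketch of that signature might look like the following. It swaps in the standard java.util.Base64 decoder (Java 8+) for whatever hand-written decoding the question elides, which is an assumption; the class name Base64ToBytes is hypothetical.

```java
import java.util.Base64;

public class Base64ToBytes {

    // Decodes base64 text straight to raw bytes. No charset is involved,
    // so no byte value can be lost or replaced.
    // Assumption: the input is standard (RFC 4648) base64.
    static byte[] base64Decode(String s) {
        return Base64.getDecoder().decode(s);
    }

    public static void main(String[] args) {
        // A few of the example bytes from the question.
        byte[] original = {(byte) 0xf3, (byte) 0xc3, 0x00, 0x01, (byte) 0xe0, (byte) 0x9d, (byte) 0xea};
        String encoded = Base64.getEncoder().encodeToString(original);

        byte[] decoded = base64Decode(encoded);
        for (byte b : decoded) {
            System.out.printf("%02x ", b & 0xff); // print each byte as unsigned hex
        }
        System.out.println();
    }
}
```

If the data ultimately has to reach JavaScript as text, one option is to leave it base64-encoded on the wire and decode it on the JavaScript side, rather than converting the decoded bytes to a String in Java.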

You cannot just blindly use any charset you want. Strings in Java and JavaScript use UTF-16. Once you have decoded the base64 data into a byte array, you have to know the exact charset those bytes actually represent so they can be converted to UTF-16 correctly without losing any data. In other words, you have to know the charset that was used when the data was base64 encoded. If you do not know the exact charset, you are left with heuristic analysis or plain guessing, neither of which is reliable enough. Either both parties must agree on a common charset ahead of time, or the charset needs to be exchanged along with the base64 data.
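To illustrate the point about agreeing on a charset, here is a small sketch for the case where the base64 payload really is text. The class AgreedCharset and the helpers encodeText and decodeText are hypothetical names, and UTF-8 is assumed as the agreed charset purely for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class AgreedCharset {

    // Sender side: text -> bytes in the agreed charset -> base64.
    static String encodeText(String text, Charset cs) {
        return Base64.getEncoder().encodeToString(text.getBytes(cs));
    }

    // Receiver side: base64 -> bytes -> text, using the charset it was told about.
    static String decodeText(String base64, Charset cs) {
        return new String(Base64.getDecoder().decode(base64), cs);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // contains one non-ASCII character
        String wire = encodeText(original, StandardCharsets.UTF_8);

        // Same charset on both sides: the text is recovered exactly.
        System.out.println(decodeText(wire, StandardCharsets.UTF_8).equals(original));
        // Mismatched charset: the bytes decode to different characters.
        System.out.println(decodeText(wire, StandardCharsets.ISO_8859_1).equals(original));
    }
}
```

The first comparison holds and the second does not, because the two UTF-8 bytes of the accented character are read as two separate ISO-8859-1 characters.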


 