
Convert String to Byte Array without Encoding?

So I have a string with binary data like this:

String lob = "ÿØÿà...";

I really have no control over this, so I have to take it as it is. I need to convert it to an InputStream without changing it. If I just do:

getBytes()

it will use the default encoding. How do I do this without any encoding or modification?

EDIT:

I can't fix this at the source. I do know the original data is an image loaded via an HTTP GET, but I don't know how it was encoded during transfer. All I have right now is a really long string, and I need to convert it back and save it into a database.

There's no such thing as a conversion like that without an encoding. You're converting between characters and bytes. Those aren't the same thing, so a conversion is required, and the form of the conversion is precisely the encoding. Anything that claims to convert without using an encoding is just assuming some specific encoding, without necessarily knowing it is doing so.

If you want to get the original binary data, you need to find out what encoding was used to convert the bytes into a string to start with. You may find that ISO-8859-1 will work, but you really need to check.
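One way to "check" is to encode the string with the candidate charset and look for a known file signature. The sketch below assumes the image is a JPEG (JPEG files begin with the bytes FF D8 FF, which matches the "ÿØÿ" at the start of the string in the question); the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Check {
    // ISO-8859-1 maps each char in U+0000..U+00FF to the byte with the same value.
    static byte[] toLatin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // JPEG data begins with the three bytes FF D8 FF.
    static boolean looksLikeJpeg(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8
                && (data[2] & 0xFF) == 0xFF;
    }

    public static void main(String[] args) {
        // "ÿØÿà" written with escapes so the source file's own encoding does not matter.
        String lob = "\u00FF\u00D8\u00FF\u00E0";
        System.out.println(looksLikeJpeg(toLatin1Bytes(lob)));                    // true
        System.out.println(looksLikeJpeg(lob.getBytes(StandardCharsets.UTF_8))); // false
    }
}
```

A matching signature is only a hint. If the bytes were originally decoded with a different single-byte charset (windows-1252, for example), some characters will not map back to their original bytes, so the full payload still needs to be verified, for instance by opening the saved image.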

At the same time, you should try very hard to get the source changed to use something like base64. Converting arbitrary binary data to text and back like this is a recipe for disaster.
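For comparison, this is roughly what a base64 round trip looks like with the standard java.util.Base64 class. It is only a sketch of the safer alternative; the class name is hypothetical, and it does not help if the source cannot be changed.

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64RoundTrip {
    // Binary -> ASCII-only text that survives any text channel.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Text -> exactly the original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] original = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        String text = encode(original);
        System.out.println(text);                                  // /9j/4A==
        System.out.println(Arrays.equals(original, decode(text))); // true
    }
}
```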

If your String really does contain binary data that was just erroneously put into a string instead of a byte array, then there is quite a simple method of conversion:

byte[] target = new byte[lob.length()];
for(int i = 0; i < lob.length(); i++)
    target[i] = (byte)lob.charAt(i);

If this data is somehow textual, however, then Jon Skeet's answer is the right one.

(This is, by the way, the same as ISO-8859-1 encoding, as long as every character is in the range U+0000 to U+00FF.)
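A slightly more defensive sketch of the same loop is shown here. It rejects characters above U+00FF instead of silently truncating them (the cast keeps only the low 8 bits, whereas getBytes with ISO-8859-1 would substitute '?'), and wraps the result in a stream, since the question asks for an InputStream. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharsToBytes {
    // Copies the low byte of each char; fails loudly if a char does not fit in one byte.
    static byte[] lowBytes(String s) {
        byte[] target = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException("char above U+00FF at index " + i);
            }
            target[i] = (byte) c;
        }
        return target;
    }

    // ByteArrayInputStream is an InputStream backed by the converted bytes.
    static ByteArrayInputStream asStream(String s) {
        return new ByteArrayInputStream(lowBytes(s));
    }

    public static void main(String[] args) {
        String lob = "\u00FF\u00D8\u00FF\u00E0"; // "ÿØÿà"
        // For chars up to U+00FF the loop and ISO-8859-1 give identical bytes.
        System.out.println(Arrays.equals(lowBytes(lob), lob.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```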

Strings use UTF-16 encoding internally. To avoid a conversion you can use this encoding, so that each 16-bit character is basically sent as is.

Assuming you are in a little-endian environment:

out.write(lob.getBytes(StandardCharsets.UTF_16LE));

All valid characters will be sent without further encoding.

Note: binary data should not be stored in Strings unless you really know it is safe to do so, as not all 16-bit values are valid characters. A better way to store binary data is to use bytes.
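To illustrate why not every 16-bit value is safe, the sketch below round-trips a string through UTF-16LE. An unpaired surrogate such as U+D800 is not a valid character on its own, so it does not come back unchanged. The class name is made up for this example.

```java
import java.nio.charset.StandardCharsets;

final class Utf16RoundTrip {
    // True if encoding to UTF-16LE and decoding again reproduces the same string.
    static boolean survivesUtf16(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_16LE);
        return new String(bytes, StandardCharsets.UTF_16LE).equals(s);
    }

    public static void main(String[] args) {
        String valid = "\u00FF\u00D8\u00FF\u00E0";           // ordinary characters
        String loneSurrogate = String.valueOf((char) 0xD800); // half of a surrogate pair
        System.out.println(survivesUtf16(valid));         // true
        System.out.println(survivesUtf16(loneSurrogate)); // false
    }
}
```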

I agree 100% with Jon Skeet. I'll add that Java holds all String data as UTF-16, but that implicit conversion comes on top of an explicit conversion done by whoever passed you the data (whether they know it or not). So using getBytes("UTF-16") isn't automatically going to work either, unless they state otherwise or you've checked for yourself.
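A small sketch of what getBytes with UTF-16 actually produces, compared with ISO-8859-1, for the first two characters of the string in the question. Java's "UTF-16" charset writes a byte-order mark followed by two big-endian bytes per char, so the output is not the original binary data unless the data really was decoded as UTF-16 in the first place. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Utf16Bom {
    // Formats bytes as space-separated upper-case hex for display.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String lob = "\u00FF\u00D8"; // "ÿØ"
        System.out.println(hex(lob.getBytes(StandardCharsets.UTF_16)));     // FE FF 00 FF 00 D8
        System.out.println(hex(lob.getBytes(StandardCharsets.ISO_8859_1))); // FF D8
    }
}
```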

Knowing where that data comes from and what encoding it is in is the only way to convert it back properly.
