Convert String to/from byte array without encoding

I have a byte array read over a network connection that I need to transform into a String without any encoding, that is, simply by treating each byte as the low byte of a character and leaving the high byte zero. I also need to do the converse, where I know that the high byte of each character will always be zero.

Searching the web yields several similar questions, all with responses saying that the original data source must be changed. This is not an option, so please don't suggest it.

This is trivial in C, but Java appears to require me to write a conversion routine of my own that is likely to be very inefficient. Is there an easy way that I have missed?

This will convert a byte array to a String, filling only the lower 8 bits of each char and leaving the upper 8 bits zero.

public static String stringFromBytes(byte[] byteData) {
    char[] charData = new char[byteData.length];
    for (int i = 0; i < charData.length; i++) {
        // Mask with 0xFF so negative bytes map to 128-255 instead of sign-extending.
        charData[i] = (char) (byteData[i] & 0xFF);
    }
    return new String(charData);
}

The efficiency should be quite good. As Ben Thurley said, if performance really is such an issue, don't convert to a String in the first place; work with the byte array instead.
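The question also asks for the converse direction. A minimal sketch of that, mirroring stringFromBytes above, might look like the following. The class name LowByteStrings and the method name bytesFromString are made up for illustration, and throwing on characters above 0xFF is an assumption about the desired behavior rather than something the question specifies.

```java
final class LowByteStrings {
    private LowByteStrings() {}

    /** Copies the low 8 bits of each char into one byte per char. */
    static byte[] bytesFromString(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < out.length; i++) {
            char c = s.charAt(i);
            // Assumption: a char with a non-zero high byte is treated as an error.
            if (c > 0xFF) {
                throw new IllegalArgumentException(
                        "char at index " + i + " does not fit in one byte");
            }
            out[i] = (byte) c;
        }
        return out;
    }
}
```

If silently dropping the high byte is acceptable, the range check can simply be removed.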

No, you aren't missing anything. There is no easy way to do that because String and char are for text. You apparently don't want to handle your data as text, which makes complete sense if it isn't text. You could do it the hard way that you propose.

An alternative is to assume a character encoding that allows arbitrary sequences of arbitrary byte values (0-255). ISO-8859-1 and IBM437 both qualify. (Windows-1252 has only 251 code points, and UTF-8 doesn't allow arbitrary sequences.) If you use ISO-8859-1, the resulting string will be the same as with your hard way.
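As a rough sketch of the ISO-8859-1 approach, the standard charset constant in java.nio.charset.StandardCharsets can be passed to the String constructor and to getBytes. The class name Latin1RoundTrip and its method names are invented for this example.

```java
import java.nio.charset.StandardCharsets;

final class Latin1RoundTrip {
    private Latin1RoundTrip() {}

    /** Each byte 0x00-0xFF becomes the char with the same numeric value. */
    static String decode(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    /** Each char 0x0000-0x00FF becomes one byte; higher chars are replaced with '?'. */
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

Note that encode does not fail on characters above 0xFF; it substitutes the charset's replacement byte, so this only round-trips cleanly when the high byte of every char is zero, as the question assumes.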

As for efficiency, the most efficient way to handle an array of bytes is to keep it as an array of bytes.

Use the deprecated constructor String(byte[] ascii, int hibyte):

String string = new String(byteArray, 0);
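For the reverse direction, java.lang.String also has a matching deprecated method, getBytes(int srcBegin, int srcEnd, byte[] dst, int dstBegin), which copies only the low 8 bits of each char. A small sketch combining the two follows; the class name DeprecatedLowByte and its method names are hypothetical, and both calls produce deprecation warnings unless suppressed.

```java
final class DeprecatedLowByte {
    private DeprecatedLowByte() {}

    /** Builds a String whose chars have high byte 0 and low byte taken from data. */
    @SuppressWarnings("deprecation")
    static String fromBytes(byte[] data) {
        return new String(data, 0); // second argument is the high byte for every char
    }

    /** Copies the low 8 bits of each char into a new byte array. */
    @SuppressWarnings("deprecation")
    static byte[] toBytes(String s) {
        byte[] out = new byte[s.length()];
        s.getBytes(0, s.length(), out, 0); // high 8 bits of each char are discarded
        return out;
    }
}
```

Both members have been deprecated for a long time, so the explicit loop or the ISO-8859-1 charset is probably the safer choice for new code.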

Here is sample code that converts a String to a byte array and back to a String without encoding.

public class Test
{

    public static void main(String[] args)
    {
        Test t = new Test();
        t.run();
    }

    public void run()
    {
        String input = "Hèllo world";
        byte[] inputBytes = getBytes(input);
        String output = getString(inputBytes);
        System.out.println(output);
    }

    // Stores each char as two bytes, high byte first.
    public byte[] getBytes(String str)
    {
        char[] chars = str.toCharArray();
        byte[] bytes = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++)
        {
            bytes[i * 2] = (byte) (chars[i] >> 8);
            bytes[i * 2 + 1] = (byte) chars[i];
        }

        return bytes;
    }

    // Rebuilds each char from a high byte followed by a low byte.
    public String getString(byte[] bytes)
    {
        char[] chars = new char[bytes.length / 2];
        for (int i = 0; i < chars.length; i++)
        {
            // Mask each byte to 0-255 before combining.
            chars[i] = (char) (((bytes[i * 2] & 0xFF) << 8) | (bytes[i * 2 + 1] & 0xFF));
        }

        return new String(chars);
    }
}

A String is already encoded as Unicode/UTF-16. UTF-16 means that it can take up to two string "characters" (char values) to make one displayable character. What you really want to use (note that this is the .NET API, not Java) is:

byte[] bytes = System.Text.Encoding.Unicode.GetBytes(myString); 

to convert a String to an array of bytes. This does essentially what you did above, except that it is 10 times faster (and Encoding.Unicode writes the low byte of each character first, whereas the loop above writes the high byte first). If you would like to cut the transmission data nearly in half for mostly ASCII text, I would recommend converting it to UTF-8 (ASCII is a subset of UTF-8), the format the internet uses 90% of the time, by calling:

byte[] bytes = Encoding.UTF8.GetBytes(myString);

To convert back to a string, use:

String myString = Encoding.Unicode.GetString(bytes); 

or

String myString = Encoding.UTF8.GetString(bytes);
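The Encoding calls above belong to .NET rather than Java. In Java, a roughly equivalent sketch uses the constants in java.nio.charset.StandardCharsets, with UTF_16LE standing in for Encoding.Unicode since both are little-endian UTF-16. The class name CharsetEquivalents and its method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class CharsetEquivalents {
    private CharsetEquivalents() {}

    /** Two bytes per char, low byte first, no byte-order mark. */
    static byte[] toUtf16le(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    /** One byte per ASCII char, two or more for anything else. */
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String fromUtf16le(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Unlike the ISO-8859-1 approach, neither of these maps arbitrary input bytes one-to-one onto chars, so they suit text that starts out as a String rather than raw binary data read from the network.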
