Problems Converting Between ByteBuffer and String in Java

I'm currently developing an application where users can edit a ByteBuffer via a hex editor interface and also edit the corresponding text through a JTextPane. My current issue is that, because the JTextPane requires a String, I need to convert the ByteBuffer to a String before displaying the value. However, during the conversion, invalid characters are replaced by the charset's default replacement character. This squashes the invalid value, so when I convert the String back to a ByteBuffer, the invalid character's value is replaced by the byte value of the default replacement character. Is there an easy way to retain the byte value of an invalid character in a String? I've read the following Stack Overflow posts, but usually folks just want to replace unprintable characters; I need to preserve them.

Java ByteBuffer to String

Java: Converting String to and from ByteBuffer and associated problems

Is there an easy way to do this, or do I need to keep track of all the changes that happen in the text editor and apply them to the ByteBuffer?

Here is code demonstrating the problem. The code uses byte[] instead of ByteBuffer, but the issue is the same.

        byte[] temp = new byte[16];
        // 0x99 isn't a valid UTF-8 Character
        Arrays.fill(temp,(byte)0x99);

        System.out.println(Arrays.toString(temp));
        // Prints [-103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103]
        // -103 == 0x99

        System.out.println(new String(temp));
        // Prints ����������������
        // � is the default char replacement string

        // This takes the byte[], converts it to a string, converts it back to a byte[]
        System.out.println(Arrays.toString(new String(temp).getBytes()));
        // I need this to print [-103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103]
        // However, it prints
        //[-17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67]
        // The printed byte is the byte representation of �

What do you think new String(temp).getBytes() will do for you?

I can tell you that it does something BAD.

  1. It converts temp to a String using the platform default encoding, which is probably the wrong one, and may lose information.
  2. It converts the result back to a byte array, again using the default encoding.

To turn a byte[] into a String, you must always pass a Charset to the String constructor, or else use a decoder directly. Since you are working from buffers, you might find the decoder API congenial.

To turn a String into a byte[], you must always call getBytes(Charset) so that you know you're using the correct charset.
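As a minimal sketch of what passing an explicit Charset changes, the example below round-trips the bytes from the question through two charsets. The class name `CharsetRoundTrip` and the helper `roundTrip` are hypothetical names chosen for illustration. The use of ISO-8859-1 is an assumption that is not part of the original answer: it maps every byte value to exactly one char, so the round trip preserves the bytes, whereas UTF-8 substitutes the replacement character for the malformed 0x99.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTrip {
    // Hypothetical helper: decode and re-encode with the same explicit charset.
    static byte[] roundTrip(byte[] input, Charset charset) {
        return new String(input, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] temp = new byte[4];
        Arrays.fill(temp, (byte) 0x99);

        // UTF-8: a lone 0x99 is malformed, so it is replaced and the original value is lost.
        System.out.println(Arrays.toString(roundTrip(temp, StandardCharsets.UTF_8)));

        // ISO-8859-1: every byte 0x00..0xFF maps to exactly one char, so nothing is lost.
        System.out.println(Arrays.toString(roundTrip(temp, StandardCharsets.ISO_8859_1)));
    }
}
```

The trade-off is that the text pane would then show the Latin-1 interpretation of each byte rather than decoded UTF-8 text, which may or may not be acceptable for an editor.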

Based on the comments, I now suspect that your problem is that you need to write code something like the following to convert from bytes to hex for your UI (and then something corresponding to convert back).

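String getHexString(byte[] bytes) {
    StringBuilder builder = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
        // High nibble: mask after shifting so negative bytes do not sign-extend.
        int nibble = (b >> 4) & 0x0f;
        // Character.forDigit yields '0'-'9' or 'a'-'f'; appending '0' + nibble would append an int.
        builder.append(Character.forDigit(nibble, 16));
        // Low nibble: mask with 0x0f, not 0xff.
        nibble = b & 0x0f;
        builder.append(Character.forDigit(nibble, 16));
    }
    return builder.toString();
}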
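For the reverse direction that the answer mentions only in passing, a sketch might look like the following. The class name `HexParse` and the method `parseHexString` are hypothetical, and the input is assumed to be a plain run of hex digits with no separators or whitespace.

```java
class HexParse {
    // Hypothetical counterpart to getHexString: two hex digits per byte, no separators.
    static byte[] parseHexString(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Character.digit returns -1 for anything that is not a hex digit.
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = parseHexString("99ff0a");
        System.out.println(java.util.Arrays.toString(bytes));
    }
}
```

Because hex text represents every byte value exactly, editing in this form never loses information, unlike decoding through a character set.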

UTF-8 in particular will go wrong:

    byte[] bytes = {'a', (byte) 0xfd, 'b', (byte) 0xe5, 'c'};
    String s = new String(bytes, StandardCharsets.UTF_8);
    System.out.println("s: " + s);

One needs a CharsetDecoder. With it, one can ignore (= delete) or replace the offending bytes, or, by default, let an exception be thrown.

For the JTextPane we use HTML, so we can write the hex code of each offending byte in a <span> with a red background.
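The three behaviours can be compared side by side with `CodingErrorAction`. This is a sketch only: the class `DecoderActions` and the helper `decodeUtf8` are hypothetical names, and returning null when the decoder reports an error is a simplification made for the demonstration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DecoderActions {
    // Hypothetical helper: decode UTF-8 with the given action for bad input.
    // Returns null if the decoder reports an error (only possible with REPORT).
    static String decodeUtf8(byte[] bytes, CodingErrorAction action) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {'a', (byte) 0xfd, 'b', (byte) 0xe5, 'c'};
        // IGNORE drops the offending bytes, REPLACE substitutes U+FFFD,
        // REPORT (the decoder's default) makes decode() throw.
        System.out.println("IGNORE:  " + decodeUtf8(bytes, CodingErrorAction.IGNORE));
        System.out.println("REPLACE: " + decodeUtf8(bytes, CodingErrorAction.REPLACE));
        System.out.println("REPORT:  " + decodeUtf8(bytes, CodingErrorAction.REPORT));
    }
}
```

None of the three actions keeps the original byte value in the resulting String, which is why any approach that must preserve it has to handle the error position itself.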

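    // Needs java.nio.ByteBuffer, java.nio.CharBuffer,
    // java.nio.charset.{CharsetDecoder, CoderResult, StandardCharsets}.
    ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    // Each invalid byte expands to a <span> of 64 chars, so 50 per byte is too small.
    CharBuffer charBuffer = CharBuffer.allocate(bytes.length * 80 + 16);
    charBuffer.append("<html>");
    for (;;) {
        // endOfInput = true so a truncated sequence at the very end is reported as well.
        CoderResult result = decoder.decode(byteBuffer, charBuffer, true);
        if (!result.isError()) {
            break;
        }
        // On an error the buffer position is at the offending byte; emit it as highlighted hex.
        int b = 0xFF & byteBuffer.get();
        charBuffer.append(String.format(
            "<span style='background-color:red; font-weight:bold'> %02X </span>",
            b));
        decoder.reset();
    }
    charBuffer.append("</html>");
    // flip(), not rewind(): rewind() keeps the limit at capacity, so toString()
    // would include the unused remainder of the buffer.
    charBuffer.flip();
    String t = charBuffer.toString();
    System.out.println("t: " + t);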

The code does not reflect a very nice API, but it is something to experiment with.
