Char array to byte array in UTF-8 without using String or Charset

I have a little question. I have to encode a char array as UTF-8 and get the equivalent byte array in Java. Converting the char array to a String and then getting the byte array is not an option: String must be avoided because of security concerns. If I use

byte[] encoded = Charset.forName("UTF-8").encode(CharBuffer.wrap(toBeEncoded)).array();

then when the input array is longer than 9 characters, the output array has an extra empty element at the end. If the input is longer still, there are more empty elements. When I then decode it, I get even more extra elements: if there is 1 empty element after encoding, there are two after decoding. This is not an option either, because I want to encrypt the encoded value. Thank you.

The problem is that Charset.encode() makes no guarantees about the capacity of the buffer it returns. It may well allocate extra space at the end, and array() returns the whole backing array, including that unused space, which is what you are seeing. However, the buffer's limit will be set correctly. In fact, there is no guarantee that the returned buffer will be backed by an array at all (it could become a direct buffer in a future Java version, who knows?).
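To see the difference between limit and capacity for yourself, something like the following sketch should do. The class and method names (BufferSizes, limitAndCapacity) are made up for illustration. The capacity you get depends on the JDK implementation, so no particular value is promised here; only the limit is determined by the input.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

class BufferSizes {
    // Hypothetical helper: returns {limit, capacity} of the encoded buffer.
    static int[] limitAndCapacity(char[] input) {
        ByteBuffer buf = StandardCharsets.UTF_8.encode(CharBuffer.wrap(input));
        return new int[] { buf.limit(), buf.capacity() };
    }

    public static void main(String[] args) {
        // Ten ASCII characters, each of which encodes to one UTF-8 byte.
        char[] input = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
        int[] sizes = limitAndCapacity(input);
        System.out.println("limit    = " + sizes[0]); // bytes actually written
        System.out.println("capacity = " + sizes[1]); // allocated size, may be larger
    }
}
```

If the capacity comes out larger than the limit, the difference is exactly the number of extra trailing elements you would get from array().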

To get a properly sized array, you'll need to create a byte array of the right size and copy only the data you want from the byte buffer into it. Here we use the limit (which is the amount of content actually written into the buffer) to size the new array:

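// encode() may return a buffer whose capacity exceeds the encoded length
ByteBuffer buf = StandardCharsets.UTF_8.encode(CharBuffer.wrap(toBeEncoded));
// size the result by the limit, not by the backing array's length
byte[] array = new byte[buf.limit()];
buf.get(array);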

This article describes the limit, capacity and position of buffers nicely.
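Since the question is motivated by security, here is a slightly fuller sketch of the same approach wrapped in a method. The names Utf8Chars and toUtf8 are hypothetical, and the wiping step is an addition that was not part of the snippet above: it assumes you also want to clear the intermediate buffer, which is only possible when the buffer exposes its backing array. It uses remaining() instead of limit(); the two are equal here because the buffer returned by encode() starts at position zero.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Chars {
    // Hypothetical helper: encodes chars to an exactly sized UTF-8 byte array
    // without ever creating a String.
    static byte[] toUtf8(char[] chars) {
        ByteBuffer buf = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        // Best effort (assumption, not from the original answer): wipe the
        // intermediate copy if the buffer is backed by an accessible array.
        if (buf.hasArray()) {
            Arrays.fill(buf.array(), (byte) 0);
        }
        return out;
    }

    public static void main(String[] args) {
        // Ten characters; '\u00e4' needs two bytes in UTF-8, the rest one each.
        char[] secret = {'p', '\u00e4', 's', 's', '1', '2', '3', '4', '5', '6'};
        byte[] encoded = toUtf8(secret);
        System.out.println(encoded.length);
        // Clear sensitive data once it is no longer needed.
        Arrays.fill(secret, '\0');
        Arrays.fill(encoded, (byte) 0);
    }
}
```

The returned array contains only the encoded bytes, so it can be passed straight to an encryption routine without trailing padding.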
