Char array to byte array in UTF-8 without using String or Charset

I have a small question. I need to encode a char array as UTF-8 and get the equivalent byte array in Java. Converting the char array to a String and then getting the byte array is not an option; String must be avoided because of security concerns. If I use

byte[] encoded = Charset.forName("UTF-8").encode(CharBuffer.wrap(toBeEncoded)).array();

When the input array is longer than 9 characters, the output array has an extra element at the end, which is empty. If the input is even longer, there are more empty elements. When I then decode the result, I get still more extra elements: if there is one empty element after encoding, there are two after decoding. This is not acceptable either, because I want to encrypt the encoded value. Thank you.

The problem is that Charset.encode() makes no guarantees about the capacity of the buffer it returns. It may well allocate extra space at the end, which is what you are seeing when you call array(), because array() returns the whole backing array rather than just the encoded content. However, the buffer's limit will be set correctly. In fact, there is no guarantee that the returned buffer will be backed by an array at all (it could be made a direct buffer in future Java versions, who knows?).
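To make the difference between limit and capacity visible, here is a small sketch that prints both for a 10-character input. The class name LimitVersusCapacity and the sample data are made up for illustration. The exact capacity depends on the JDK implementation, so no particular value is promised here; the only thing that should hold in general is that the limit equals the number of encoded bytes and the capacity is at least that large.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

final class LimitVersusCapacity {

    // Encodes the chars as UTF-8 and returns the buffer exactly as the charset produced it.
    static ByteBuffer encode(char[] chars) {
        return StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
    }

    public static void main(String[] args) {
        // Ten ASCII characters, so the UTF-8 encoding is ten bytes long.
        char[] toBeEncoded = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
        ByteBuffer buf = encode(toBeEncoded);

        System.out.println("position = " + buf.position());
        System.out.println("limit    = " + buf.limit());    // number of encoded bytes
        System.out.println("capacity = " + buf.capacity()); // may be larger than the limit

        // array() exposes the whole backing array, including any unused tail.
        if (buf.hasArray()) {
            System.out.println("array().length = " + buf.array().length);
        }
    }
}
```

If the capacity printed is larger than the limit, the bytes between the two are the "empty" elements described in the question.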

To get a properly sized array you'll need to make a properly sized byte array and copy only the data you want from the byte buffer into that array. Here we use the limit to size the new array. The limit marks the end of the content actually written into the buffer, and because the returned buffer's position is 0, it equals the number of encoded bytes:

ByteBuffer buf = StandardCharsets.UTF_8.encode(CharBuffer.wrap(toBeEncoded));
byte[] array = new byte[buf.limit()];
buf.get(array);
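Since the question is motivated by security, the same idea can be wrapped in a helper that also overwrites the intermediate buffer once the bytes have been copied out. This is only a sketch under assumptions: the class and method names (Utf8Chars, toUtf8Bytes, fromUtf8Bytes) are hypothetical, the wipe is best-effort and only possible when the buffer exposes its backing array, and it uses remaining() instead of limit(), which gives the same value here because the position is 0.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8Chars {

    // Encodes chars as UTF-8 into an array that is exactly as long as the encoded content.
    static byte[] toUtf8Bytes(char[] chars) {
        ByteBuffer buf = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
        byte[] out = new byte[buf.remaining()]; // bytes between position and limit
        buf.get(out);
        // Best-effort wipe of the temporary copy; skipped if there is no accessible array.
        if (buf.hasArray()) {
            Arrays.fill(buf.array(), (byte) 0);
        }
        return out;
    }

    // Decodes UTF-8 bytes into a char array of exactly the decoded length.
    static char[] fromUtf8Bytes(byte[] bytes) {
        CharBuffer buf = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
        char[] out = new char[buf.remaining()];
        buf.get(out);
        if (buf.hasArray()) {
            Arrays.fill(buf.array(), '\0');
        }
        return out;
    }
}
```

The caller is still responsible for clearing the input char array and the returned arrays (for example with Arrays.fill) once they are no longer needed.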

This article describes the limit, capacity and position of buffers nicely.
