
Representing char as a byte in Java

I must convert a char into a byte or a byte array. In other languages I know that a char is just a single byte. However, looking at the Java Character class, its min value is '\u0000' and its max value is '\uFFFF'. This makes it seem like a char is 2 bytes long.

Will I be able to store it as a byte, or do I need to store it as two bytes?

Before anyone asks: I'm trying to do this because I'm working with an interface that expects my results as a byte array, so I have to convert my char to one.

Please let me know and help me understand this.

Thanks, jbu

To convert characters to bytes, you need to specify a character encoding. Some character encodings use one byte per character, while others use two or more bytes. In fact, for many languages, there are far too many characters to encode with a single byte.
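
To see the difference concretely, here is a small sketch that counts how many bytes a single char takes in a few standard charsets. The class and method names are illustrative, not from the original answer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingLengths {
    // Number of bytes one char occupies when encoded with the given charset
    static int encodedLength(char c, Charset cs) {
        return String.valueOf(c).getBytes(cs).length;
    }

    public static void main(String[] args) {
        // 'A', e-acute (U+00E9), and the CJK character U+4E2D
        char[] samples = {'A', '\u00E9', '\u4E2D'};
        Charset[] charsets = {StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8, StandardCharsets.UTF_16BE};
        for (char c : samples) {
            for (Charset cs : charsets) {
                // Note: U+4E2D is not representable in ISO-8859-1, so it comes out
                // as a single replacement byte there, not as a real encoding
                System.out.printf("U+%04X in %s: %d byte(s)%n", (int) c, cs.name(), encodedLength(c, cs));
            }
        }
    }
}
```

ISO-8859-1 is a one-byte-per-character encoding, UTF-16BE always uses two bytes per char, and UTF-8 uses one to three bytes for a single char depending on its value.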

In Java, the simplest way to convert from characters to bytes is with the String class's getBytes(Charset) method. (The StandardCharsets class defines some common encodings.) However, this method will silently replace any character that cannot be mapped under the specified encoding with the charset's default replacement bytes (for example '?' in ISO-8859-1 or US-ASCII). If you need more control, you can configure a CharsetEncoder to handle this case by reporting an error or by using a different replacement character.
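
A sketch of the three behaviours just described, assuming ISO-8859-1 as the target charset and using the euro sign (U+20AC, which ISO-8859-1 cannot represent) as the problem character. The class and method names are made up for illustration; getBytes, CharsetEncoder, and CodingErrorAction are standard JDK APIs.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodingPolicies {
    // Lenient: getBytes substitutes the charset's replacement bytes ('?' for ISO-8859-1)
    static byte[] lenient(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Strict: the encoder throws on unmappable input instead of substituting
    static byte[] strict(String s) throws CharacterCodingException {
        CharsetEncoder enc = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return toArray(enc.encode(CharBuffer.wrap(s)));
    }

    // Custom replacement: unmappable chars become the given byte instead of '?'
    static byte[] withReplacement(String s, byte replacement) {
        CharsetEncoder enc = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] {replacement});
        try {
            return toArray(enc.encode(CharBuffer.wrap(s)));
        } catch (CharacterCodingException e) {
            // Not expected: both error actions are REPLACE
            throw new IllegalStateException(e);
        }
    }

    // Copies the encoder's output buffer (position 0, limit at last byte) into an array
    private static byte[] toArray(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }
}
```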

A char is indeed 16 bits in Java (and it is also the only unsigned integral type!).
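
A quick way to confirm both properties with constants from the Character class (Character.BYTES needs Java 8 or later). The class name is illustrative.

```java
final class CharFacts {
    public static void main(String[] args) {
        System.out.println(Character.SIZE);            // 16 (bits in a char)
        System.out.println(Character.BYTES);           // 2 (bytes in a char)
        System.out.println((int) Character.MAX_VALUE); // 65535

        // char is unsigned: the all-ones bit pattern reads back as 65535, not -1
        char c = (char) -1;
        System.out.println((int) c);                   // 65535

        // byte is signed: the same all-ones pattern in a byte is -1
        byte b = (byte) 0xFF;
        System.out.println(b);                         // -1
    }
}
```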

If you are sure your characters are ASCII, you can simply cast each one to a byte (ASCII uses only the lower 7 bits of the char, so nothing is lost).
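
A minimal sketch of that cast, with a guard so that non-ASCII input fails loudly instead of being silently truncated. The helper names are hypothetical.

```java
final class AsciiBytes {
    // Casts a char to a byte, refusing anything outside 7-bit ASCII (U+0000..U+007F)
    static byte toAsciiByte(char c) {
        if (c > 0x7F) {
            throw new IllegalArgumentException(String.format("U+%04X is not ASCII", (int) c));
        }
        return (byte) c;
    }

    // Applies the same check to every char of a String
    static byte[] toAsciiBytes(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = toAsciiByte(s.charAt(i));
        }
        return out;
    }
}
```

For ASCII input this yields the same bytes as getBytes(StandardCharsets.US_ASCII); the difference is that getBytes would substitute '?' for a non-ASCII char, whereas this sketch throws.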

If you do not need to modify the characters, or to understand their meaning within a String, you can simply store each char in two bytes, like:

static byte[] charsToBytes(char[] c) {
    byte[] b = new byte[c.length * 2];
    for (int i = 0; i < c.length; i++) {
        b[2 * i] = (byte) ((c[i] & 0xFF00) >> 8); // high byte first
        b[2 * i + 1] = (byte) (c[i] & 0x00FF);    // then low byte
    }
    return b;
}

(If speed matters, you could replace 2*i with a left shift, i << 1, although the JIT compiler will usually make that optimization for you.)
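
If you prefer not to do the bit manipulation by hand, java.nio.ByteBuffer can produce the same high-byte-first layout (its default byte order is big-endian) and also makes the reverse conversion easy. This is a sketch; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

final class CharBytes {
    // Packs each char as two bytes, high byte first
    static byte[] pack(char[] chars) {
        ByteBuffer buf = ByteBuffer.allocate(chars.length * 2);
        buf.asCharBuffer().put(chars); // the char view writes through to buf's array
        return buf.array();
    }

    // Reverses pack(): every two bytes become one char
    static char[] unpack(byte[] bytes) {
        char[] chars = new char[bytes.length / 2];
        ByteBuffer.wrap(bytes).asCharBuffer().get(chars);
        return chars;
    }
}
```

For well-formed text, pack produces the same bytes as new String(chars).getBytes(StandardCharsets.UTF_16BE); the two differ only for unpaired surrogates, which getBytes replaces.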

Note, however, that some actual (displayed) characters (or, more accurately, Unicode code points) are encoded as two consecutive chars, called a surrogate pair. So cutting between two chars does not guarantee that you are cutting between actual characters.
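
To illustrate, here is a sketch with a code point outside the Basic Multilingual Plane (U+1F600), which occupies two chars, plus a hypothetical helper isSafeSplitPoint that reports whether cutting at a given index would separate a surrogate pair.

```java
final class SurrogateSplit {
    // Hypothetical helper: true if cutting s at index would not separate
    // the two halves of a surrogate pair
    static boolean isSafeSplitPoint(CharSequence s, int index) {
        if (index <= 0 || index >= s.length()) {
            return true;
        }
        return !(Character.isHighSurrogate(s.charAt(index - 1))
                && Character.isLowSurrogate(s.charAt(index)));
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane, so it needs two chars
        String s = "a" + new String(Character.toChars(0x1F600)) + "b";
        System.out.println(s.length());                      // 4 chars
        System.out.println(s.codePointCount(0, s.length())); // 3 code points
        for (int i = 0; i <= s.length(); i++) {
            System.out.println(i + ": " + isSafeSplitPoint(s, i)); // only index 2 is unsafe
        }
    }
}
```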

If you need to decode, encode, or otherwise manipulate your char array in a String-aware manner, you should instead encode and decode your char array or String with the java.io tools (such as InputStreamReader and OutputStreamWriter), which ensure proper character handling.
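
A minimal sketch of that approach, assuming UTF-8 as the encoding and using OutputStreamWriter and InputStreamReader over in-memory streams. The class name StreamCodec is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class StreamCodec {
    // Encodes chars through a Writer, whose encoder keeps surrogate pairs together
    static byte[] encode(char[] chars) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(chars);
        } catch (IOException e) {
            // In-memory streams do not normally throw
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // closing the writer flushed everything into out
    }

    // Decodes bytes back into a String through a Reader
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```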

To extend what others are saying: if you have a char that you need as a byte array, first create a String containing that char, then get the byte array from the String:

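import java.io.UnsupportedEncodingException;

private byte[] charToBytes(final char x) {
  String temp = new String(new char[] {x});
  try {
    // ISO-8859-1 maps U+0000..U+00FF to one byte each; other chars become '?'
    return temp.getBytes("ISO-8859-1");
  } catch (UnsupportedEncodingException e) {
    // Log a complaint (every Java platform must support ISO-8859-1, so this should not happen)
    return null;
  }
}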

Of course, use the appropriate character set. It would be much more efficient to start working with Strings, rather than taking one char at a time, converting it to a String, and then converting that to a byte array.
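
Following that advice, a sketch that encodes a whole String (or char array) in one call. It assumes ISO-8859-1 to match the charToBytes example (substitute your own charset) and uses the getBytes(Charset) overload with StandardCharsets, which needs no try/catch. The names are illustrative.

```java
import java.nio.charset.StandardCharsets;

final class BulkEncoding {
    // One getBytes call for the whole text instead of one temporary String per char;
    // the Charset overload avoids the checked UnsupportedEncodingException
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    static byte[] toBytes(char[] chars) {
        return toBytes(new String(chars));
    }
}
```

With a charset such as UTF-8, encoding char by char is also risky for characters outside the Basic Multilingual Plane: each half of a surrogate pair encoded on its own is unmappable and gets replaced, whereas encoding the whole String keeps the pair together.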

A char in Java is an unsigned 16-bit value. If what you have fits in 7 bits, just cast it to a byte (ASCII, for instance, fits).
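
To see why the 7-bit condition matters, this sketch casts an ASCII char and a non-ASCII one (the euro sign, U+20AC) to byte. The class name is illustrative.

```java
final class CastTruncation {
    public static void main(String[] args) {
        char ascii = 'A';
        char euro = '\u20AC';

        byte a = (byte) ascii; // fits in 7 bits: value preserved
        byte e = (byte) euro;  // keeps only the low 8 bits (0xAC); the high byte 0x20 is lost

        System.out.println(a);                 // 65
        System.out.println(e);                 // -84, because byte is signed
        System.out.println((char) a == ascii); // true

        // Casting back sign-extends: 0xAC becomes U+FFAC, not U+20AC
        System.out.printf("U+%04X%n", (int) (char) e); // U+FFAC
    }
}
```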

You could check out the java.nio.charset APIs as well.
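
For example, CharsetEncoder.canEncode can tell you whether a char fits a given charset before you pick a byte representation. A sketch; the helper name fits is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetFit {
    // True if c can be encoded by cs; canEncode returns false for lone surrogates
    static boolean fits(char c, Charset cs) {
        return cs.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char[] samples = {'A', '\u00E9', '\u20AC'};
        for (char c : samples) {
            System.out.printf("U+%04X  US-ASCII=%b  ISO-8859-1=%b  UTF-8=%b%n", (int) c,
                    fits(c, StandardCharsets.US_ASCII),
                    fits(c, StandardCharsets.ISO_8859_1),
                    fits(c, StandardCharsets.UTF_8));
        }
    }
}
```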

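Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM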