
Converting byte array to String (Java)

I'm writing a web application in Google App Engine. It basically lets people edit HTML code that gets stored as an .html file in the Blobstore.

I'm using fetchData to return a byte[] of all the characters in the file. I'm trying to print it to an HTML page so that the user can edit the HTML code. Everything works great!

Here's my only problem now:

The byte array has some issues when converting back to a string. Smart quotes and a couple of other characters come out looking funky (?'s, Japanese symbols, etc.). Specifically, the problem comes from several bytes that have negative values.

The smart quotes come back as -108 and -109 in the byte array. Why is this, and how can I decode the negative bytes to get the correct characters?

The byte array contains characters in some specific encoding, which you need to know. To convert it to a String, pass that encoding to the String constructor:

String decoded = new String(bytes, "UTF-8");  // example for one encoding type

By the way, the raw bytes may show up as negative decimals simply because Java's byte type is signed: it covers the range -128 to 127.
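To see the unsigned value a byte actually holds (the 0 to 255 value a hex editor would show), mask it with 0xFF after it is widened to int. A minimal sketch; the class and method names are illustrative only:

```java
class UnsignedBytes {
    // Java's byte is signed (-128..127). Widening to int sign-extends it,
    // so masking with 0xFF recovers the unsigned value 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Two-digit uppercase hex, as a hex editor would display the byte.
    static String toHex(byte b) {
        return String.format("%02X", b & 0xFF);
    }

    public static void main(String[] args) {
        byte smartQuote = -109;
        System.out.println(toUnsigned(smartQuote)); // 147
        System.out.println(toHex(smartQuote));      // 93
    }
}
```

On Java 8 and later, Byte.toUnsignedInt(b) does the same masking.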


-109 = 0x93: Control Code "Set Transmit State"

Read as the code point U+0093, the value -109 (0x93) is a non-printable control character in Unicode, and in UTF-8 a lone 0x93 byte is not even a valid sequence, since it is a continuation byte. So UTF-8 is not the correct encoding for that byte stream.

0x93 in "Windows-1252" is the "smart quote" you're looking for, and the Java name of that encoding is "Cp1252". The following line is a quick test:

System.out.println(new String(new byte[]{-109}, "Cp1252")); 
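The same idea applied to both quotes from the question, as a small sketch. Windows-1252 is not one of the StandardCharsets constants, so it is looked up by name here; it ships with standard JDKs but is not guaranteed by the Java specification. The class and method names are illustrative:

```java
import java.nio.charset.Charset;

class Cp1252Demo {
    // "Cp1252" is Java's historical alias for the canonical name "windows-1252".
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    static String decode(byte[] bytes) {
        return new String(bytes, WINDOWS_1252);
    }

    public static void main(String[] args) {
        // 0x93 (-109) and 0x94 (-108) around the ASCII letters "Hi"
        byte[] bytes = {-109, 'H', 'i', -108};
        String s = decode(bytes);
        // Print each char's code point in hex: 201c, 48, 69, 201d
        s.chars().forEach(c -> System.out.println(Integer.toHexString(c)));
    }
}
```

U+201C and U+201D are the left and right double quotation marks, i.e. the smart quotes.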

Java 7 and above

You can also pass the desired encoding to the String constructor as a Charset constant from StandardCharsets. This is safer than passing the encoding name as a String, as the other answers do: a misspelled name only fails at run time, and the Charset overload does not throw the checked UnsupportedEncodingException.

For example, for UTF-8:

String bytesAsString = new String(bytes, StandardCharsets.UTF_8);
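As a sketch of why the encoding must match: when text is encoded with StandardCharsets.UTF_8, each smart quote becomes three bytes rather than the single 0x93/0x94 byte seen in the question, and decoding with the same charset restores the original string. Class and method names are illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8RoundTrip {
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u201CHi\u201D"; // "Hi" wrapped in curly (smart) quotes
        byte[] utf8 = encode(text);
        // U+201C encodes as E2 80 9C and U+201D as E2 80 9D in UTF-8
        System.out.println(Arrays.toString(utf8)); // [-30, -128, -100, 72, 105, -30, -128, -99]
        System.out.println(decode(utf8).equals(text)); // true
    }
}
```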

You can try this:

String s = new String(bytearray); // decodes with the platform default charset
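The one-argument constructor decodes with the platform default charset, which varies between systems (since Java 18 it is UTF-8 unless overridden). This sketch prints which charset that is and shows the call is equivalent to passing Charset.defaultCharset() explicitly; the names are illustrative:

```java
import java.nio.charset.Charset;

class DefaultCharsetDemo {
    // Implicit: uses whatever the platform default charset is.
    static String decodeDefault(byte[] bytes) {
        return new String(bytes);
    }

    // Explicit equivalent of the call above.
    static String decodeExplicit(byte[] bytes) {
        return new String(bytes, Charset.defaultCharset());
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        byte[] bytes = {-109, 'H', 'i', -108};
        System.out.println(decodeDefault(bytes).equals(decodeExplicit(bytes))); // true
    }
}
```

Because the result depends on the machine, naming the charset explicitly is the more predictable choice.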
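Reading a whole file into a byte array and decoding it with an explicit encoding:

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public static String readFile(String fn) throws IOException
{
    File f = new File(fn);

    byte[] buffer = new byte[(int) f.length()];
    // try-with-resources closes the stream even if reading fails
    try (DataInputStream is = new DataInputStream(new FileInputStream(f))) {
        // readFully keeps reading until the buffer is full; a single read() may return early
        is.readFully(buffer);
    }

    return new String(buffer, StandardCharsets.UTF_8); // use the file's actual encoding
}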
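On Java 7 and later, java.nio.file.Files can read the whole file in one call, which avoids sizing the buffer and looping by hand. A minimal sketch; UTF-8 is an assumption and should be replaced by the encoding the file was actually written in, and the class name is illustrative:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class ReadFileNio {
    // Reads all bytes of the file, then decodes them with an explicit charset.
    static String readFile(String fn) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(fn));
        return new String(bytes, StandardCharsets.UTF_8); // assumed encoding
    }
}
```

On Java 11 and later, Files.readString(path) does the same for UTF-8 files.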
public class Main {

    /**
     * Example method for converting a byte to a String.
     */
    public void convertByteToString() {

        byte b = 65;

        //Using the static toString method of the Byte class
        System.out.println(Byte.toString(b));

        //Using simple concatenation with an empty String
        System.out.println(b + "");

        //Creating a byte array and passing it to the String constructor
        System.out.println(new String(new byte[] {b}));

    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        new Main().convertByteToString();
    }
}

Output

65
65
A

I suggest Arrays.toString(byte_array).

It depends on your purpose. For example, I wanted to save a byte array in exactly the format you see while debugging, something like [1, 2, 3]. If you want to save exactly the same values without converting the bytes to characters, Arrays.toString(byte_array) does this. But if you want to save characters instead of bytes, use String s = new String(byte_array). In that case, s holds the bytes [1, 2, 3] decoded as characters (with an ASCII-compatible default charset, the control characters U+0001, U+0002 and U+0003).
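A short sketch contrasting the two: Arrays.toString lists the numeric byte values for display or logging, while the String constructor decodes them as text. UTF-8 is an assumed encoding here, and the class and method names are illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ArraysVsString {
    // Readable list of the numeric values, e.g. "[72, 105]".
    static String asNumbers(byte[] bytes) {
        return Arrays.toString(bytes);
    }

    // The same bytes decoded as characters (assumed UTF-8).
    static String asText(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = {72, 105};
        System.out.println(asNumbers(bytes)); // [72, 105]
        System.out.println(asText(bytes));    // Hi
    }
}
```

Note that Arrays.toString is a formatting step, not decoding; it will not fix the smart-quote problem from the question.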

The previous answer from Andreas_D is good. I'll just add that wherever you display the output, there will be a font and a character encoding, and they may not support some characters.

To work out whether the problem is in Java or in your display, do this:

    // str is the decoded String; print each char's index, the char, and its hex code
    for (int i = 0; i < str.length(); i++) {
        char ch = str.charAt(i);
        System.out.println(i + " : " + ch + " " + Integer.toHexString(ch)
                + (ch == '\ufffd' ? " Unknown character" : ""));
    }

When decoding, Java maps any bytes it cannot understand to 0xfffd (U+FFFD), the official replacement character for unknown characters. If you see a '?' in the output but it is not mapped to 0xfffd, the problem is your display font or encoding, not Java.
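A compact way to apply this check is to count the U+FFFD characters after decoding. This sketch decodes the bytes from the question once as UTF-8 and once as windows-1252 (assumed available, as it is in standard JDKs); the class and method names are illustrative:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReplacementCheck {
    // U+FFFD is what the decoder substitutes for byte sequences
    // that are invalid in the chosen charset.
    static final char REPLACEMENT = '\uFFFD';

    static long countReplacements(String s) {
        return s.chars().filter(c -> c == REPLACEMENT).count();
    }

    public static void main(String[] args) {
        byte[] bytes = {-109, 'H', 'i', -108};
        // A lone 0x93 or 0x94 is malformed UTF-8, so each becomes U+FFFD.
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);
        // In windows-1252 the same bytes are the curly quotes U+201C and U+201D.
        String asCp1252 = new String(bytes, Charset.forName("windows-1252"));
        System.out.println(countReplacements(asUtf8) + " vs " + countReplacements(asCp1252)); // 2 vs 0
    }
}
```

A nonzero count means the wrong charset was used for decoding; a zero count with '?' still on screen points to the display side.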
