简体   繁体   中英

Converting byte array to String (Java)

I'm writing a web application in Google app Engine. It allows people to basically edit html code that gets stored as an .html file in the blobstore.

I'm using fetchData to return a byte[] of all the characters in the file. I'm trying to print to an html in order for the user to edit the html code. Everything works great!

Here's my only problem now:

The byte array is having some issues when converting back to a string. Smart quotes and a couple of characters are coming out looking funky. (?'s or japanese symbols etc.) Specifically it's several bytes I'm seeing that have negative values which are causing the problem.

The smart quotes are coming back as -108 and -109 in the byte array. Why is this and how can I decode the negative bytes to show the correct character encoding?

The byte array contains characters in a special encoding (that you should know). The way to convert it to a String is:

String decoded = new String(bytes, "UTF-8");  // example for one encoding type

By The Way - the raw bytes appear may appear as negative decimals just because the java datatype byte is signed, it covers the range from -128 to 127.


-109 = 0x93: Control Code "Set Transmit State"

The value (-109) is a non-printable control character in UNICODE. So UTF-8 is not the correct encoding for that character stream.

0x93 in "Windows-1252" is the "smart quote" that you're looking for, so the Java name of that encoding is "Cp1252". The next line provides a test code:

System.out.println(new String(new byte[]{-109}, "Cp1252")); 

Java 7 and above

You can also pass your desired encoding to the String constructor as a Charset constant from StandardCharsets . This may be safer than passing the encoding as a String , as suggested in the other answers.

For example, for UTF-8 encoding

String bytesAsString = new String(bytes, StandardCharsets.UTF_8);

你可以试试这个。

String s = new String(bytearray);
public static String readFile(String fn)   throws IOException 
{
    File f = new File(fn);

    byte[] buffer = new byte[(int)f.length()];
    FileInputStream is = new FileInputStream(fn);
    is.read(buffer);
    is.close();

    return  new String(buffer, "UTF-8"); // use desired encoding
}
public class Main {

    /**
     * Example method for converting a byte to a String.
     */
    public void convertByteToString() {

        byte b = 65;

        //Using the static toString method of the Byte class
        System.out.println(Byte.toString(b));

        //Using simple concatenation with an empty String
        System.out.println(b + "");

        //Creating a byte array and passing it to the String constructor
        System.out.println(new String(new byte[] {b}));

    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        new Main().convertByteToString();
    }
}

Output

65
65
A

I suggest Arrays.toString(byte_array);

It depends on your purpose. For example, I wanted to save a byte array exactly like the format you can see at time of debug that is something like this : [1, 2, 3] If you want to save exactly same value without converting the bytes to character format, Arrays.toString (byte_array) does this,. But if you want to save characters instead of bytes, you should use String s = new String(byte_array) . In this case, s is equal to equivalent of [1, 2, 3] in format of character.

The previous answer from Andreas_D is good. I'm just going to add that wherever you are displaying the output there will be a font and a character encoding and it may not support some characters.

To work out whether it is Java or your display that is a problem, do this:

    for(int i=0;i<str.length();i++) {
        char ch = str.charAt(i);
        System.out.println(i+" : "+ch+" "+Integer.toHexString(ch)+((ch=='\ufffd') ? " Unknown character" : ""));
    }

Java will have mapped any characters it cannot understand to 0xfffd the official character for unknown characters. If you see a '?' in the output, but it is not mapped to 0xfffd, it is your display font or encoding that is the problem, not Java.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM