
Reading a UTF-16 file in Java gives no data

I am creating a CSV file with data from the DB and encoding it as UTF-16LE so that special characters like é are preserved. But when I try to read the same file in Java like this:

    BufferedReader br = new BufferedReader(new InputStreamReader(
            fileContent, "utf16"));

I am getting no data.

If I use UTF-8 when reading the input stream instead:

    BufferedReader br = new BufferedReader(new InputStreamReader(
            fileContent, "utf8"));

the BufferedReader returns all the data, but the special characters come out as:

Brut¿l¿

where it should be Brutélé.

How do I read the data in Java as UTF-16? I have already tried "UTF-16LE" and "ANSI" as the charset in my Java code. "ANSI" throws an unhandled exception, and "UTF-16LE" makes no difference.

Below is the code to export the file:

    // Needs java.io.* and java.nio.charset.StandardCharsets.
    // `input` (the CSV text) and `outStream` (the target file) are assumed
    // to be created elsewhere; that code is not shown in the question.
    final int BUFFER_SIZE = 33554432;

    try (InputStream inputStream =
            new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_16LE))) {
        byte[] buffer = new byte[BUFFER_SIZE];
        int bytesRead;
        while ((bytesRead = inputStream.read(buffer)) != -1) {
            outStream.write(buffer, 0, bytesRead);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Close the output once, after every byte has been copied.
        if (outStream != null) {
            try {
                outStream.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
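If the file really contains the bytes produced by this export code, it is UTF-16LE with no byte-order mark (BOM), and only UTF-16LE decodes it correctly. The in-memory sketch below round-trips a string that way; the class and method names (`Utf16RoundTrip`, `export`, `read`) are illustrative, not from the question. It also shows that Java's `UTF-16` charset, which the question's `"utf16"` name resolves to, assumes big-endian when no BOM is present, so it garbles BOM-less little-endian text.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16RoundTrip {

    // Encodes the text as UTF-16LE without a byte-order mark,
    // the same encoding call the export code makes.
    static byte[] export(String input) {
        return input.getBytes(StandardCharsets.UTF_16LE);
    }

    // Decodes the bytes with the given charset through a BufferedReader
    // and returns the lines joined with '\n'.
    static String read(byte[] bytes, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), cs))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (sb.length() > 0) {
                    sb.append('\n');
                }
                sb.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = export("Brut\u00e9l\u00e9");
        // Matching charset: prints the original text.
        System.out.println(read(bytes, StandardCharsets.UTF_16LE));
        // "UTF-16" with no BOM falls back to big-endian, so every
        // character comes out byte-swapped (garbled).
        System.out.println(read(bytes, StandardCharsets.UTF_16));
    }
}
```

Printing both results side by side makes a charset mismatch on the reading side visible immediately.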

As @Jon Skeet already said, the byte sequence 42 72 75 74 E9 6C E9 is neither UTF-8 nor UTF-16; it is ISO-8859-1.

You can verify this with the following snippet:

    // needs: import java.nio.charset.StandardCharsets;
    byte[] b = {0x42, 0x72, 0x75, 0x74, (byte) 0xE9, 0x6C, (byte) 0xE9};
    System.out.println("ISO_8859_1: "
            + new String(b, StandardCharsets.ISO_8859_1));
    System.out.println("UTF_8     : "
            + new String(b, StandardCharsets.UTF_8));
    System.out.println("UTF_16LE  : "
            + new String(b, StandardCharsets.UTF_16LE));

Output (on a Unicode-aware console):

    ISO_8859_1: Brutélé
    UTF_8     : Brut�l�
    UTF_16LE  : 牂瑵泩�
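To get a byte dump like the one above from your own file, print its first bytes in hex. This is a minimal sketch; `HexPeek` and `hexPrefix` are hypothetical names, and in real use the stream passed in would be the question's `fileContent`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class HexPeek {

    // Returns at most the first `limit` bytes of the stream as
    // space-separated, upper-case hex pairs.
    static String hexPrefix(InputStream in, int limit) {
        StringBuilder sb = new StringBuilder();
        try {
            int count = 0;
            int b;
            while (count < limit && (b = in.read()) != -1) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(String.format("%02X", b));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the real file stream (hypothetical sample bytes).
        byte[] sample = {0x42, 0x72, 0x75, 0x74, (byte) 0xE9, 0x6C, (byte) 0xE9};
        System.out.println(hexPrefix(new ByteArrayInputStream(sample), 16));
    }
}
```

Text written by the export code would start `42 00 72 00 ...` (two bytes per character, low byte first), and a leading `FF FE` would be a UTF-16LE byte-order mark. A single `E9` byte per é points to a single-byte charset such as ISO-8859-1 or windows-1252.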

You are probably decoding with the wrong charset. The standard charset names are listed in the `Charset` documentation.
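Assuming the file really is ISO-8859-1, as the byte sequence above indicates, the reading code only needs a different charset passed to `InputStreamReader`. A sketch, with `Latin1Csv` and `readLines` as illustrative names; if the file came from a Windows tool, `Charset.forName("windows-1252")` may be the closer match.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Latin1Csv {

    // Reads every line of the stream, decoding it as ISO-8859-1,
    // where each byte maps to exactly one character (0xE9 -> 'é').
    static List<String> readLines(InputStream fileContent) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(fileContent, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // The byte sequence from the answer, standing in for the real file.
        byte[] b = {0x42, 0x72, 0x75, 0x74, (byte) 0xE9, 0x6C, (byte) 0xE9};
        System.out.println(readLines(new ByteArrayInputStream(b)));
    }
}
```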
