Read and write a file with Windows-1252

I'm trying to write a file containing some German characters to disk and read it back using the Windows-1252 encoding. The umlauts come out garbled and I don't understand why. My output looks like this:

<title>W�hrend und im Anschluss an die Exkursion stehen Ihnen die Ansprechpartner f�r O-T�ne</title>

<p>Die Themen im �berblick</p>

Any thoughts? Here is my code. You'll need spring-core and commons-io to run it.

private static void write(String fileName, Charset charset) throws IOException {
    String html = "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
                  "<head>" +
                  "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=windows-1252\">" +
                  "<title>Während und im Anschluss an die Exkursion stehen Ihnen die Ansprechpartner für O-Töne</title>" +
                  "</head>" +
                  "<body>" +
                  "<p>Die Themen im Überblick</p>" +
                  "</body>" +
                  "</html>";

    byte[] bytes = html.getBytes(charset);
    FileOutputStream outputStream = new FileOutputStream(fileName);
    OutputStreamWriter writer = new OutputStreamWriter(outputStream, charset);
    IOUtils.write(bytes, writer);
    writer.close();
    outputStream.close();
}

private static void read(String file, Charset windowsCharset) throws IOException {
    ClassPathResource pathResource = new ClassPathResource(file);
    String string = IOUtils.toString(pathResource.getInputStream(), windowsCharset);
    System.out.println(string);
}

public static void main(String[] args) throws IOException {
    Charset windowsCharset = Charset.forName("windows-1252");
    String file = "test.txt";
    write(file, windowsCharset);
    read(file, windowsCharset);
}

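Your write method is wrong: you are using a writer to write bytes. A writer should be used for writing characters or strings. IOUtils.write(byte[], Writer) first has to turn the bytes back into characters, and it does so with the platform default charset, so unless that default happens to be Windows-1252 the umlauts are damaged before the writer encodes them again.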

You already encoded the string into bytes with the line

byte[] bytes = html.getBytes(charset);

These bytes can simply be written into an output stream:

IOUtils.write(bytes, outputStream);

This makes the writer unnecessary (remove it) and you will now get the correct output.
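The fix described above can be sketched with plain JDK streams. This is a minimal sketch, not the asker's exact code: it swaps IOUtils.write(bytes, outputStream) for OutputStream.write, and the class name WriteBytesDirectly, the temp file, and the sample text are assumptions made for illustration.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteBytesDirectly {
    // Encode the text once, then hand the bytes straight to the stream.
    // No Writer is involved, so nothing gets decoded and re-encoded on the way.
    static void write(String fileName, String text, Charset charset) throws IOException {
        byte[] bytes = text.getBytes(charset);
        try (OutputStream out = new FileOutputStream(fileName)) {
            out.write(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        Charset cp1252 = Charset.forName("windows-1252");
        // Hypothetical temp file, used only for this demonstration.
        Path tmp = Files.createTempFile("cp1252-demo", ".html");
        write(tmp.toString(), "<p>Die Themen im \u00DCberblick</p>", cp1252);
        // Decode with the same charset that was used for encoding.
        System.out.println(new String(Files.readAllBytes(tmp), cp1252));
        Files.delete(tmp);
    }
}
```

The try-with-resources block closes the stream even if the write fails, which the original close() calls would not do.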

First ensure that the compiler and editor use the same encoding. This can be checked by trying the (ugly) \uXXXX escaping; if the two strings below behave the same, the encodings agree:

während
w\u00E4hrend
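One way to run that check is to print the code points the compiler actually produced for each form. The following is a rough sketch; the class name EscapeCheck and the helper codePoints are made up for illustration, and it assumes this source file is saved in the encoding javac expects.

```java
public class EscapeCheck {
    // Renders each character as U+XXXX so the compiler's view of a literal is visible.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String literal = "während";       // depends on the encoding javac assumes for this file
        String escaped = "w\u00E4hrend";  // pure ASCII source, independent of the file encoding
        System.out.println(codePoints(literal));
        System.out.println(codePoints(escaped));
        // If this prints false, editor and compiler disagree about the source encoding.
        System.out.println(literal.equals(escaped));
    }
}
```

If the two lines of code points differ, passing the right -encoding option to javac (or saving the file in the encoding the build expects) should bring them back in line.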

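Then take the charset name in the meta tag from the charset itself, and write the encoded bytes directly to the file: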

    "<meta http-equiv='Content-Type' content='text/html; charset="
    + charset.name() + "' />" +

    byte[] bytes = html.getBytes(charset);
    Files.write(Paths.get(fileName), bytes);
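Put together, a round trip through java.nio.file.Files might look like the sketch below. The class name RoundTrip and the temp file are assumptions for illustration. Note that it reads back from the same path it wrote, whereas the question's read method goes through ClassPathResource, which resolves against the classpath and may therefore pick up a different file.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class RoundTrip {
    // Writes the text in the given charset and reads it back with the same charset.
    static String roundTrip(Path file, String text, Charset charset) throws IOException {
        Files.write(file, text.getBytes(charset));
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        Charset charset = Charset.forName("windows-1252");
        // Build the meta tag from the charset so declaration and bytes cannot drift apart.
        String html = "<meta http-equiv='Content-Type' content='text/html; charset="
                + charset.name() + "' />"
                + "<p>Die Themen im \u00DCberblick</p>";
        Path tmp = Files.createTempFile("roundtrip", ".html"); // hypothetical location
        System.out.println(roundTrip(tmp, html, charset));
        Files.delete(tmp);
    }
}
```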

Also check that the file you are reading really is in Windows-1252. A programmer's editor such as Notepad++ or jEdit lets you inspect and change a file's encoding.
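If no such editor is at hand, a rough programmatic check is possible. The sketch below is only a heuristic: it tests whether the bytes are well-formed UTF-8 under strict decoding. Windows-1252 text with umlauts normally fails that test, while a file that passes and contains non-ASCII characters is more likely UTF-8. The class name EncodingProbe and the method isValidUtf8 are invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingProbe {
    // True if the bytes are well-formed UTF-8 when malformed input is reported
    // as an error instead of being silently replaced.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "w\u00E4hrend";
        byte[] asCp1252 = text.getBytes(Charset.forName("windows-1252"));
        byte[] asUtf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("cp1252 bytes valid as UTF-8: " + isValidUtf8(asCp1252));
        System.out.println("utf-8 bytes valid as UTF-8:  " + isValidUtf8(asUtf8));
    }
}
```

In practice the bytes would come from Files.readAllBytes on the file in question. Pure ASCII content passes the check in either encoding, so the result only says something when non-ASCII characters are present.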
