
Bad/strange encoding when converting a file from CP1250 to UTF-8 in Java

I have a problem converting a file from CP1250 to UTF-8. Almost all characters are converted correctly, but the characters "ň" and "Ř" are not (they come out as "?").

In NetBeans I set the project encoding to UTF-8.

A test string in the file is "skříň SKŘÍŇ". The console output is "skĹ™ĂĹ? SKĹ?ÍŇ". The output differs from the same conversion done in PHP, for example. I'm at my wit's end.

My code:

String line;
BufferedReader br = new BufferedReader(
    new InputStreamReader(new FileInputStream("file-cp1250.txt"), "CP1250"));
while ((line = br.readLine()) != null) {
  // re-encode the decoded line and reinterpret the bytes
  line = new String(line.getBytes("UTF-8"), "CP1250");
  System.out.println(line);
}

Thanks for any advice.

The following would be correct in principle:

String line;
BufferedReader br = new BufferedReader(
    new InputStreamReader(new FileInputStream("file-cp1250.txt"), "CP1250"));
while ((line = br.readLine()) != null) {
    System.out.println(line);
}

That is, the binary data of the InputStream is declared to be Windows code page 1250 and is decoded as it is read. A Java String always holds Unicode (so it can mix all scripts). The extra line from the question, line = new String(line.getBytes("UTF-8"), "CP1250"), re-encodes the already-decoded text as UTF-8 bytes and then misinterprets those bytes as Cp1250, which is exactly what mangles the string; it must be dropped.
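
Since the actual goal is converting the file to UTF-8, the decoded text should be written back out through a UTF-8 encoder rather than printed to the console. A minimal sketch, assuming an output file name "file-utf8.txt" (any name will do):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class Cp1250ToUtf8 {
    public static void main(String[] args) throws IOException {
        // Decode Cp1250 bytes into Unicode strings, then encode as UTF-8.
        // File names are placeholders for this sketch.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                     new FileInputStream("file-cp1250.txt"), "CP1250"));
             BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream("file-utf8.txt"), "UTF-8"))) {
            String line;
            while ((line = br.readLine()) != null) {
                bw.write(line);
                bw.newLine();
            }
        }
    }
}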

However, System.out generally writes to the platform-dependent console, and its encoding might not be Cp1250 but something else. The Unicode text might be encoded to Cp1252, Microsoft's Latin-1; characters like "ň" and "Ř" that Cp1252 cannot represent then come out as "?", and one starts suspecting a bug in the conversion. In that case System.out simply cannot be used to check the result.
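
To verify this, one can print the JVM's default charset and wrap stdout in a PrintStream with an explicit encoding. A minimal sketch; "Cp852" is only an assumption for a Central European Windows console (check the actual code page with chcp on Windows):

import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ConsoleEncodingCheck {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // The charset the JVM uses by default, and thus for System.out
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Wrap stdout with an explicit encoding; "Cp852" is an assumption,
        // substitute whatever code page the console actually runs.
        PrintStream out = new PrintStream(
                new FileOutputStream(FileDescriptor.out), true, "Cp852");
        out.println("skříň SKŘÍŇ");
    }
}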
