
reading file with accented characters in Java

I came across two special characters that don't seem to be covered by the ISO-8859-1 character set, i.e. they don't make it through to my program.

The German ß and the Norwegian ø

I'm reading the files as follows:

FileInputStream inputFile = new FileInputStream(corpus[i]);
InputStreamReader ir = new InputStreamReader(inputFile, "ISO-8859-1");

Is there a way for me to read these characters without having to apply manual replacement as a workaround?

[EDIT]

This is how it looks on screen. Note that I have no problems with other accented characters, e.g. è and the like...

[Screenshot]

Both characters are present in ISO-Latin-1 (check my name to see why I've looked into this).

If the characters are not read in correctly, the most likely cause is that the text in the file is not actually saved in that encoding but in a different one.

Depending on your operating system and the origin of the file, possible encodings could be UTF-8 or a DOS/Windows OEM code page like 850 or 437.
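To see why the choice of encoding matters, here is a small sketch (the class and method names are made up for illustration) that prints the bytes ß and ø turn into under ISO-8859-1 and under UTF-8, and shows what happens when UTF-8 bytes are decoded as ISO-8859-1. Unicode escapes are used so the result does not depend on the encoding of the .java file itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encodes s with the given charset and returns the bytes as space-separated hex
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF)); // mask to print 0..255
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00DF is the German sharp s, \u00F8 the Norwegian o with stroke
        for (String s : new String[] {"\u00DF", "\u00F8"}) {
            System.out.println(s + "  ISO-8859-1: " + hex(s, StandardCharsets.ISO_8859_1)
                    + "  UTF-8: " + hex(s, StandardCharsets.UTF_8));
        }
        // Decoding UTF-8 bytes with the wrong charset yields two characters per letter
        byte[] utf8 = "\u00F8".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

In ISO-8859-1 each letter is a single byte (DF and F8); in UTF-8 each takes two bytes (C3 9F and C3 B8). That is why a UTF-8 file read as ISO-8859-1 shows two odd characters per letter, for example Ã¸ instead of ø.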

The easiest way is to look at the file with a hex editor and report back the exact byte values stored for these two characters.
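If no hex editor is at hand, a few lines of Java can do the same job. This is a minimal sketch (the class name HexPeek is hypothetical) that prints the offset and hex value of every non-ASCII byte in a file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class HexPeek {
    // Collects "offset: XX" for every byte >= 0x80, i.e. every non-ASCII byte
    static List<String> nonAsciiBytes(byte[] data) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            int v = data[i] & 0xFF; // treat the byte as unsigned
            if (v >= 0x80) {
                out.add(i + ": " + String.format("%02X", v));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String line : nonAsciiBytes(data)) {
            System.out.println(line);
        }
    }
}
```

Run it as java HexPeek yourfile.txt. A lone DF (ß) or F8 (ø) suggests ISO-8859-1, the pairs C3 9F or C3 B8 suggest UTF-8, and E1 for ß points to a DOS code page such as 437 or 850.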

If your file is UTF-8 encoded, which is quite likely, try this:

InputStreamReader ir = new InputStreamReader(inputFile, "UTF-8");
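A slightly fuller version of the same idea, as a sketch with illustrative class and method names: it passes StandardCharsets.UTF_8 instead of the string "UTF-8" (so a misspelled name cannot cause an UnsupportedEncodingException) and uses try-with-resources so the file is always closed.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8Reader {
    // Reads all lines of a file, decoding its bytes as UTF-8
    static List<String> readLines(String fileName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readLines(args[0])) {
            System.out.println(line);
        }
    }
}
```

Be aware that InputStreamReader silently replaces byte sequences that are not valid UTF-8 with the replacement character U+FFFD, so if the output shows those, the file is not UTF-8 either.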

ISO-8859-1 covers ß and ø, so the file is probably saved in a different encoding. You should pass the file's actual encoding to new InputStreamReader().
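If the files come from mixed sources and the encoding cannot be known in advance, one common heuristic, which is not foolproof, is to attempt a strict UTF-8 decode and fall back to ISO-8859-1 when it fails. Non-ASCII text in ISO-8859-1 rarely happens to form valid UTF-8, so this separates those two encodings reasonably well, but it cannot distinguish other single-byte code pages such as 850 from ISO-8859-1. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class GuessEncoding {
    // Heuristic: UTF-8 if the bytes decode strictly as UTF-8, otherwise ISO-8859-1
    static Charset guess(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // throw instead of replacing
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return StandardCharsets.ISO_8859_1; // every byte value is valid ISO-8859-1
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        Charset cs = guess(data);
        System.out.println(cs + ": " + new String(data, cs));
    }
}
```

Pure ASCII files are reported as UTF-8, which is harmless because ASCII decodes identically under both encodings.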
