Java character conversion to UTF-8

I am using:

InputStreamReader isr = new InputStreamReader(fis, "UTF8");

to read characters from a text file and convert them to UTF-8 characters.

My question is: what happens if one of the characters being read cannot be converted to UTF-8? Will an exception be thrown, or will the character be dropped?

You are not converting from one charset to another. You are just indicating that the file is UTF-8 encoded so that it can be read correctly.

If you want to convert from one encoding to another, then you should do something like the following:

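// Requires: import java.io.*;
File infile = new File("x-utf8.txt");
File outfile = new File("x-utf16.txt");

String fromEncoding = "UTF-8";
String toEncoding = "UTF-16";

// Decode the input as UTF-8 and re-encode it as UTF-16 while copying.
try (Reader in = new InputStreamReader(new FileInputStream(infile), fromEncoding);
     Writer out = new OutputStreamWriter(new FileOutputStream(outfile), toEncoding)) {
    char[] buffer = new char[8192];
    int n;
    while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
    }
}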

After going through David Gelhar's response, I feel this code can be improved a bit. If you don't know the encoding of infile, use the GuessEncoding library to detect it, and then construct the reader with the detected encoding.

If the input file contains bytes that are not valid UTF-8, read() will by default replace the malformed input with U+FFFD (65533 decimal; the Unicode "replacement character"). No exception is thrown and nothing is silently dropped.
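The default replacement behavior can be checked with a small sketch. It uses an in-memory byte array in place of a file, and the class and method names (ReplacementDemo, decodeLeniently) are made up for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class ReplacementDemo {
    // Decodes bytes as UTF-8 through the charset-name constructor, which
    // substitutes U+FFFD for malformed input instead of throwing.
    static String decodeLeniently(byte[] bytes) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), "UTF-8")) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 'A', then 0xFF (a byte that is never valid in UTF-8), then 'B'
        byte[] bytes = { 'A', (byte) 0xFF, 'B' };
        for (char ch : decodeLeniently(bytes).toCharArray()) {
            System.out.printf("U+%04X%n", (int) ch);
        }
    }
}
```

The invalid byte should show up as U+FFFD between U+0041 and U+0042.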

If you need more control over this behavior, you can use:

InputStreamReader(InputStream in, CharsetDecoder dec)

and supply a CharsetDecoder configured to your liking.
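As one possible configuration, the sketch below builds a decoder that reports malformed input rather than replacing it, so reading fails with a CharacterCodingException. The class and method names (StrictDecodeDemo, decodeStrictly) are hypothetical, and a byte array stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeDemo {
    // Decodes bytes as UTF-8, throwing on invalid input instead of
    // substituting U+FFFD.
    static String decodeStrictly(byte[] bytes) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), decoder)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = { 'A', (byte) 0xFF, 'B' };
        try {
            System.out.println(decodeStrictly(bytes));
        } catch (CharacterCodingException e) {
            // MalformedInputException is a subclass of CharacterCodingException
            System.out.println("Invalid UTF-8 input: " + e);
        }
    }
}
```

Using CodingErrorAction.IGNORE instead of REPORT would drop the invalid bytes, and CodingErrorAction.REPLACE corresponds to the default behavior described above.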
