
How to get Java to use the correct character set?

We've got our servers running on CentOS, and our Java backend sometimes has to process a file that was originally generated on a Windows machine (by one of our clients) using CP-1252. However, in 95%+ of use cases we are processing UTF-8 files.

My question: if we know that certain files will always be UTF-8, and other files will always be CP-1252, is it possible to specify in Java the character set to use for reading in each file? If so:

  • Do we need to do anything at the system level to add CP-1252 to CentOS? If so, what does this involve?
  • What Java objects would we use to apply the correct encoding on a per-file basis?

Thanks in advance!

All you need to do is specify the charset/encoding the original file was written in when constructing the reader, i.e. use the XXXReader(InputStream in, Charset cs) constructor. For example, look at InputStreamReader.
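For instance, here is a minimal sketch that reads a UTF-8 file character by character (the file name data.txt is a placeholder):

    import java.io.FileInputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    public class ReadUtf8 {
        public static void main(String[] args) throws Exception {
            // Decode the byte stream with an explicit charset instead of
            // relying on the platform default.
            try (InputStreamReader reader = new InputStreamReader(
                    new FileInputStream("data.txt"), StandardCharsets.UTF_8)) {
                int ch;
                while ((ch = reader.read()) != -1) {
                    System.out.print((char) ch);
                }
            }
        }
    }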

My question: if we know that certain files will always be UTF-8, and other files will always be CP-1252, is it possible to specify in Java the character set to use for reading in each file?

Assuming you're in charge of the code reading the file, it should be fine. Create a FileInputStream , then wrap it in an InputStreamReader specifying the relevant character encoding.
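A sketch of that pattern, assuming a hypothetical clients.csv generated on Windows; "windows-1252" is Java's canonical name for CP-1252:

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.Charset;

    public class ReadCp1252 {
        public static void main(String[] args) throws Exception {
            // FileInputStream supplies raw bytes; InputStreamReader decodes
            // them as CP-1252; BufferedReader adds line-oriented reading.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(
                            new FileInputStream("clients.csv"),
                            Charset.forName("windows-1252")))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }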

Do we need to do anything at the systems-level for adding CP-1252 to CentOS? If so, what does this involve?

That depends on what the JRE supports. I've never used CentOS, so I don't know whether it's likely to come with the relevant encoding as part of the JRE. You can use Charset.isSupported to check though, and Charset.availableCharsets to list what's available.
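A quick check you can run on the target JRE:

    import java.nio.charset.Charset;

    public class CharsetCheck {
        public static void main(String[] args) {
            // "windows-1252" is Java's canonical name for CP-1252.
            System.out.println("CP-1252 supported: "
                    + Charset.isSupported("windows-1252"));
            // List the canonical names of every charset this JRE supports.
            Charset.availableCharsets().keySet().forEach(System.out::println);
        }
    }

In practice, charset support comes from the JRE's own charset providers rather than the operating system, and windows-1252 ships with mainstream JREs, so a system-level change on CentOS is unlikely to be needed.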
