Java Character Encoding Writing to Text File

My issue is as follows:

I am having a character-encoding problem when writing to a text file: some characters do not show their intended value. For example, I write ' ' (which is probably a Tab character) and 'Â' is what appears in the text file.
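To make the symptom concrete, here is a minimal sketch of one way a stray 'Â' can appear. It rests on an assumption that is not confirmed by anything above: that the blank character is really a non-breaking space (U+00A0) rather than a tab. A real tab is plain ASCII and has the same single byte in UTF-8 and CP1252, so it would not change. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode text as UTF-8, then decode the same bytes as windows-1252,
    // mimicking a viewer that guesses the wrong encoding.
    static String viewedAsCp1252(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // Assumption: the "blank" character is U+00A0 (non-breaking space).
        String seen = viewedAsCp1252("\u00A0");
        // Print code points instead of glyphs so the console encoding does not matter.
        seen.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
        // Prints U+00C2 (the letter 'Â') followed by U+00A0.
    }
}
```

If this is what is happening, the file itself would be valid UTF-8 and only the program displaying it would be decoding it with the wrong charset.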

Background information

This data is stored in an MS SQL Server database. The database collation is SQL_Latin1_General_CP1_CI_AS and the fields are varchar. I've come to learn that the collation and the column type determine which character encoding is used on the database side. The values are stored correctly, so there are no issues here.

My Java application runs queries to pull the data from the DB, and this also looks OK. I have debugged the code and seen that all the Strings have the correct representation before they are written to the file.

Next I write the text to the .TXT file using an OutputStreamWriter as follows:

public OfferFileBuilder(String clientAppName, boolean isAppend) throws IOException, URISyntaxException {
    String exportFileLocation = getExportedFileLocation();
    File offerFile = new File(getDatedFileName(exportFileLocation + "/" + clientAppName + "_OFFERRECORDS"));
    bufferedWriter = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(offerFile, isAppend), "UTF-8"));
}
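For comparison, here is a minimal sketch of the same idea using java.nio, where the charset is passed as a constant and the append/overwrite choice is explicit. The class Utf8FileWriterSketch and the helper openUtf8Writer are hypothetical names for illustration and are not part of the OfferFileBuilder above; the temporary file stands in for the real export location.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class Utf8FileWriterSketch {
    // Hypothetical helper: opens a UTF-8 writer that either appends to
    // the file or replaces its previous contents.
    static BufferedWriter openUtf8Writer(Path file, boolean append) throws IOException {
        StandardOpenOption mode = append
                ? StandardOpenOption.APPEND
                : StandardOpenOption.TRUNCATE_EXISTING;
        return Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, mode);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp file is used here instead of the real export path.
        Path file = Files.createTempFile("offer_records", ".txt");
        try (BufferedWriter writer = openUtf8Writer(file, false)) {
            writer.write("caf\u00E9\u00A0offer"); // non-ASCII sample text
            writer.newLine();
        }
        System.out.println(Files.readAllBytes(file).length + " bytes written to " + file);
    }
}
```

One thing to keep in mind with append mode: bytes already in the file are left untouched, so anything written earlier with a different encoding stays in that encoding.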

Now, when I open the file on the Linux server by running the cat command on it, or open it in Notepad++, some of the characters display incorrectly.

I've run the following commands on the server to check its encoding: locale charmap, which prints UTF-8; echo $LANG, which prints en_US.UTF-8; and echo $LC_CTYPE, which prints nothing.

Here is what I've attempted so far: I've changed the character encoding used by the OutputStreamWriter, trying both UTF-8 and CP1252. Switching encodings fixes some characters, but others are then displayed incorrectly.

My question is this: which encoding should my OutputStreamWriter be using? (Bonus question) How are we supposed to avoid issues like this? The rule of thumb I was given was "use UTF-8 and you will never run into problems", but this isn't the case for me right now.

Running the file -bi command on the server revealed that the file was encoded as ASCII instead of UTF-8. Removing the file completely and rerunning the process fixed this for me.
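A similar check can be approximated from Java by decoding the file's bytes strictly and seeing whether that fails. The sketch below is only an illustration: the class name Utf8Check is made up, and the "mixed" sample assumes, without confirmation from the original problem, that an earlier run had written to the same file with a different encoding before a UTF-8 run appended to it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true only if every byte sequence is well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00E9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00E9".getBytes(StandardCharsets.ISO_8859_1);

        // Assumed scenario: old single-byte content followed by appended UTF-8 content.
        byte[] mixed = new byte[latin1.length + utf8.length];
        System.arraycopy(latin1, 0, mixed, 0, latin1.length);
        System.arraycopy(utf8, 0, mixed, latin1.length, utf8.length);

        System.out.println(isValidUtf8(utf8));  // true
        System.out.println(isValidUtf8(mixed)); // false
    }
}
```

In practice the bytes would come from Files.readAllBytes on the exported file. A file that fails this check cannot be displayed correctly under any single encoding, which would be consistent with some characters breaking whichever charset was chosen.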
