
File encoding: saved content is different from what was read

I have a slight problem trying to save a file in Java. For some reason, the content I get after saving my file is different from what I had when I read it.

I suspect this is related to file encoding, but I am not sure.

Here is the test code I put together. The idea is basically to read a file and save it again. When I open both files, they are different.

package workspaceFun;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

import org.apache.commons.codec.DecoderException;

public class FileSaveTest {

    public static void main(String[] args) throws IOException, DecoderException{

        String location = "test.location";
        File locationFile = new File(location);

        FileInputStream fis = new FileInputStream(locationFile);

        InputStreamReader r = new InputStreamReader(fis, Charset.forName("UTF-8"));
        System.out.println(r.getEncoding());


        StringBuilder builder = new StringBuilder();
        int ch;
        while((ch = fis.read()) != -1){
            builder.append((char)ch);
        }

        String fullLocationString = builder.toString();             

        //Now we want to save back
        FileOutputStream fos = new FileOutputStream("C:/Users/me/Desktop/test");
        byte[] b = fullLocationString.getBytes();
        fos.write(b);
        fos.close();
        r.close();
    }
}

An extract from the input file (opened as plain text using Sublime 2):

40b1 8b81 23bc 0014 1a25 96e7 a393 be1e

and from the output file :

40c2 b1c2 8bc2 8123 c2bc 0014 1a25 c296

The getEncoding method returns "UTF8". Trying to save the output file using the same charset does not seem to solve the issue.

What puzzles me is that when I try to read the input file using Hex from apache.commons.codec like this :

String hexLocationString2 = Hex.encodeHexString(fullLocationString.getBytes("UTF-8"));

The String already looks like my output file, not the input.

Do you have any idea what is going wrong? Thanks.

Extra info for those interested: I am trying to read an Eclipse .location file.

EDIT: I placed the file online so that you can test the code

I believe the problem is the way you are reading the stream.

You are using the FileInputStream directly to read the content instead of reading through the InputStreamReader that wraps it.

With an InputStreamReader you can specify which Charset to use.

Keep in mind that the Charset given to the InputStreamReader must match the file's actual encoding: the reader does not detect charsets, it simply decodes the bytes using the one you specify.
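To make the mechanism concrete, here is a minimal sketch (the class and method names are made up for illustration) that takes the first four bytes of the input extract from the question, casts each one to a char the way the fis.read() loop does, and then encodes the resulting String. Encoded as UTF-8, every byte above 0x7f becomes two bytes starting with c2, which matches the output extract. Encoded as ISO-8859-1, the original bytes come back.

```java
import java.nio.charset.StandardCharsets;

class ByteCastDemo {

    // Formats bytes as lowercase hex pairs separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Mimics the loop in the question: each raw byte value (0..255)
    // is cast straight to a char, with no charset involved.
    static String castBytesToChars(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            sb.append((char) (b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First four bytes of the input extract shown in the question.
        byte[] raw = {0x40, (byte) 0xb1, (byte) 0x8b, (byte) 0x81};
        String s = castBytesToChars(raw);

        // prints 40 c2 b1 c2 8b c2 81
        System.out.println(toHex(s.getBytes(StandardCharsets.UTF_8)));
        // prints 40 b1 8b 81
        System.out.println(toHex(s.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```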

Try the following changes:

InputStreamReader r = new InputStreamReader(new FileInputStream(locationFile), StandardCharsets.UTF_8);

Then, instead of fis.read(), use r.read().

Finally, when writing the String, get the bytes using the same Charset as your Reader:

FileOutputStream fos = new FileOutputStream("C:/Users/me/Desktop/test");
fos.write(fullLocationString.getBytes(StandardCharsets.UTF_8));
fos.close();
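One caveat: matching charsets only round-trips cleanly when the file really is text in that charset. The hex extract in the question starts with 40 b1, and a lone b1 is not a valid UTF-8 sequence, so that particular file looks binary rather than UTF-8 text. For that case, a minimal sketch under that assumption (class and method names are hypothetical, and temporary files stand in for the real paths) is to copy the raw bytes without decoding them. If a String is unavoidable, ISO-8859-1 maps each byte to exactly one char and back.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class BinaryRoundTrip {

    // Copies a file byte for byte; no charset is involved at any point.
    static void copyBytes(Path in, Path out) throws IOException {
        Files.write(out, Files.readAllBytes(in));
    }

    // If the data must pass through a String, ISO-8859-1 maps every
    // byte value 0..255 to exactly one char and back again.
    static byte[] throughLatin1String(byte[] raw) {
        String s = new String(raw, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the real input and output paths.
        Path in = Files.createTempFile("input", ".bin");
        Path out = Files.createTempFile("output", ".bin");

        // First eight bytes of the input extract shown in the question.
        byte[] sample = {0x40, (byte) 0xb1, (byte) 0x8b, (byte) 0x81,
                         0x23, (byte) 0xbc, 0x00, 0x14};
        Files.write(in, sample);

        copyBytes(in, out);
        System.out.println(Arrays.equals(sample, Files.readAllBytes(out)));
        System.out.println(Arrays.equals(sample, throughLatin1String(sample)));

        Files.delete(in);
        Files.delete(out);
    }
}
```

Both comparisons should report that the bytes are unchanged. Whether the content of a .location file can then be interpreted as text is a separate question that this sketch does not address.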

Try to read and write back as below:

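import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;

public class FileSaveTest {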

    public static void main(String[] args) throws IOException {

        String location = "D:\\test.txt";

        BufferedReader br = new BufferedReader(new FileReader(location));
        StringBuilder sb = new StringBuilder();

        try {
            String line = br.readLine();

            while (line != null) {
                sb.append(line);
                line = br.readLine();

                if (line != null)
                    sb.append(System.lineSeparator());
            }

        } finally {
            br.close();
        }

        FileOutputStream fos = new FileOutputStream("D:\\text_created.txt");
        byte[] b = sb.toString().getBytes();
        fos.write(b);
        fos.close();

    }
}

The test file contains both Cyrillic and Latin characters.

SDFASDF
XXFsd1
12312
іва
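Note that FileReader and the no-argument getBytes() both fall back to the platform default charset, so the result of the code above can vary between machines. Below is a minimal sketch of the same text round trip with the charset named explicitly. It assumes the file is UTF-8 text; the class and method names are hypothetical, and a temporary file stands in for D:\test.txt. Reading the whole file at once also keeps the original line separators instead of rewriting them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TextRoundTrip {

    // Decodes the whole file as UTF-8 (assumed encoding of the text file).
    static String readText(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Encodes the text with the same charset that was used for reading.
    static void writeText(Path path, String text) throws IOException {
        Files.write(path, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Sample content modelled on the test file above; the Cyrillic
        // letters are written as Unicode escapes so the source file
        // encoding does not matter.
        String sample = "SDFASDF\nXXFsd1\n12312\n\u0456\u0432\u0430\n";

        Path file = Files.createTempFile("text", ".txt");
        writeText(file, sample);
        String readBack = readText(file);

        System.out.println(sample.equals(readBack));
        Files.delete(file);
    }
}
```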
