Java Linux character encoding issue

Question

I'm facing an issue with character encoding on Linux. I'm retrieving content from Amazon S3 that was saved using UTF-8 encoding. The content is in Chinese, and I can see it correctly in the browser.

I'm using the Amazon SDK to retrieve the content and update it. Here's the code I'm using:


StringBuilder builder = new StringBuilder();
S3Object object = client.getObject(new GetObjectRequest(bucketName, key));
// Decode the S3 object's bytes explicitly as UTF-8 rather than the platform default.
BufferedReader reader = new BufferedReader(
        new InputStreamReader(object.getObjectContent(), "utf-8"));
String line;
// readLine() strips line terminators, so the lines are appended back to back.
while ((line = reader.readLine()) != null) {
    builder.append(line);
}

This piece of code works fine in a Windows environment: I was able to update the content and save it back without messing up any of the Chinese characters in it.

But it's acting differently in a Linux environment. The code is unable to translate the characters properly, and the Chinese characters are rendered as ???

I'm not sure what's going wrong here. Any pointers will be appreciated.

-Thanks

Answer 1

The default charset is different for the two OSes you're using.

To start off, you can confirm the difference by printing out the default charset.

Charset.defaultCharset().name()
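
As a minimal sketch of that check, the class below prints the JVM's default charset together with the file.encoding system property. The class name CharsetCheck and the helper defaultCharsetName are made up for illustration; only Charset.defaultCharset() and System.getProperty come from the JDK.

```java
import java.nio.charset.Charset;

public class CharsetCheck {
    // Hypothetical helper: returns the name of the charset the JVM falls back to
    // whenever code converts between bytes and chars without naming a charset.
    public static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```

Running it on both the Windows and the Linux machine should show whether the two defaults really differ.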

Somewhere in your code, I think this default charset is being used for some String conversion, for example a call such as String.getBytes() or new String(bytes) with no charset argument. The correct procedure is to track that down and specify UTF-8 there.
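
Since the code that saves the content back isn't shown, the following is only a sketch of the kind of conversion to look for, not a diagnosis. It encodes a Chinese string with US-ASCII, which stands in here for a non-UTF-8 platform default (an assumption for illustration), and then with UTF-8, and decodes both as UTF-8 the way a reader of the stored object would. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {
    // Hypothetical helper: encodes text with writeCharset, then decodes the
    // resulting bytes as UTF-8, mimicking a consumer that expects UTF-8.
    public static String roundTrip(String text, Charset writeCharset) {
        byte[] bytes = text.getBytes(writeCharset);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, written as escapes

        // US-ASCII cannot represent these characters, so each is replaced by '?'.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII).equals("??"));

        // Naming UTF-8 explicitly preserves the text on any platform.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));
    }
}
```

Both lines print true. The first case reproduces the "???" symptom, and the second shows that passing the charset explicitly removes the dependence on the OS default.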

Without seeing that code, I can only suggest the 'cheating' way to do it: set the default charset explicitly, near the beginning of your code or at Java startup (for example, by passing -Dfile.encoding=UTF-8 on the java command line). See here for changing the default charset: Setting the default Java character encoding?

HTH

Answer 1: score 3, accepted, 2011-05-13 01:03:15
