Japanese character encoding using Shift_JIS in Java

Question

I have a web application served by Tomcat.

One of its pages lets users download a file stored on my file server. Most of the file names there are in Japanese. However, when a user downloads a file, its name is garbled, and the result differs from browser to browser.

The original code is as below:

FileInputStream in = new FileInputStream(absolutePath);
ResponseUtil.download(new String(downloadFileName.getBytes("Shift_JIS"), "ISO-8859-1"), in);

For example, 08_タイヨーアクリス_装置開発_実績表 is displayed as
08_ƒ^ƒCƒˆ-[ƒAƒNƒŠƒX_'•'uŠJ”-_ŽÀ-Ñ• in Google Chrome
This problem seems to be caused by the byte '5c' appearing in the encoded file name, which is a known issue with Shift_JIS. What is the correct way to work around it?
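The '5c' issue mentioned above can be reproduced with a small sketch. In Shift_JIS, the second byte of some double-byte characters is 0x5c, the same value as an ASCII backslash; 表 (U+8868), the last character of the example name, encodes to 0x95 0x5c. Code that treats the bytes as a single-byte string then sees a stray backslash. The class and method names below are hypothetical, and the sketch assumes the JVM ships the Shift_JIS charset.

```java
import java.nio.charset.Charset;

// Hypothetical helper for inspecting Shift_JIS bytes; not part of any framework.
final class ShiftJisTrailByte {

    /** Returns the bytes of s in the named charset as lowercase hex, space-separated. */
    static String hex(String s, String charsetName) {
        byte[] bytes = s.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    /** True if any byte of the Shift_JIS encoding of s equals 0x5c (ASCII backslash). */
    static boolean containsBackslashByte(String s) {
        for (byte b : s.getBytes(Charset.forName("Shift_JIS"))) {
            if (b == 0x5C) {
                return true;
            }
        }
        return false;
    }
}
```

Calling ShiftJisTrailByte.containsBackslashByte(downloadFileName) before building the header would show whether a given name is affected.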

Answer 1

It looks like the ResponseUtil.download method from the "Seasar sastruts" framework you're using is taking the filename you provide and sticking it directly in the Content-disposition header of the HTTP response it constructs.

response.setHeader("Content-disposition", "attachment; filename=\"" + fileName + "\"");

As far as I can tell, HTTP and MIME headers only support ASCII characters, so this technique won't work with non-ASCII characters. (If this is the case, I'd consider it a bug in this class that it unconditionally sticks the filename into the header.) Modifying or re-encoding the string before you pass it in won't help, because the encoding has to happen at the header level, not in the Java string.

To support non-ASCII characters, the header value needs to be encoded using the MIME encoded-word technique. There's no way to do this with the ResponseUtil class as it is, because it concatenates the name you provide directly into a string that is not an encoded-word.

I think you'll need to rewrite that download() method to check for non-ASCII characters in the filenames it receives, and to apply encoded-word encoding to the names that contain them. You'd want the header to look something like this, where some_base64_text is the Base64 encoding of the bytes of your file name in Shift_JIS. (Or use UTF-8 instead.)

Content-disposition: attachment; filename="=?Shift_JIS?B?some_base64_text?="

There are probably many different browser behaviors around this, because browsers try to work around web servers that do it "wrong". But encoding the name this way looks like a good bet for getting it working and keeping it portable.
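The encoded-word idea described above can be sketched as follows. This is a minimal illustration, not the framework's code: the class and method names are hypothetical, it uses UTF-8 rather than Shift_JIS, and it does not split long names even though RFC 2047 limits a single encoded-word to 75 characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; ResponseUtil itself offers nothing like this.
final class EncodedWordFilename {

    /** True if s contains only printable ASCII, excluding the double quote and backslash. */
    static boolean isPlainAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E || c == '"' || c == '\\') {
                return false;
            }
        }
        return true;
    }

    /** Wraps a name as =?UTF-8?B?...?= unless it is already plain ASCII. */
    static String encodeWord(String fileName) {
        if (isPlainAscii(fileName)) {
            return fileName;
        }
        byte[] bytes = fileName.getBytes(StandardCharsets.UTF_8);
        return "=?UTF-8?B?" + Base64.getEncoder().encodeToString(bytes) + "?=";
    }

    /** Builds the Content-disposition header value for the given file name. */
    static String contentDisposition(String fileName) {
        return "attachment; filename=\"" + encodeWord(fileName) + "\"";
    }
}
```

A rewritten download() could then call response.setHeader("Content-disposition", EncodedWordFilename.contentDisposition(fileName)). Whether a given browser decodes the encoded-word still has to be tested.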

Answer 2

Thanks a lot. I was able to solve the problem on Chrome using the following:

ResponseUtil.download(URLEncoder.encode(downloadFileName, "UTF-8"), in);

However, the file name is still not decoded correctly in Firefox and Safari.

In Chrome, the file is named "08_タイヨーアクリス_装置開発_実績表.pdf". On Firefox and Safari, however, it is named "08_%E3%82%BF%E3%82%A4%E3%83%A8%E3%83%BC%E3%82%A2%E3%82%AF%E3%83%AA%E3%82%B9_%E8%A3%85%E7%BD%AE%E9%96%8B%E7%99%BA_%E5%AE%9F%E7%B8%BE%E8%A1%A8.pdf".
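The literal percent signs are probably shown because a plain filename= parameter is not required to be percent-decoded, and Chrome simply happens to decode it. RFC 6266 defines a separate filename* parameter, encoded per RFC 5987 as UTF-8''percent-encoded-name, for exactly this case. Below is a minimal sketch of building such a header; the class name, method names, and the ASCII fallback argument are assumptions for illustration, and since ResponseUtil.download wraps whatever it receives in filename="...", the header would have to be set directly on the response instead.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; not part of Seasar or the servlet API.
final class Rfc5987Filename {

    /** Percent-encodes a name as UTF-8, fixing the two cases where URLEncoder's form encoding differs. */
    static String percentEncode(String fileName) {
        try {
            return URLEncoder.encode(fileName, "UTF-8")
                    .replace("+", "%20")   // URLEncoder writes a space as '+'
                    .replace("*", "%2A");  // '*' is left as-is by URLEncoder but is not allowed here
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Builds a header value with an ASCII fallback name plus the UTF-8 filename* form. */
    static String contentDisposition(String fileName, String asciiFallback) {
        return "attachment; filename=\"" + asciiFallback + "\"; filename*=UTF-8''"
                + percentEncode(fileName);
    }
}
```

Usage would look like response.setHeader("Content-disposition", Rfc5987Filename.contentDisposition(downloadFileName, "download.pdf")), where "download.pdf" is a placeholder for browsers that ignore filename*.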

2 answers

solution1
1 2015-04-21 02:42:47

solution2
1 2015-04-21 06:15:09
