
Odd character encoding issue

We have some data sourced in Italy that is displayed from a server in Poland, and we are seeing some instances of character substitution. Specifically, à (small letter a with grave) is being replaced by ŕ (small letter r with acute). We can see that à is byte 0xE0 in the CP1252 Western European character set, and ŕ is the same byte value in the CP1250 Eastern European character set, so we know this is a character set issue.

The page is being served by a WebSphere app server using JSPs. I have an experimental page where I can reproduce the problem, and sort of fix it, but not in an acceptable manner.

If I set this in my JSP:

response.setContentType("text/html;charset=windows-1250");

The problem is reproduced and the R with acute is displayed.

To sort of fix the problem, on the browser, I change the encoding to "Western European" in IE or "Western Windows-1252" in Chrome.

So this would naturally lead me to believe that if I set "windows-1252" in the content type, it would fix the problem, but it does not. When I do that, the character is then displayed as a question mark.

I have played with all kinds of combinations of response.setContentType, response.setCharacterEncoding, response.setLocale, <meta http-equiv> and <meta charset>, and almost everything results in the ? showing. Only setting 1250 on the content type and then changing the encoding in the browser itself seems to fix the problem.

Any suggestions?

Thanks

First of all, each source must come with the character set it was encoded with (i.e. you must know it); otherwise you won't know which character set to use when presenting that source, and the problem will come back with the next data source.
Secondly, if you can, ask your sources to move to UTF-8 and have those providers re-encode their content.
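
To illustrate why the source encoding has to be known, here is a minimal Java sketch. It assumes the Italian data arrives as windows-1252 bytes and is decoded somewhere upstream as windows-1250; that would match the symptoms in the question, but it is an assumption, not something the question confirms. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

final class WrongDecodeDemo {
    static final Charset CP1250 = Charset.forName("windows-1250");
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode raw bytes into text using the given charset.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    // Encode text for output. String.getBytes(Charset) substitutes the
    // charset's replacement byte for characters it cannot represent.
    static byte[] encode(String text, Charset cs) {
        return text.getBytes(cs);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xE0};          // assumed: "à" written by a windows-1252 source
        String right = decode(raw, CP1252);  // decoded with the correct charset
        String wrong = decode(raw, CP1250);  // decoded with the wrong charset
        // Print code points rather than glyphs, so the console encoding does not matter.
        System.out.printf("U+%04X U+%04X%n", (int) right.charAt(0), (int) wrong.charAt(0));
        // The wrongly decoded character has no slot in windows-1252.
        System.out.printf("0x%02X%n", encode(wrong, CP1252)[0] & 0xFF);
    }
}
```

Under that assumption the string already holds ŕ (U+0155) before the response is written. Sent as windows-1250 it becomes byte 0xE0 again, which a browser forced to windows-1252 shows as à; sent as windows-1252 it has no mapping and comes out as 0x3F, the question mark.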

Having a common character set for all your sources is the best solution, and UTF-8 is the most compatible, standards-oriented choice today. If you can't get the providers to do the conversion, then knowing the source encoding lets you convert the data from the source charset to your own charset with a converter (I haven't used one, so I can't give you specific advice on this).
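
For what it's worth, on the Java side such a conversion can be sketched with the standard library alone. This assumes you receive the raw bytes and know the source charset (windows-1252 in this example); `ToUtf8` and `toUtf8` are hypothetical names, not part of any existing converter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ToUtf8 {
    // Decode the bytes with the charset the source declared, then re-encode as UTF-8.
    static byte[] toUtf8(byte[] raw, Charset source) {
        String text = new String(raw, source);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xE0};  // "à" in windows-1252
        byte[] utf8 = toUtf8(raw, Charset.forName("windows-1252"));
        for (byte b : utf8) {
            System.out.printf("0x%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

The conversion is only as good as the declared source charset: passing the wrong one produces valid UTF-8 for the wrong characters.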

Finally, two notes:
1) a single web page can be served in only one encoding, so you cannot show content that uses two different character sets on the same page without converting one of them first (as you already found, only one encoding applies at a time), and a web application is best kept to one encoding for the same reason;
2) if your content is strictly web-oriented, you could ask your sources to use HTML entities (but keep in mind that this becomes a problem if you later present that content in another form, e.g. PDF).
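
As a rough illustration of the second note, non-ASCII characters can be written as numeric character references, which survive any ASCII-compatible page encoding. This is a minimal sketch with hypothetical names; it deliberately leaves markup characters such as < and & untouched, so it is not an HTML escaper.

```java
final class HtmlEntities {
    // Replace every non-ASCII code point with a decimal numeric character reference.
    static String toNumericEntities(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);               // plain ASCII passes through
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // "\u00E0" is à, written as an escape so the source file encoding does not matter.
        System.out.println(toNumericEntities("Citt\u00E0"));
    }
}
```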
