
Special characters converted to their XML numeric character references (Java, Tomcat 9)

I recently upgraded one of my apps from Tomcat 6 to Tomcat 9 and from Java 8 to OpenJDK 11 on Linux, and I am now seeing a problem with my forms when I submit special characters such as Japanese or Chinese characters.

The issue doesn't appear to come from the code: when I run the app on my old Tomcat 6/Java 8 box, the special characters are not converted. It might be a server configuration issue, but I'm not sure where to look.

I input "法敲中" and it gets converted into &#27861;&#25970;&#20013; once I submit my forms.

This seems to be normal behavior. The numbers you see are XML numeric character references, equivalent to "\u6cd5\u6572\u4e2d", the Unicode code points for the symbols "法敲中" (27861 is the decimal form of hexadecimal 6cd5, and so on). The XML parser wants to ensure that the XML string can be transmitted in a simple Western encoding (ISO-8859-1), and those symbols cannot be represented in that encoding. So it converts them to their Unicode equivalents, which lets the receiver still "understand" and decode the non-Latin symbols even though the ISO-8859-1 charset doesn't support them. This is a precaution; if you work with UTF-8 throughout, the characters pass through unchanged. I tested this with a tool that I find very useful: it converts any String into a sequence of Unicode escapes and back. Here is what I did:

System.out.println(StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence("法敲中"));

The result was \u6cd5\u6572\u4e2d. If you wish to use this tool, it is part of the MgntUtils open-source library (written by me), which is available as Maven artifacts and on GitHub (including sources and Javadoc).
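
If you would rather check this without adding a dependency, the following is a minimal sketch that uses only the JDK. It shows that ISO-8859-1 cannot represent these characters while UTF-8 can, and it builds both the decimal numeric character references and the backslash-u escapes by hand. The class and method names (CharRefDemo, canEncode, toNumericCharRefs, toUnicodeEscapes) are made up for illustration and are not part of MgntUtils, Tomcat, or any XML parser. The lowercase four-digit hex format is an assumption as well.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper only; not the code that Tomcat or a browser runs.
class CharRefDemo {

    // True if every character of the text can be represented in the charset.
    public static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Replaces each non-ASCII code point with a decimal reference such as "&#27861;".
    public static String toNumericCharRefs(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.appendCodePoint(cp);                 // plain ASCII is kept as-is
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    // Writes every UTF-16 code unit as a backslash-u escape with four hex digits.
    public static String toUnicodeEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The three characters from the question, written as escapes so the
        // example does not depend on the encoding of the source file.
        String input = "\u6cd5\u6572\u4e2d";

        System.out.println(canEncode(input, StandardCharsets.ISO_8859_1));
        System.out.println(canEncode(input, StandardCharsets.UTF_8));
        System.out.println(toNumericCharRefs(input));
        System.out.println(toUnicodeEscapes(input));
    }
}
```

Running main prints false, true, the three decimal references &#27861;&#25970;&#20013; and then the three escapes for the same characters. The decimal and hexadecimal forms are just two spellings of the same code points.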
