I'm trying to normalize a string that contains accented characters. It works fine when I run it from IntelliJ IDEA, but when I build it with Maven and deploy the WAR to Tomcat, I get unexpected results like the ones below. Can you please help?
Java code to normalize:
String normalizedString = Normalizer.normalize(inputText, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "");
Output from the Tomcat logs:
Input text = ůňa
Normalized String = AAa
Output when I run the same code on my local machine in the IDE:
Input text = ůňa
Normalized String = una
Do I need to specify some encoding setting somewhere?
My Maven configuration has this:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>${maven-compiler-plugin.version}</version>
    <configuration>
        <source>${java.version}</source>
        <target>${java.version}</target>
        <encoding>UTF-8</encoding>
    </configuration>
</plugin>
This is present in my Tomcat server.xml:
<Connector port="8443"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           SSLEnabled="true"
           maxThreads="150"
           scheme="https"
           secure="true"
           clientAuth="false"
           sslProtocol="TLS"
           URIEncoding="UTF-8"
/>
I was able to solve this. I was reading the input text from a file without specifying an encoding, so the platform default charset was used to decode it. Once I specified UTF-8 when reading the file, the issue was fixed:
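import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

private static String inputStreamToString(InputStream is) throws IOException {
    StringBuilder sb = new StringBuilder();
    // Decode explicitly as UTF-8 instead of relying on the platform default charset.
    try (BufferedReader br = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line); // readLine() strips line terminators, so lines are concatenated
        }
    }
    return sb.toString();
}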
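To illustrate why the missing charset produced AAa, here is a minimal sketch that decodes the UTF-8 bytes of the input text with two different charsets and then applies the same normalization as in the question. It assumes the Tomcat JVM's default charset was a single-byte one such as ISO-8859-1; I did not verify that on the server. CharsetDemo and its method names are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class CharsetDemo {

    // Same normalization as in the question: decompose (NFD), then drop non-ASCII characters.
    public static String stripAccents(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    // Decodes raw bytes with the given charset, then normalizes the result.
    public static String decodeAndStrip(byte[] bytes, Charset charset) {
        return stripAccents(new String(bytes, charset));
    }

    public static void main(String[] args) {
        // The input from the question (u with ring, n with caron, a), written with
        // Unicode escapes so the source file encoding cannot interfere.
        byte[] utf8Bytes = "\u016F\u0148a".getBytes(StandardCharsets.UTF_8);

        // Correct decoding: the accents decompose into combining marks and are stripped.
        System.out.println(decodeAndStrip(utf8Bytes, StandardCharsets.UTF_8));      // una

        // Wrong decoding: each 2-byte sequence becomes two unrelated Latin-1 characters.
        System.out.println(decodeAndStrip(utf8Bytes, StandardCharsets.ISO_8859_1)); // AAa

        // The charset a reader created without an explicit encoding would fall back to.
        System.out.println(Charset.defaultCharset());
    }
}
```

Printing Charset.defaultCharset() in both the IDE and Tomcat is a quick way to check whether the two environments differ. Passing the charset explicitly, as in inputStreamToString above, removes the dependency on that setting.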