
Java text normalization behaves differently after deploying the WAR in Tomcat

I'm trying to normalize a string that contains accented characters. It works fine when I run it from IntelliJ IDEA, but when I build the project with Maven and deploy the WAR to Tomcat, I get the unexpected results shown below. Can you please help?

Java code to normalize

String normalizedString = Normalizer.normalize(inputText, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "");

Output from tomcat logs:

Input text = ůňa
Normalized String = AAa

Output when I run the same code on my local machine in the IDE:

Input text = ůňa
Normalized String = una

Do I need to specify some encoding setting somewhere?

My Maven pom.xml has this:

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>${maven-compiler-plugin.version}</version>
<configuration>
<source>${java.version}</source>
<target>${java.version}</target>
<encoding>UTF-8</encoding>
</configuration>
</plugin>

This is in my Tomcat server.xml:

  <Connector port="8443" 
  protocol="org.apache.coyote.http11.Http11NioProtocol"
  SSLEnabled="true"
  maxThreads="150"
  scheme="https"
  secure="true"
  clientAuth="false"
  sslProtocol="TLS" 
  URIEncoding="UTF-8"
  />

I was able to solve this. I was reading the input from a file without specifying an encoding, so the platform default charset was used to decode it. Once I specified UTF-8 when reading the file, the issue was fixed:

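import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

private static String inputStreamToString(InputStream is) throws IOException {
    StringBuilder sb = new StringBuilder();
    // Decode explicitly as UTF-8 instead of relying on the platform default charset.
    // try-with-resources closes the reader even if readLine() throws.
    try (BufferedReader br = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line); // readLine() strips line terminators, so lines are joined as-is
        }
    }
    return sb.toString();
}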
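To see why the missing charset produced "AAa", here is a minimal sketch that reproduces the effect without a file. It assumes the Tomcat JVM's default charset was a single-byte one such as ISO-8859-1; the post does not say what it actually was. The class and method names (CharsetMismatchDemo, stripAccents, decodeUtf8BytesAs) are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class CharsetMismatchDemo {

    // Same normalization as in the question: decompose (NFD), then drop non-ASCII.
    static String stripAccents(String inputText) {
        return Normalizer.normalize(inputText, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    // Encodes the text as UTF-8 (as it would be stored in the file) and decodes
    // the bytes with the given charset, simulating a reader that uses that charset.
    static String decodeUtf8BytesAs(String text, Charset charset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, charset);
    }

    public static void main(String[] args) {
        // "ůňa" written with escapes so the source file encoding cannot interfere.
        String original = "\u016F\u0148a";

        // Decoded correctly: ů -> u + combining ring, ň -> n + combining caron.
        String good = decodeUtf8BytesAs(original, StandardCharsets.UTF_8);
        System.out.println(stripAccents(good)); // una

        // Decoded as ISO-8859-1 (assumed default): each accented letter becomes
        // 'Å' plus a second non-ASCII character; NFD keeps the 'A', the rest is stripped.
        String bad = decodeUtf8BytesAs(original, StandardCharsets.ISO_8859_1);
        System.out.println(stripAccents(bad)); // AAa

        // Useful to log on both machines when results differ.
        System.out.println("Default charset: " + Charset.defaultCharset());
    }
}
```

Logging Charset.defaultCharset() in both the IDE and the Tomcat process is a quick way to check whether the two environments decode the file differently.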


 