
Tomcat text file encoding

I have a java webapp which reads from file on disk and returns the needed values. The file on disk contains UTF-8 characters.

Example of the file content:

lähedus teeb korterist atraktiivse üüriobjekti välismaalastele

When the webapp is run on localhost* then the servlet reads from disk and returns:

lähedus teeb korterist atraktiivse üüriobjekti välismaalastele

When I run the same app on a separate server the same request returns this:

l??hedus teeb korterist atraktiivse ????riobjekti v??lismaalastele

This is purely an encoding issue but I don't know how to solve it.

What I have tried:

  • I added this to config/server.xml

     <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" URIEncoding="UTF-8"/> <!-- THIS PART -->

But it didn't help. What should I change in config to have it working on server as well? Thanks!

EDIT

I am reading from a .txt file on the server that contains JSON strings, and I am using a Java BufferedReader to read the content. As I mentioned in the comments, this problem is not caused by the reader, because the same code works on localhost.

I send the response via a servlet that just flushes the JSON string out. Again, the same story as with the reader.

I get the question marks in every client I make the request from (browser, Android, etc.).

Your file seems to be in UTF-8, but it is being converted incorrectly to some single-byte encoding. You can see this because each special character, which is a multi-byte sequence in UTF-8, comes out as two unconvertible characters ( ? ).
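To illustrate the mechanism, here is a minimal sketch that encodes a word as UTF-8 and then misreads the bytes with a single-byte charset. The class and method names are hypothetical, and US-ASCII and ISO-8859-1 are only assumptions standing in for whatever the two machines actually use as their default.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Misread UTF-8 bytes as US-ASCII (assumed server default), then write them out again.
    static String roundTripThroughAscii(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Every byte >= 0x80 is invalid in US-ASCII and decodes to U+FFFD.
        String misread = new String(utf8, StandardCharsets.US_ASCII);
        // U+FFFD cannot be encoded in US-ASCII either, so each one becomes '?'.
        byte[] out = misread.getBytes(StandardCharsets.US_ASCII);
        return new String(out, StandardCharsets.US_ASCII);
    }

    // Misread UTF-8 bytes as ISO-8859-1: every byte maps to some character, so no '?' appears.
    static String misreadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String word = "l\u00e4hedus"; // "lähedus", written with an escape to avoid source-encoding issues
        System.out.println(roundTripThroughAscii(word)); // l??hedus
        System.out.println(misreadAsLatin1(word));
    }
}
```

The two bytes of the UTF-8 sequence for ä each become one bad character, which is why a single special character turns into two question marks.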

The application is reading the file without specifying the encoding, so it uses the platform's default encoding, which can differ between localhost and the server. That is not something you want.
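A quick way to compare the two machines is to print the JVM's default charset on each. This is only a diagnostic sketch with hypothetical names; in a webapp you would more likely log the value than print it.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    // Describe the encoding the JVM falls back to when code does not name one.
    static String describeDefaults() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

If localhost and the server report different values, code that relies on the default will behave differently on each.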

Next, find the code that does the reading incorrectly. There is often an overloaded method or constructor that accepts the encoding. FileReader is notorious here: before Java 11 it had no constructor taking a charset and always used the default encoding. Check occurrences of:

  • InputStreamReader
  • new String
  • String.getBytes
  • Scanner
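As a sketch of the fix, the reader can be built with an explicit charset instead of the default. The class name, method name and temp file below are hypothetical; the question's actual reading code is not shown, so this assumes it wraps a BufferedReader around the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8FileRead {
    // Read the whole file as UTF-8, independent of the platform default encoding.
    static String readAll(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, created only for this demonstration.
        Path tmp = Files.createTempFile("utf8-demo", ".txt");
        Files.write(tmp, "l\u00e4hedus \u00fc\u00fcriobjekti\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(tmp));
        Files.delete(tmp);
    }
}
```

The same idea applies to the other items in the list: `new String(bytes, StandardCharsets.UTF_8)`, `text.getBytes(StandardCharsets.UTF_8)` and `new Scanner(inputStream, "UTF-8")` all have overloads that take the encoding.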

For completeness, though it is probably not the cause here: any response that returns this text should specify the charset in its Content-Type header.
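The servlet from the question is not shown, so the following is only a stdlib sketch of the output side, with hypothetical names. In a real servlet the equivalent step would be to call `response.setContentType("application/json; charset=UTF-8")` before `response.getWriter()`, so that the writer and the header agree on the encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8ResponseSketch {
    // Header value that tells the client how the body bytes are encoded.
    static final String CONTENT_TYPE = "application/json; charset=UTF-8";

    // Encode the body explicitly as UTF-8 instead of relying on the platform default.
    static void writeBody(OutputStream out, String json) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(json);
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // A ByteArrayOutputStream stands in for the response stream (assumption for the demo).
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        writeBody(body, "{\"t\":\"l\u00e4hedus\"}");
        System.out.println(CONTENT_TYPE + ", " + body.size() + " bytes");
    }
}
```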
