
Java Character Encoding on Google App Engine

I am running a GWT application on Google App Engine that passes text input from the GUI via GWT-RPC/Servlet to an API. But umlauts like ä, ö, ü are misinterpreted by the API, which shows only a ? instead of each umlaut.

I am pretty sure the problem is the default character encoding on Google App Engine, which is US-ASCII, and US-ASCII has no code for any umlaut.
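A quick way to see why the API prints ? is to encode the umlauts with US-ASCII yourself. The sketch below (CharsetLossDemo and roundTrip are made-up names for illustration) relies on the documented behaviour of String.getBytes(Charset), which substitutes the charset's replacement bytes for characters it cannot map; for US-ASCII that replacement is a single '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetLossDemo {

    /** Encodes s with the given charset and decodes the bytes again with the same charset. */
    public static String roundTrip(String s, Charset charset) {
        // getBytes(Charset) replaces every unmappable character with the charset's
        // replacement bytes; for US-ASCII that is a single '?'.
        return new String(s.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // a-, o- and u-umlaut as Unicode escapes, so this file compiles under any source encoding
        String umlauts = "\u00e4\u00f6\u00fc";

        System.out.println(roundTrip(umlauts, StandardCharsets.US_ASCII));              // prints ???
        System.out.println(roundTrip(umlauts, StandardCharsets.UTF_8).equals(umlauts)); // prints true
        System.out.println(umlauts.getBytes(StandardCharsets.UTF_8).length);            // prints 6
    }
}
```

Calls such as getBytes() or new String(bytes) without a charset argument use the platform default instead, which is how the same code can work locally (UTF-8) and lose the umlauts on App Engine (US-ASCII).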

Using umlauts with the API from JUnit tests on my local machine works; the default character encoding there is UTF-8.

The problem does not come from GWT or from the encoding of any HTML file: I passed a constant Java String containing some umlauts from within the application directly to the API, and the problem appears as soon as the application is deployed to Google App Engine.

Is there any way to change the character encoding on Google App Engine? Or does anyone know another solution to my problem?

Funnily enough, storing umlauts from the GUI in the GAE Datastore and bringing them back to the GUI works.

I was having the same problem: the default charset of a web application deployed to Google App Engine was US-ASCII, but I needed it to be UTF-8.

After a bit of head scratching, I found that adding:

<system-properties>
    <property name="appengine.file.encoding" value="UTF-8" />
</system-properties>

to appengine-web.xml correctly sets the charset to UTF-8. More details can be found on the Google Issue Tracker: Setting of default encoding.
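To confirm on a deployed instance that the setting actually took effect, you could log the JVM's default charset, for example from one of your servlets. This is a minimal sketch: CharsetCheck and describe are hypothetical names, and it only reads standard JDK values; it does not show how the App Engine runtime applies appengine.file.encoding internally.

```java
import java.nio.charset.Charset;

public class CharsetCheck {

    /** Reports the file.encoding property and the JVM's default charset. */
    public static String describe() {
        // Charset.defaultCharset() is determined once at JVM startup; before JDK 18 it is
        // derived from file.encoding, since JDK 18 (JEP 400) it is UTF-8 unless
        // file.encoding=COMPAT. Changing the property later at runtime does not affect it.
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // On App Engine you would write this to your log instead of stdout
        System.out.println(describe());
    }
}
```

If the output still reports US-ASCII after a redeploy, the property was not applied early enough to influence the default charset.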

Workaround (safe)

I wrote this class to encode UTF strings as ASCII strings: every character that is not in the ASCII table (and every occurrence of the mark character itself) is replaced by its numeric char code, preceded and followed by a mark. Encode with AsciiEncoder.encode(yourUtfString).

Wherever UTF is supported, the string can be decoded back with AsciiEncoder.decode(yourAsciiEncodedUtfString).

package <your_package>;

/**
 * Created by Micha F. aka Peracutor.
 * 04.06.2017
 */

public class AsciiEncoder {

    public static final char MARK = '%'; // use whatever ASCII char you like (it should rarely occur in regular text)

    /** Replaces every non-ASCII char (and MARK itself) with MARK, its numeric char code, MARK. */
    public static String encode(String s) {
        // Buffer for about 10 special characters (each replaced char adds roughly 4 chars)
        StringBuilder result = new StringBuilder(s.length() + 4 * 10);
        for (char c : s.toCharArray()) {
            if ((int) c > 127 || c == MARK) {
                result.append(MARK).append((int) c).append(MARK);
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    /**
     * Reverses encode() in a single left-to-right pass. Because encode() escapes every MARK,
     * each MARK in an encoded string starts a sequence of MARK, digits, MARK.
     */
    public static String decode(String s) {
        StringBuilder result = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int close = (c == MARK) ? s.indexOf(MARK, i + 1) : -1;
            if (close > i + 1) {
                try {
                    // Turn the digits between the two marks back into the original char
                    result.append((char) Integer.parseInt(s.substring(i + 1, close)));
                    i = close + 1;
                    continue;
                } catch (NumberFormatException ignored) {
                    // Not a valid escape: fall through and copy the mark literally
                }
            }
            result.append(c);
            i++;
        }
        return result.toString();
    }
}

Hope this helps someone.
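If both ends of the call can agree on a scheme, the JDK already ships a reversible, ASCII-only escaping that works much like the class above: form URL encoding of the UTF-8 bytes via java.net.URLEncoder and java.net.URLDecoder. This is a swapped-in alternative, not the code from the answer above; PercentEncoding is a hypothetical wrapper name. It follows application/x-www-form-urlencoded rules, so a space becomes + and a literal + becomes %2B.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class PercentEncoding {

    /** Encodes s as ASCII-only text: UTF-8 bytes outside the safe set become %XX escapes. */
    public static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should never happen
            throw new AssertionError(e);
        }
    }

    /** Reverses encode(); the receiving side must decode with UTF-8 as well. */
    public static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        // "Gruesse" spelled with u-umlaut and sharp s, written as Unicode escapes
        String original = "Gr\u00fc\u00dfe";
        String encoded = encode(original);
        System.out.println(encoded);                          // prints Gr%C3%BC%C3%9Fe
        System.out.println(decode(encoded).equals(original)); // prints true
    }
}
```

Unlike the char-code scheme above, this escapes UTF-8 bytes, so a character outside the Basic Multilingual Plane becomes four %XX escapes rather than two numeric codes.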

If you (like me) are using the Java flexible environment on Google App Engine, the default encoding can "simply" be fixed by setting the file.encoding system property in your app.yaml, via an environment variable that the runtime picks up automatically:

env_variables:
  JAVA_USER_OPTS: -Dfile.encoding=UTF-8
