
Using the Google Translate Java library, languages with special characters return question marks

I have set up a Java program for my apprenticeship project that takes a JSON file of English strings and outputs a JSON file in a different language, which is specified in the console. Some languages, like French and Italian, come out with the correct translations, whereas Russian or Japanese come out as question marks, as seen in the image below.

[Image: question marks]

I searched around and saw that I needed to get the bytes of my string and then encode them as UTF-8. I did this but was still getting question marks, so I started using the standard charsets built into Java and tried different ways of encoding and decoding the string. I tried this:

[Image: encoding/decoding]

and this gave me a different output: Ð?Ñ?ивеÑ?

package com.bis.propertyfiletranslator;

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.List;

import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.googleapis.json.GoogleJsonResponseException;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.translate.Translate;
import com.google.api.services.translate.model.TranslationsListResponse;
import com.google.api.services.translate.model.TranslationsResource;

public class Translator {

    public static Translate.Translations.List list;
    private static final Charset UTF_8 = Charset.forName("UTF-8");
    private static final Charset ISO = Charset.forName("ISO-8859-1");

    public static void translateJSONMapThroughGoogle(String input, String output, String API, String language,
            List<String> subLists) throws IOException, GeneralSecurityException {

        Translate t = new Translate.Builder(GoogleNetHttpTransport.newTrustedTransport(),
                JacksonFactory.getDefaultInstance(), null).setApplicationName("PhoenUX-Google-Translate").build();
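        // Declared outside the try block so the loop below can read it.
        TranslationsListResponse response;
        try {

            list = t.new Translations().list(subLists, language).setFormat("text");

            list.setKey(API);

            // Send the request; without this call, response is never assigned.
            response = list.execute();

        } catch (GoogleJsonResponseException e) {

            if (e.getDetails().getMessage().equals("Invalid Value")) {
                System.err.println(
                        "\n Language not currently supported, check the accepted language codes and try again.\n\n Language Requested: "
                                + language);
            } else {
                System.out.println(e.getDetails().getMessage());
            }
            // There are no translations to process if the request failed.
            return;
        }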

        for (TranslationsResource translationsResource : response.getTranslations()) {

            for (String key : JSONFunctions.jsonHashMap.keySet()) {

                JSONFunctions.jsonHashMap.remove(key);

                String value = translationsResource.getTranslatedText();
                String encoded = new String(value.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

                JSONFunctions.jsonHashMap.put(key, encoded);
                System.out.println(encoded);
                break;
            }
        }

        JSONFunctions.outputTranslationsBackToJson(output);
    }

}

This uses the Google Cloud library. I added a System.out.println call so I could see the results of what I had tried, so this code should be all you need to replicate it.

I expect the output for "Hello" to be "Привет" (Russian); the actual output is ???? or Ð?Ñ?ивеÑ?, depending on the encoding I use.

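String encoded = new String(...) is dead wrong. A Java String already holds Unicode text, so decoding its UTF-8 bytes as ISO-8859-1 only turns each multi-byte character into several wrong ones, which is where Ð?Ñ?ивеÑ? comes from. Just use

put(key, value);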
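To make the point concrete, here is a minimal sketch of what the re-encoding line does to the translated text. The class and method names (MojibakeDemo, misdecode, repair) are hypothetical and exist only for illustration; the string is written with Unicode escapes so that the source file's own encoding cannot interfere.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Same operation as the line in the question: encode as UTF-8,
    // then wrongly decode those bytes as ISO-8859-1.
    public static String misdecode(String value) {
        return new String(value.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses the damage. This works only because ISO-8859-1 maps
    // every byte value to exactly one char, so no bytes are lost.
    public static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String value = "\u041F\u0440\u0438\u0432\u0435\u0442"; // "Привет"
        String garbled = misdecode(value);

        System.out.println(value.length());                // 6
        System.out.println(garbled.length());              // 12: each Cyrillic letter became two chars
        System.out.println(repair(garbled).equals(value)); // true
    }
}
```

Each Cyrillic letter takes two bytes in UTF-8, and ISO-8859-1 turns every byte into its own character, so the six-letter word becomes twelve characters. Passing the value from getTranslatedText() straight to put(key, value) avoids the problem entirely.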

Note that System.out.println may still show question marks, because the console uses the OS default encoding, which on Windows is often an ANSI code page. Such an encoding cannot represent all of Unicode, whereas a String contains Unicode.
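As a hedged sketch of the output side: the question does not show JSONFunctions.outputTranslationsBackToJson, so the example assumes that method (or the console) falls back to a default charset that cannot represent Cyrillic. The class and method names (Utf8OutputDemo, lossyRoundTrip, writeAndReadUtf8) are hypothetical. It shows where the question marks come from, and how naming UTF-8 explicitly for both the console stream and the file avoids them.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8OutputDemo {

    // Encoding to a charset that lacks the characters replaces each one with '?'.
    public static String lossyRoundTrip(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Writes the text as UTF-8 regardless of the platform default, then reads it back.
    public static String writeAndReadUtf8(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String value = "\u041F\u0440\u0438\u0432\u0435\u0442"; // "Привет"

        System.out.println(lossyRoundTrip(value)); // prints ??????

        // A console stream that emits UTF-8 bytes; the terminal itself must
        // also be set to UTF-8 for the text to display correctly.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(value);

        Path file = Files.createTempFile("translated", ".json");
        System.out.println(writeAndReadUtf8(file, value).equals(value)); // true
        Files.delete(file);
    }
}
```

If the JSON file still contains question marks after the re-encoding line is removed, the writer used for the output file is the next place to check for an implicit default charset.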


 