
Downloading a UTF-16 JSON String with Java

I'm converting our iOS app to Android (my first time with Android, though I'm a long-time Java programmer). A web service provides two JSON feeds to the application. The web service is written in Python, and the first JSON string is output as 'ascii'. The Android app downloads and displays that one without trouble. The problem is with the second feed.

Since the JSON is likely to contain non-English characters (accents, punctuation, etc.), I've output it in Python as 'utf-16'. I'm downloading the content in the Android app as follows:

new DownloadTask(new Downloader.Callback() {
        @Override
        public void finishedDownloading(String content) {

            final City[] cities = new Gson().fromJson(content, City[].class);
            Downloader.cities = cities;
            System.out.println("Found " + cities.length + " cities");
            getActivity().runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    setListAdapter(new CityArrayAdapter(getActivity(),
                            R.layout.listview_item_row,
                            cities));
                    pb.dismiss();
                }
            });
        }
    }).execute(Constants.CITIES_URL);

Download Task:

protected String doInBackground(String... sUrl) {
    BufferedReader br = null;
    try {
        URL url = new URL(sUrl[0]);
        br = new BufferedReader(new InputStreamReader(url.openStream()));
        String line = br.readLine();
        String doc = "";
        while (line != null) {
            doc += line + "\r\n";
            line = br.readLine();
        }
        br.close();
        callback.finishedDownloading(doc);

        return doc;
    } catch (MalformedURLException e) {
        System.out.println("Exception: " + e.getMessage());
    } catch (IOException e) {
        System.out.println("Exception: " + e.getMessage());
    }
    return null;
}

I've been reading up on how Java handles Strings, and apparently a String is stored as UTF-16, so I'm not sure why this isn't working properly.

As for errors: Gson throws an error, but only because the String is decoded incorrectly. When I print the URL response to the console, every other character comes out as '?' (indicating an encoding error).

Your problem is the InputStreamReader. You should explicitly tell it which charset to use instead of relying on the platform default, which is not what you want here. Ideally, you should read the Content-Type header and use it to pick the charset instead of hardcoding utf-16 (LE or BE?).
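A minimal sketch of that idea follows, under a few assumptions: the class name CharsetAwareDownload and the helpers charsetFrom, readAll, and download are made up for illustration, the server is assumed to send a header like "Content-Type: application/json; charset=utf-16", and UTF-8 is used as the fallback when no charset is declared. It is not the original DownloadTask, only one way the reading step could be written.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetAwareDownload {

    // Hypothetical helper: pulls the charset parameter out of a Content-Type
    // value. Returns the fallback when the header is missing, carries no
    // charset parameter, or names a charset this JVM does not know.
    public static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }

    // Decodes the whole stream with an explicit charset instead of the
    // platform default.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Opens the URL, picks the charset from the response header, then decodes.
    public static String download(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        Charset charset = charsetFrom(conn.getContentType(), StandardCharsets.UTF_8);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, charset);
        }
    }

    public static void main(String[] args) throws IOException {
        // Offline demonstration: pretend these bytes came from the server.
        byte[] body = "[{\"name\":\"caf\u00e9\"}]".getBytes(StandardCharsets.UTF_16);
        Charset cs = charsetFrom("application/json; charset=utf-16", StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(body), cs));
    }
}
```

Java's "UTF-16" decoder honours a byte-order mark if one is present and otherwise assumes big-endian, so if the Python side emits little-endian data without a BOM, the header would need to say utf-16le for this to decode correctly.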

To clarify your point about Java using utf-16 internally: you are correct, but the issue is that you need to convert bytes to characters, and that has nothing to do with how Java handles String internally.
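The bytes-versus-characters distinction can be seen without any network access. The sketch below is illustrative only: the class DecodeDemo and its decode helper are hypothetical names, and ISO-8859-1 stands in for "some single-byte platform default" because it maps every byte to exactly one character, which makes the effect easy to inspect.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {

    // Hypothetical helper: turns raw bytes into a String using the given charset.
    public static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";

        // What travels over the wire: two bytes per character, low byte first.
        byte[] wire = original.getBytes(StandardCharsets.UTF_16LE);
        System.out.println("bytes on the wire: " + wire.length);

        // Decoding with a single-byte charset treats every byte as its own
        // character, so a NUL character appears after each letter.
        String wrong = decode(wire, StandardCharsets.ISO_8859_1);
        System.out.println("wrong length: " + wrong.length());
        System.out.println("second char is NUL: " + (wrong.charAt(1) == '\0'));

        // Decoding with the charset the bytes were written in restores the text.
        String right = decode(wire, StandardCharsets.UTF_16LE);
        System.out.println("round trip ok: " + right.equals(original));
    }
}
```

The String that results from the wrong decode is still stored as UTF-16 inside the JVM; it simply holds the wrong characters, which is consistent with the stray character after every letter described in the question.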

Also, you might want to consider using utf-8, as that tends to be the default Unicode encoding on the web.
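As a rough illustration of what switching would mean on the Java side, the sketch below compares encoded sizes for a short sample string and shows a UTF-8 round trip. The class name Utf8Demo and the helper encodedLength are hypothetical, and the sample text is made up; real feeds will differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf8Demo {

    // Hypothetical helper: number of bytes the text occupies in the given charset.
    public static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9";

        // UTF-8: one byte per ASCII letter, two for the accented one.
        System.out.println("UTF-8 bytes:  " + encodedLength(sample, StandardCharsets.UTF_8));

        // Java's UTF-16 encoder writes a 2-byte byte-order mark, then
        // two bytes per character.
        System.out.println("UTF-16 bytes: " + encodedLength(sample, StandardCharsets.UTF_16));

        // UTF-8 has no byte-order variants, so there is no LE/BE question.
        byte[] wire = sample.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(wire, StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + decoded.equals(sample));
    }
}
```

Accented characters survive the UTF-8 round trip just as they do with UTF-16, so the non-English content in the feed is not a reason to avoid it.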
