
URLConnection does not get the charset

I'm using URL.openConnection() to download something from a server. The server responds with this header:

Content-Type: text/plain; charset=utf-8

But connection.getContentEncoding() returns null. What's going on?

URLConnection.getContentEncoding() returns the value of the Content-Encoding header, not the charset parameter of Content-Type. Content-Encoding describes a transformation applied to the body (such as gzip); your server does not send that header, so the method returns null.

The source of URLConnection.getContentEncoding():

/**
 * Returns the value of the <code>content-encoding</code> header field.
 *
 * @return  the content encoding of the resource that the URL references,
 *          or <code>null</code> if not known.
 * @see     java.net.URLConnection#getHeaderField(java.lang.String)
 */
public String getContentEncoding() {
    return getHeaderField("content-encoding");
}
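To see the difference between the two headers without a live server, here is a minimal sketch. The class name HeaderDemo and its stub() helper are hypothetical and only for illustration: the stub is a URLConnection subclass that serves canned headers from a map, and the URL it is given is a placeholder that is never contacted. It relies on getContentEncoding() and getContentType() both delegating to getHeaderField(), as the source above shows for the former.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;
import java.util.Map;

final class HeaderDemo {

    /**
     * Hypothetical stub: a URLConnection that answers header lookups from a map
     * (keys in lower case) and never opens a network connection.
     */
    static URLConnection stub(Map<String, String> headers) {
        final URL url;
        try {
            // Placeholder address; constructing a URL does not contact the host.
            url = URI.create("http://example.invalid/").toURL();
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
        return new URLConnection(url) {
            @Override
            public void connect() {
                // Nothing to do: the stub has no real connection.
            }

            @Override
            public String getHeaderField(String name) {
                return headers.get(name.toLowerCase(Locale.ROOT));
            }
        };
    }

    public static void main(String[] args) {
        // Same response header as in the question; no Content-Encoding is sent.
        URLConnection c = stub(Map.of("content-type", "text/plain; charset=utf-8"));

        System.out.println(c.getContentEncoding()); // prints "null"
        System.out.println(c.getContentType());     // prints "text/plain; charset=utf-8"
    }
}
```

The charset is only visible through getContentType(); getContentEncoding() stays null until the server sends a Content-Encoding header.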

Instead, call connection.getContentType() to retrieve the Content-Type header and extract the charset from it. Here is sample code showing how:

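String contentType = connection.getContentType(); // null if the server sent no Content-Type
String charset = "";

if (contentType != null) {
    // Parameters follow the media type, separated by ';' (e.g. "text/plain; charset=utf-8")
    for (String value : contentType.split(";")) {
        value = value.trim();

        if (value.toLowerCase().startsWith("charset=")) {
            // Drop optional surrounding quotes, as in charset="utf-8"
            charset = value.substring("charset=".length()).replace("\"", "").trim();
        }
    }
}

if (charset.isEmpty()) {
    charset = "UTF-8"; // Assumption: fall back to UTF-8 when no charset is declared
}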
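The same parsing can be packaged as a reusable helper that returns a java.nio.charset.Charset rather than a String. This is a sketch under stated assumptions, not a complete header parser: the class name ContentTypeCharset and the method charsetOf() are hypothetical, the fallback charset is chosen by the caller, and splitting on ';' does not handle a semicolon inside a quoted parameter value.

```java
import java.nio.charset.Charset;
import java.util.Locale;

final class ContentTypeCharset {

    /**
     * Returns the charset named in a Content-Type header value, or the given
     * fallback when the header is missing, has no charset parameter, or names
     * a charset this JVM does not know.
     */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself (e.g. "text/plain"); parameters follow.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // The value may be quoted, as in charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

A typical use would be to wrap the response stream in a reader, for example new InputStreamReader(connection.getInputStream(), ContentTypeCharset.charsetOf(connection.getContentType(), StandardCharsets.UTF_8)), so the body is decoded with the charset the server declared.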

This is documented behaviour: getContentEncoding() is specified to return the contents of the Content-Encoding HTTP header, which is not set in your example. You could use the getContentType() method and parse the resulting String yourself, or possibly switch to a more advanced HTTP client library like the one from Apache.

As an addition to the answer from @Buhake Sindi: if you are using Guava, you can replace the manual parsing with:

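// MediaType is com.google.common.net.MediaType; Optional is Guava's com.google.common.base.Optional
MediaType mediaType = MediaType.parse(httpConnection.getContentType()); // throws if the header is null or malformed
Optional<Charset> typeCharset = mediaType.charset(); // absent when no charset parameter is present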
