
Encoding ignored while reading InputStream

I'm having some encoding problems in a Java application that makes HTTP requests to an IIS server.

Iterating over the headers of the URLConnection object, I can see the following (relevant) headers:

Transfer-Encoding: [chunked]
Content-Encoding: [utf-8]
Content-Type: [text/html; charset=utf-8]

The URLConnection.getContentEncoding() method returns utf-8 as the document encoding.

This is how my HTTP request and stream read are being made:

OutputStreamWriter sw = null;
BufferedReader br = null;
char[] buffer = null;
URL url;
url = new URL(this.URL);
URLConnection connection = url.openConnection();
connection.setDoOutput(true);
sw = new OutputStreamWriter(connection.getOutputStream());
sw.write(postData);
sw.flush();
br = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF8"));
StringBuilder totalResponse = new StringBuilder();
String line;

while((line = br.readLine()) != null) {
    totalResponse.append(line);
}
buffer = totalResponse.toString().toCharArray();
if (sw != null)
    sw.close();

if (br != null)
    br.close();

return buffer;

However, the string "ÃÃÃção" sent by the server is received by the client as " o".

What am I doing wrong?

Based on your comments, you are trying to receive a FIX message from an IIS server, and FIX uses ASCII. Only a small subset of tags supports other encodings, and those tags have to be treated in a special manner (the non-ASCII tags in the standard FIX spec are 349, 351, 353, 355, 357, 359, 361, 363, 365). If such tags are present, you will get a tag 347 with a value specifying the encoding (for example UTF-8), and each encoded tag will be preceded by a tag giving you the length of the coming encoded value (for tag 349, you will always get 348 first with an integer value).

In your case, it looks like the server is sending a custom tag 10411 (the 10xxx range) in some other encoding. By convention, the preceding tag 10410 should give you the length of the value in 10411, but it contains "0000" instead, which may have some other meaning.
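The length-prefix convention described above can be sketched as follows. This is only an illustration, not a complete FIX parser: the class name FixFieldSketch, the parse method, and the lengthTags map (length tag to data tag, for example 348 to 349) are names made up for this example. It assumes tag 347 arrives before any encoded value, and it falls back to a plain SOH scan when the announced length is 0, as with the "0000" seen in tag 10410.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class FixFieldSketch {
    private static final byte SOH = 0x01;

    /**
     * Splits raw FIX bytes into tag/value pairs.
     * lengthTags maps a length tag to the data tag it announces (hypothetical helper input).
     */
    public static Map<Integer, String> parse(byte[] msg, Map<Integer, Integer> lengthTags) {
        Map<Integer, String> fields = new LinkedHashMap<>();
        Charset encoded = StandardCharsets.US_ASCII; // replaced once tag 347 is seen
        int pendingTag = -1;                         // data tag announced by the last length tag
        int pendingLen = 0;
        int pos = 0;
        while (pos < msg.length) {
            int eq = indexOf(msg, (byte) '=', pos);
            if (eq < 0) {
                break; // trailing garbage without "tag=" prefix
            }
            int tag = Integer.parseInt(new String(msg, pos, eq - pos, StandardCharsets.US_ASCII));
            int start = eq + 1;
            int end;
            Charset cs = StandardCharsets.US_ASCII;
            if (tag == pendingTag && pendingLen > 0) {
                // Encoded value: trust the announced byte length, the value may contain 0x01.
                end = Math.min(start + pendingLen, msg.length);
                cs = encoded;
            } else {
                end = indexOf(msg, SOH, start);
                if (end < 0) {
                    end = msg.length;
                }
            }
            String value = new String(msg, start, end - start, cs);
            fields.put(tag, value);
            if (tag == 347) {
                encoded = Charset.forName(value); // MessageEncoding, e.g. UTF-8
            }
            Integer dataTag = lengthTags.get(tag);
            if (dataTag != null) {
                pendingTag = dataTag;
                pendingLen = Integer.parseInt(value);
            } else {
                pendingTag = -1;
            }
            pos = end + 1; // skip the SOH delimiter
        }
        return fields;
    }

    private static int indexOf(byte[] data, byte target, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Sample message built by hand for this sketch; 349 carries 5 UTF-8 bytes.
        byte[] sample = "347=UTF-8\u0001348=5\u0001349=\u00e7\u00e3o\u000158=hi\u0001"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(parse(sample, Collections.singletonMap(348, 349)));
    }
}
```

For the custom pair in the question, the map would presumably contain 10410 to 10411, but whether the server actually follows the convention for those tags is something only its documentation can confirm.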

Note that although FIX messages are very readable, they should still be treated as binary data. Tags and values are mostly ASCII characters, but the delimiter (SOH) is 0x01 and, as mentioned above, certain tags may carry values in another encoding. The IIS service should really return the data as application/octet-stream so it can be received properly. Attempting to return it as text/html is asking for trouble :).
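Treating the response as binary data means reading the InputStream into a byte array instead of wrapping it in a Reader. A minimal sketch, assuming the whole body fits in memory; the class RawBodySketch and its two methods are hypothetical names for this example. The second method renders the bytes for inspection, showing SOH as '|' and every non-printable or non-ASCII byte as a hex escape, so nothing is silently dropped by a charset decoder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class RawBodySketch {

    /** Reads the stream to the end without any character decoding. */
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /** Printable ASCII is kept, SOH becomes '|', everything else becomes \xNN. */
    public static String toDebugString(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            int v = b & 0xFF;
            if (v == 0x01) {
                sb.append('|');
            } else if (v >= 0x20 && v < 0x7F) {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for connection.getInputStream(); the bytes are invented for the demo.
        byte[] fake = {'5', '8', '=', (byte) 0xC3, (byte) 0xA7, 0x01};
        System.out.println(toDebugString(readAll(new ByteArrayInputStream(fake))));
    }
}
```

In the code from the question, readAll(connection.getInputStream()) would take the place of the BufferedReader loop, and the dump makes it visible which bytes the server really sent for the problematic tag.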

If the server really sends a Content-Encoding of "UTF-8", then it is very confused: Content-Encoding names content codings such as gzip or deflate, whereas the character set belongs in the charset parameter of Content-Type. See http://svn.tools.ietf.org/svn/wg/httpbis/specs/rfc7231.html#header.content-encoding
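Since the character set is carried by Content-Type rather than Content-Encoding, a client that wants to honour it has to read the charset parameter itself. A rough sketch follows; CharsetSketch and charsetOf are made-up names, and the parsing is deliberately simple (it does not handle every quoting rule for header parameters).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSketch {

    /** Returns the charset named in a Content-Type value, or the fallback if none is usable. */
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Parameter names are case-insensitive, hence regionMatches with ignoreCase.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=utf-8", StandardCharsets.ISO_8859_1));
    }
}
```

With a URLConnection, the input would be connection.getContentType(), and the result can be passed to the InputStreamReader constructor in place of a hard-coded name.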

For good order, a couple of corrections.

    URLConnection connection = url.openConnection();
    connection.setDoOutput(true);
    connection.connect();
    try (Writer sw = new OutputStreamWriter(connection.getOutputStream(),
                StandardCharsets.UTF_8)) {
        sw.write(postData);
        sw.flush();

        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(connection.getInputStream(),
                StandardCharsets.UTF_8))) {
            StringBuilder totalResponse = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                totalResponse.append(line).append("\r\n");
            }
            return totalResponse.toString().toCharArray();
        } // Close br.
    } // Close sw.

Maybe:

postData =  ... + "Accept-Charset: utf-8\r\n" + ...;

Once you have totalResponse.toString(), everything should have been read correctly.

But when the text is displayed, the String/char data is converted to bytes again, and that is where the encoding can fail. For instance, System.out.println will not do, as it probably uses the Windows encoding.
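One way to take the platform default out of the picture when printing is to wrap the target stream in a PrintStream with an explicit charset. This is a sketch under the assumption that the console or log viewer itself interprets bytes as UTF-8; ConsoleSketch and utf8 are illustrative names only.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ConsoleSketch {

    /** Wraps a stream so that text is always encoded as UTF-8, regardless of the platform default. */
    public static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory for every Java platform implementation.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        out.println("\u00e7\u00e3o"); // only readable if the terminal decodes UTF-8
    }
}
```

If the terminal uses a different code page, the output will still look wrong even though the String is intact, which is why dumping the bytes is the more reliable check.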

You can test the String by dumping its bytes:

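// Needs java.util.Arrays, java.util.logging.Level, java.util.logging.Logger
// and java.nio.charset.StandardCharsets.
String s = totalResponse.toString();
Logger.getLogger(getClass().getName()).log(Level.INFO, "{0}",
    Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));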

In some rare cases the font will not contain the special characters.

Can you try putting the stream into a request attribute and then printing it out on the client side? A request attribute will be received as is, without any encoding issues.

