
Wrong encoding with Java HttpURLConnection

I am trying to read generated XML from a Microsoft web service:

URL page = new URL(address);
StringBuffer text = new StringBuffer();
HttpURLConnection conn = (HttpURLConnection) page.openConnection();
conn.connect();
InputStreamReader in = new InputStreamReader((InputStream) conn.getContent());
BufferedReader buff = new BufferedReader(in);
box.setText("Getting data ...");
String line;
// Read until readLine() returns null at end of stream
while ((line = buff.readLine()) != null) {
  text.append(line).append("\n");
}
box.setText(text.toString());

or

URL u = new URL(address);
URLConnection uc = u.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(uc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
  inputLine = java.net.URLDecoder.decode(inputLine, "UTF-8");
  System.out.println(inputLine);
}
in.close();

Any page reads fine, except that in the web service output the greater-than and less-than signs come through strangely.

It reads < as "&lt;" and > as "&gt;".

Please help, thanks.

First, there seems to be some confusion on this line:

inputLine = java.net.URLDecoder.decode(inputLine, "UTF-8");

This effectively says that you expect every line in the document that your server is providing to be URL encoded. URL encoding is not the same as document encoding.

http://en.wikipedia.org/wiki/Percent-encoding

http://en.wikipedia.org/wiki/Character_encoding

Looking at your code snippet, I think URL encoding (percent-encoding) is not what you're after.
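To illustrate why the URLDecoder call does not help here, the following minimal sketch runs a few sample lines through it. The class name UrlDecodeDemo, the helper urlDecode, and the sample strings are made up for illustration; they are not from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UrlDecodeDemo {
    // Applies percent-decoding to one line, as the loop in the question does.
    static String urlDecode(String line) {
        try {
            return URLDecoder.decode(line, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // XML entities are not percent-escapes, so they pass through untouched.
        System.out.println(urlDecode("&lt;item&gt;"));
        // A literal '+' in the document is silently turned into a space.
        System.out.println(urlDecode("<sum>1+1</sum>"));
        // Only genuine percent-escapes are decoded.
        System.out.println(urlDecode("a%3Cb"));
    }
}
```

So the call leaves the entities as they are, and it can also alter ordinary document text that happens to contain '+' or '%'.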

In terms of document character encoding, you are making a conversion on this line:

InputStreamReader in = new InputStreamReader((InputStream) conn.getContent());

conn.getContent() returns an InputStream that operates on bytes, whilst the reader operates on chars; the character encoding conversion is done here. Check out the other constructors of InputStreamReader, which take the encoding as a second argument. Without the second argument you fall back on whatever your platform default is in Java.

InputStreamReader(InputStream in, String charsetName)

for instance, lets you change your code to:

InputStreamReader in = new InputStreamReader((InputStream) conn.getContent(), "utf-8");
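As a rough illustration of what the second argument changes, the sketch below decodes the same bytes twice. It uses a ByteArrayInputStream as a stand-in for the connection's stream so that it runs without a network; the class name CharsetReadDemo, the helper firstLine, and the sample text are assumptions for the example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetReadDemo {
    // Reads the first line of a stream, decoding bytes with the given charset.
    static String firstLine(InputStream in, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the bytes a server would send: "café" encoded as UTF-8.
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // Decoding with the charset the bytes were written in gives the original text.
        System.out.println(firstLine(new ByteArrayInputStream(body), StandardCharsets.UTF_8));
        // Decoding the same bytes with a different charset garbles the accented letter.
        System.out.println(firstLine(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1));
    }
}
```

Passing a Charset constant rather than a charset name string also avoids the checked UnsupportedEncodingException.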

But the real question is: what encoding is your server providing the content in? If you own the server code too, you may just hard-code it to something reasonable such as utf-8. But if it can vary, you need to look at the HTTP header Content-Type to figure it out.

String contentType = conn.getHeaderField("Content-Type");

The contents of contentType will look like

text/plain; charset=utf-8

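A shorthand way of getting this field is:

String contentType = conn.getContentType();

Note that conn.getContentEncoding() is not a substitute: it returns the Content-Encoding header (for example gzip), not the charset.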

Notice that it's entirely possible that no charset is provided, or that there is no Content-Type header at all, in which case you must fall back on a reasonable default.
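One way to combine the header lookup with a fallback is sketched below. The class ContentTypeCharset and the method charsetOf are hypothetical helpers written for this example, not part of the JDK, and the parsing is deliberately simple rather than a full implementation of the header grammar.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharset {
    // Extracts the charset parameter from a Content-Type value, or returns
    // the fallback when the header or the parameter is absent or unusable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Parameter names are case-insensitive, so compare ignoring case.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // unsupported or malformed charset name
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/plain; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/xml", StandardCharsets.UTF_8));
        System.out.println(charsetOf(null, StandardCharsets.UTF_8));
    }
}
```

The result could then be passed to the InputStreamReader constructor, for example charsetOf(conn.getContentType(), StandardCharsets.UTF_8), where UTF-8 as the fallback is an assumption rather than something the server guarantees.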

Mark Rotteveel is correct: the web service is the culprit here. For some reason it is sending the greater-than and less-than signs in the &lt; and &gt; format.
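One plausible explanation, which this thread does not confirm, is that the service returns its XML as a string value inside a wrapper element, so the inner markup has to be entity-escaped. Under that assumption, parsing the outer document with an XML parser and reading the element text gives back the unescaped inner XML. The wrapper element name and sample payload below are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EscapedXmlDemo {
    // Parses the outer document and returns the root element's text; the XML
    // parser turns &lt; and &gt; back into < and > while reading it.
    static String innerText(String outerXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(outerXml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical response: inner XML carried as an escaped string.
        String response = "<string>&lt;data&gt;&lt;id&gt;1&lt;/id&gt;&lt;/data&gt;</string>";
        System.out.println(innerText(response));
    }
}
```

The returned string can then be parsed a second time as XML in its own right.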

Thanks, Martin Algesten, but I have already stated that I worked around it; I was just looking for why it was this way.
