
Reading InputStream as UTF-8

I'm trying to read a text/plain file over the internet, line by line. The code I have right now is:

URL url = new URL("http://kuehldesign.net/test.txt");
// No charset argument: InputStreamReader decodes with the platform default charset
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
LinkedList<String> lines = new LinkedList<>();
String readLine;

while ((readLine = in.readLine()) != null) {
    lines.add(readLine);
}

for (String line : lines) {
    out.println("> " + line);
}

The file, test.txt, contains ¡Hélló!, which I am using to test the encoding.

When I look at the OutputStream (out), I see > ¬°H√©ll√≥!. I don't believe the problem is with the OutputStream, since out.println("é"); works without problems.

Any ideas for reading from the InputStream as UTF-8? Thanks!
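The garbled output above is the classic symptom of UTF-8 bytes being decoded with a different charset: ¡, é and ó are each two bytes in UTF-8, and a single-byte charset turns each byte into its own character (the exact symbols shown, such as ¬° for ¡, match Mac OS Roman). The sketch below reproduces the effect with ISO-8859-1 instead, because that charset is guaranteed to exist on every Java platform. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode text as UTF-8 (as the file on the server is), then decode the
    // bytes with the given charset, like a reader created without a charset would
    static String readBack(String text, Charset decodeWith) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        String text = "\u00A1H\u00E9ll\u00F3!"; // "¡Hélló!" as escapes, independent of source encoding
        // Wrong charset: each UTF-8 byte of ¡, é and ó becomes a separate character
        System.out.println(readBack(text, StandardCharsets.ISO_8859_1));
        // Right charset: the text round-trips unchanged
        System.out.println(readBack(text, StandardCharsets.UTF_8));
    }
}
```

Decoding as ISO-8859-1 yields the ten characters Â¡HÃ©llÃ³!, while decoding as UTF-8 gives back the original seven.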

Solved my own problem. This line:

BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));

needs to be:

BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));

or, since Java 7:

BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
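Putting the Java 7 fix into a self-contained helper: a minimal sketch that takes any InputStream (in the question, url.openStream()), decodes it explicitly as UTF-8 and returns the lines. The class and method names are hypothetical, and a ByteArrayInputStream stands in for the network stream so the example runs offline.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8Lines {
    // Reads all lines from the stream, decoding the bytes as UTF-8
    // instead of relying on the platform default charset
    static List<String> readLines(InputStream stream) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader and the wrapped stream
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for url.openStream(): "¡Hélló!" encoded as UTF-8, plus a second line
        byte[] body = "\u00A1H\u00E9ll\u00F3!\nsecond line".getBytes(StandardCharsets.UTF_8);
        for (String line : readLines(new ByteArrayInputStream(body))) {
            System.out.println("> " + line);
        }
    }
}
```

Whether the printed characters then look right still depends on the console's encoding, which is a separate question from decoding the input.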
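
String file = "";
int BUFFER_SIZE = 8192;

// try-with-resources closes the reader (and the FileInputStream) even if reading fails
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(filename), StandardCharsets.UTF_8),
        BUFFER_SIZE)) {
    StringBuilder sb = new StringBuilder();
    String str;
    while ((str = br.readLine()) != null) {
        // readLine() strips the line terminator, so add one back
        sb.append(str).append('\n');
    }
    file = sb.toString();
} catch (IOException e) {
    // Report the failure instead of silently ignoring it
    e.printStackTrace();
}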

Try this. :-)
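For a local file, as in the answer above, java.nio.file.Files can do the reading and closing in a single call. A minimal sketch, assuming Java 11 or later (for Files.readString and Files.writeString); the class name is hypothetical and the temporary file exists only to make the example runnable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReadUtf8File {
    // Whole file as one String, decoded as UTF-8 (Java 11+)
    static String readAll(Path path) throws IOException {
        return Files.readString(path, StandardCharsets.UTF_8);
    }

    // File as a list of lines, decoded as UTF-8 (line terminators removed)
    static List<String> readLines(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for the answer's `filename`
        Path tmp = Files.createTempFile("utf8-demo", ".txt");
        try {
            Files.writeString(tmp, "\u00A1H\u00E9ll\u00F3!\nsecond line", StandardCharsets.UTF_8);
            System.out.println(readLines(tmp).size() + " lines read");
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Unlike InputStreamReader, both Files methods throw an IOException (a MalformedInputException) when the file contains bytes that are not valid UTF-8, rather than silently replacing them.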

I ran into the same problem: every time it found a special character, it marked it as an invalid character. To solve this, I tried using the ISO-8859-1 encoding:

try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream("txtPath"), StandardCharsets.ISO_8859_1))) {
    String line;
    while ((line = br.readLine()) != null) {
        // process each line here
    }
}

I hope this can help anyone who sees this post.
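ISO-8859-1 maps every possible byte to a character, so decoding with it never fails; the result is only correct if the file really was saved as ISO-8859-1 (Latin-1). A minimal sketch comparing both decodings of a Latin-1 encoded file; the class and helper names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1Demo {
    // Simulates a file saved as ISO-8859-1 and read back with the given charset
    static String decodeLatin1File(String text, Charset readAs) {
        // In ISO-8859-1, é is the single byte 0xE9 (two bytes in UTF-8)
        byte[] fileBytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(fileBytes, readAs);
    }

    public static void main(String[] args) {
        String text = "\u00A1H\u00E9ll\u00F3!"; // "¡Hélló!"
        // Read as UTF-8: the lone bytes 0xA1, 0xE9, 0xF3 are malformed and become U+FFFD
        String asUtf8 = decodeLatin1File(text, StandardCharsets.UTF_8);
        // Read as ISO-8859-1: matches the encoding the file was written in
        String asLatin1 = decodeLatin1File(text, StandardCharsets.ISO_8859_1);
        System.out.println(asUtf8.indexOf('\uFFFD') >= 0); // true
        System.out.println(asLatin1.equals(text));          // true
    }
}
```

The converse is the question's situation: a UTF-8 file read as ISO-8859-1 produces no error but garbled text, so the charset passed to the reader has to match how the file was actually written.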

If you use the constructor InputStreamReader(InputStream in, Charset cs), malformed input is silently replaced (with U+FFFD for UTF-8). To change this behaviour, use a CharsetDecoder:

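// Needs java.io.InputStream, java.io.InputStreamReader, java.io.Reader,
// java.nio.charset.CodingErrorAction and java.nio.charset.StandardCharsets
public static Reader newReader(InputStream is) {
  // A decoder set to REPORT throws on bad input instead of replacing it
  return new InputStreamReader(is,
      StandardCharsets.UTF_8.newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)
  );
}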

Then catch java.nio.charset.CharacterCodingException, which the reader's read methods throw when they hit invalid input.
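A runnable sketch of the strict reader above: with CodingErrorAction.REPORT, the first invalid byte makes the read throw MalformedInputException, a subclass of CharacterCodingException (which is itself an IOException). The class name and the isValidUtf8 helper are hypothetical, and the input bytes are made up for the demonstration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {
    // UTF-8 reader that fails on bad input instead of substituting U+FFFD
    static Reader newReader(InputStream is) {
        return new InputStreamReader(is,
                StandardCharsets.UTF_8.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT));
    }

    // Returns true if the whole byte array decodes as valid UTF-8
    static boolean isValidUtf8(byte[] bytes) throws IOException {
        try (BufferedReader br = new BufferedReader(newReader(new ByteArrayInputStream(bytes)))) {
            while (br.readLine() != null) {
                // Just consume the input; decoding happens as lines are read
            }
            return true;
        } catch (CharacterCodingException e) {
            // Thrown by readLine() on the first malformed or unmappable byte
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "\u00A1H\u00E9ll\u00F3!".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "\u00A1H\u00E9ll\u00F3!".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false
    }
}
```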

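Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.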

 
© 2020-2024 STACKOOM.COM