Java: Greek characters in a downloaded HTML file are not displayed correctly. How can I fix this?

Question

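I am downloading an HTML file and need to display it with System.out.println().

The problem is that I get garbage instead of Greek letters.

I am using the code below to download the HTML file: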

 URL url = new URL("here goes the link to the html file");
 BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()));
 String htmlfile = "";
 String temp;
 while ((temp = br.readLine()) != null) {
       htmlfile+= temp;
 }
 System.out.println(htmlfile);

Is there a way to fix this? Here is a sample of the output I get:

    <title>Ξ ΟΞ»Ξ·  ΞΞ»Ξ΅ΞΊΟΟΏΟ ΟΏ Ξ΄ΞΉΞΊΟΟΞ±ΞΊΟ ΟΟΟΞΏ</title>

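All the locale settings on my computer are fine, and I can print Greek words directly with System.out.println. I have a feeling that I need to change some locale setting on the BufferedReader, but I am not sure how to do that, or whether that is even the right way to solve the problem.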

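Slightly off topic: I suspect that the way I download the HTML file above is really inefficient. For example, when I use htmlfile += temp, am I not essentially creating a new String instance every time I read a line from the HTML file? That sounds expensive, so please suggest a more efficient approach if there is one.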

Answer 1

String encoding = "UTF-8"; // Or "ISO-8859-7"
BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream(), encoding));

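ISO-8859-7 is an 8-bit encoding used for Greek (ISO-8859-1 is Latin-1 and has no Greek letters); UTF-8 is a multi-byte Unicode encoding. Without an explicit encoding, InputStreamReader falls back to the platform default charset, which may not match the page.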
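To illustrate why the charset passed to the reader matters, here is a minimal sketch that decodes the same bytes with two different charsets. The class name GreekDecodingDemo and the helper decode are hypothetical, the sample word is written with Unicode escapes so the source file encoding does not matter, and the sketch assumes the runtime supports ISO-8859-7. The garbled sample in the question looks consistent with UTF-8 bytes decoded by a Greek single-byte charset, but that is a guess from the output alone.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GreekDecodingDemo {
    // Hypothetical helper: decodes raw bytes using the named charset.
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // A Greek sample word, written as Unicode escapes.
        String original = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset that produced the bytes restores the text.
        System.out.println(decode(utf8, "UTF-8"));

        // Decoding UTF-8 bytes as ISO-8859-7 turns every byte into one
        // character, so each Greek letter becomes two unrelated characters.
        System.out.println(decode(utf8, "ISO-8859-7"));
    }
}
```

The fix in the answer above amounts to making the second argument of InputStreamReader agree with the encoding the server actually used.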

StringBuilder sb = new StringBuilder();
String temp;
while ((temp = br.readLine()) != null) {
    sb.append(temp).append("\n");
    System.out.println(temp);
}
String html = sb.toString();

readLine strips the line terminator (\r on old Mac OS, \n on Unix, or \r\n on Windows), which is why the loop above appends "\n" after each line.
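A small sketch of that behaviour, using an in-memory StringReader instead of a network stream. The class name ReadLineDemo and the helper readLines are hypothetical and exist only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {
    // Hypothetical helper: splits text into lines with BufferedReader.readLine.
    public static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line); // the terminator is not part of the returned line
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Mixes all three terminators: \r, \n and \r\n.
        System.out.println(readLines("a\rb\nc\r\nd")); // prints [a, b, c, d]
    }
}
```

If the original line endings have to be preserved exactly, reading into a char buffer instead of calling readLine avoids the problem.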

Answer 2

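You need to use the charset specified in the Content-Type response header.

The following applies when you use java.net.URLConnection to fire and handle the HTTP request, as in your question.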

URL url = new URL("here goes the link to the html file");
URLConnection conn = url.openConnection();
// URLConnection has no close() method, so the stream is closed instead.
InputStream in = conn.getInputStream();
try {
  // Look at the input connection headers to figure out the character encoding.
  // The contentType is null or a String like "text/html; charset=UTF-8"
  String contentType = conn.getContentType();
  // Get the charset from the content type.
  String charset = null;
  if (contentType != null) {
    for (String param : contentType.replace(" ", "").split(";")) {
      if (param.startsWith("charset=")) {
        charset = param.split("=", 2)[1];
        break;
      }
    }
  }
  // Choose a default that does not depend on the default encoding.
  // It might be best to use the default encoding if the URL is a
  // file: URL.
  if (charset == null) { charset = "UTF-8"; }
  Reader r = new InputStreamReader(in, charset);
  BufferedReader br = new BufferedReader(r);
  // Read the content from the buffered reader as above.
  // See below.
} finally {
  in.close();
}
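The header parsing above can also be pulled into a small helper. The following is only a sketch: the class ContentTypeCharset and the method charsetOf are hypothetical names, and the handling of upper-case parameter names, quoted values, and unknown charset names is an addition that is not part of the answer's code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {
    // Hypothetical helper: extracts the charset from a Content-Type value such as
    // "text/html; charset=UTF-8", returning the fallback when none is usable.
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Parameter names are matched case-insensitively.
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetOf("text/html", StandardCharsets.UTF_8));
    }
}
```

A possible use would be new InputStreamReader(in, ContentTypeCharset.charsetOf(conn.getContentType(), StandardCharsets.UTF_8)). Note that some pages declare their encoding only in an HTML meta tag, which this header-based approach does not see.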

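Slightly off topic: I suspect that the way I download the HTML file above is really inefficient. For example, when I use htmlfile += temp, am I not essentially creating a new String instance every time I read a line from the HTML file?

Yes. Below is a more efficient way to read the characters.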

StringBuilder sb = new StringBuilder();
char[] buf = new char[4096];
for (int nRead; (nRead = br.read(buf)) > 0;) {
  sb.append(buf, 0, nRead);
}
String html = sb.toString();

You can read the Content-Length header with conn.getHeaderField("Content-length") to get a hint about the size of the content and presize the StringBuilder accordingly.
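A minimal sketch of that presizing idea follows. The class PresizedRead, the method readAll, and the 16 MB cap are assumptions made for illustration. Content-Length counts bytes rather than characters and may be missing, so it is treated purely as a capacity hint.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class PresizedRead {
    private static final int DEFAULT_CAPACITY = 8192;
    // Assumed upper bound so an untrusted header cannot force a huge allocation.
    private static final long MAX_PRESIZE = 16L * 1024 * 1024;

    // Hypothetical helper: reads everything from the reader, using lengthHint
    // (for example the Content-Length value, or -1 if unknown) as initial capacity.
    public static String readAll(Reader reader, long lengthHint) {
        int capacity = (lengthHint > 0 && lengthHint <= MAX_PRESIZE)
                ? (int) lengthHint
                : DEFAULT_CAPACITY;
        StringBuilder sb = new StringBuilder(capacity);
        char[] buf = new char[4096];
        try {
            int nRead;
            while ((nRead = reader.read(buf)) != -1) {
                sb.append(buf, 0, nRead);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In-memory stand-in for the BufferedReader built from the connection.
        System.out.println(readAll(new StringReader("<html></html>"), 13));
    }
}
```

With a real connection, the hint could come from conn.getContentLengthLong(), which returns -1 when the length is not known. The StringBuilder still grows as needed if the hint turns out to be too small.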

2 answers

Answer 1
2 | accepted | 2012-02-26 14:27:13

Answer 2
1 | 2012-02-26 14:45:21
