How to download a complete web page with Java without parts of the HTML code being replaced with "&nbsp;"?

Question

I've been writing some code that goes to a website and copies the HTML code to a text file. The problem is that some of the code gets replaced with "&nbsp;". This is the code I'm using:

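import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

// z is a java.util.Formatter writing to the output text file, created elsewhere.
public void addRecords() throws IOException {
    URL google = new URL("Insert Website Here");
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(google.openStream()))) {
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            System.out.println(inputLine);
            z.format("%s%n", inputLine); // one line of HTML per output line
        }
    }
}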

Answer 1

Option 1

1. Read the web page into a contiguous buffer.
2. Replace "&nbsp;" with " ".
3. Write the result to the text file.
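Option 1 could look roughly like the sketch below. The class `WholePageCleaner` and its method names are hypothetical, and the code assumes the page is UTF-8 encoded; a real page may declare a different charset. Note that `&nbsp;` really stands for the non-breaking space U+00A0, so replacing it with an ordinary space, as this answer suggests, is a deliberate simplification.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WholePageCleaner {

    // Step 1: read everything from the reader into one contiguous buffer.
    static String readAll(Reader reader) throws IOException {
        StringBuilder buffer = new StringBuilder();
        char[] chunk = new char[8192];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            buffer.append(chunk, 0, n);
        }
        return buffer.toString();
    }

    // Step 2: replace every literal "&nbsp;" entity with a plain space.
    static String replaceNbsp(String html) {
        return html.replace("&nbsp;", " ");
    }

    // Steps 1 to 3 together: download, clean, and write to a text file.
    static void saveCleaned(URL page, Path target) throws IOException {
        // Assumption: the page is UTF-8; a real page may use another charset.
        try (Reader in = new InputStreamReader(page.openStream(), StandardCharsets.UTF_8)) {
            String cleaned = replaceNbsp(readAll(in));
            Files.write(target, cleaned.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Keeping the replacement in a separate method means it can be checked on a plain string without any network access.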

Option 2

1. Read the web page (as you are doing now).
2. Get one line of the web page.
3. Replace "&nbsp;" with " ".
4. Write that line to the text file.
5. If there are more lines, go to step 2.
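Option 2 streams the page instead of buffering it. A minimal sketch, where the class `LineByLineCleaner` and method `copyCleaned` are hypothetical names; the caller opens and closes both the source and the target:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

class LineByLineCleaner {

    // Copies text line by line, replacing "&nbsp;" with a space in each line.
    // The caller owns (and must close) both source and target.
    static void copyCleaned(Reader source, Writer target) throws IOException {
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {           // step 2: get one line
            String cleaned = line.replace("&nbsp;", " ");  // step 3: replace
            target.write(cleaned);                         // step 4: write it
            target.write(System.lineSeparator());
        }                                                  // step 5: loop until no lines remain
        target.flush();
    }
}
```

For the question's case, the source would be `new InputStreamReader(url.openStream())` and the target a `FileWriter` (or `Files.newBufferedWriter`) on the output file. Working line by line is safe here because an entity such as `&nbsp;` never spans a line break.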

Answer 2

There are many HTML entities of the form &...; that a browser displays as special characters. There are also numeric character references for arbitrary code points, such as &#8233;.

The Apache Commons Lang library has an unescape function that handles all of these:

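// commons-lang3; since 3.6 this class is deprecated in favour of
// org.apache.commons.text.StringEscapeUtils, which has the same method.
import org.apache.commons.lang3.StringEscapeUtils;

html = StringEscapeUtils.unescapeHtml4(html);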
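If adding Commons Lang is not an option, the idea can be approximated with the standard library alone. The sketch below is a hand-rolled substitute, not the Commons Lang implementation: the class `MiniUnescape` is hypothetical, it decodes decimal and hexadecimal numeric references but only six named entities, and it maps `&nbsp;` to the non-breaking space U+00A0, which is what the entity actually stands for.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiniUnescape {
    // Deliberately tiny table; HTML defines over 2,000 named entities.
    private static final Map<String, String> NAMED = Map.of(
            "nbsp", "\u00A0", "amp", "&", "lt", "<",
            "gt", ">", "quot", "\"", "apos", "'");

    // Matches decimal (&#65;), hexadecimal (&#x41;) and named (&amp;) references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z]+);");

    static String unescape(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String original = m.group();
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = decodeNumber(body.substring(2), 16, original);
            } else if (body.startsWith("#")) {
                replacement = decodeNumber(body.substring(1), 10, original);
            } else {
                // Unknown names are left exactly as they were.
                replacement = NAMED.getOrDefault(body, original);
            }
            // quoteReplacement stops dollar signs and backslashes being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Converts a code point written in the given radix to a String, or keeps the
    // original reference if the number is too large or not a valid code point.
    private static String decodeNumber(String digits, int radix, String original) {
        try {
            return new String(Character.toChars(Integer.parseInt(digits, radix)));
        } catch (IllegalArgumentException e) { // also covers NumberFormatException
            return original;
        }
    }
}
```

For anything beyond a quick fix, the library call is the safer choice, since it covers the full entity table.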

Answer 3

You can try something like this:

System.out.println(inputLine.replaceAll("&nbsp;"," "));

Note: your HTML page may contain other character escapes as well, so this solution does not generalize well.

For a more general solution, see the Apache Commons Lang approach described in this post: Replace HTML codes with equivalent characters in Java


3 answers

Answer 1: 1 vote, 2016-03-08 17:52:33

Answer 2: 0 votes, 2016-03-08 17:59:44

Answer 3: 0 votes, 2016-03-08 18:06:17
