How to download a complete webpage (with all its contents) in Java?

Question

Using Java, I need to save a complete webpage (with all its contents, such as images, CSS, and JavaScript), the way a browser's "Save as --> Complete webpage" option does, using the HttpClient library. How can I do this?

Answer 1

You can try the libcurl Java binding: http://curl.haxx.se/libcurl/java/

You can also refer to this discussion: curl-equivalent-in-java

Answer 2

You have to write an application that fetches the HTML file, parses it to extract all references, and then fetches every file found by the parsing.
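The parsing step could be sketched roughly as follows. This is only an illustration: the class name `ReferenceExtractor` and the method `extract` are made up for this example, and the regular expression is a deliberate simplification that only sees literal `src`/`href` attributes on `img`, `script` and `link` tags. URLs produced at run time by scripts are missed, and a real HTML parser (for example jsoup) would likely be more robust.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class ReferenceExtractor {
    // Literal src/href attributes on <img>, <script> and <link> tags,
    // quoted with either single or double quotes.
    private static final Pattern REF = Pattern.compile(
            "<(?:img|script|link)\\b[^>]*?\\b(?:src|href)\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    /** Returns the absolute URIs of the resources referenced in the HTML. */
    static Set<URI> extract(String html, URI base) {
        Set<URI> found = new LinkedHashSet<>(); // keeps order, drops duplicates
        Matcher m = REF.matcher(html);
        while (m.find()) {
            String raw = m.group(2).trim();
            if (raw.isEmpty() || raw.startsWith("data:")) {
                continue; // nothing to download
            }
            try {
                // Resolve relative references against the page URL.
                found.add(base.resolve(raw));
            } catch (IllegalArgumentException e) {
                // Malformed reference: skip it in this sketch.
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<html><head><link rel='stylesheet' href='css/site.css'>"
                + "<script src=\"/js/app.js\"></script></head>"
                + "<body><img src='logo.png'></body></html>";
        URI page = URI.create("http://example.com/docs/index.html");
        for (URI u : extract(html, page)) {
            System.out.println(u);
        }
    }
}
```

Resolving each reference against the page URL matters because most pages mix relative paths (`css/site.css`), root-relative paths (`/js/app.js`) and absolute URLs.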

Answer 3

It's not that easy, because the paths of some CSS/JS/image files might be "hidden". Consider the following example:

<script type="...">
   document.write("&bla;script" + " type='...' src='" + blahBlah() + "'&bla;" + "&bla;/script&bla;");
</script>

However, fetching the page source, parsing it in search of URLs, and downloading the URLs found is pretty much everything you'll probably need.
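The download step might look something like the sketch below. It assumes Java 11+ and uses the JDK's built-in `java.net.http.HttpClient`, which is not necessarily the HttpClient library the question refers to. The class `ResourceDownloader` and its methods are hypothetical names chosen for this example. Query strings are ignored when choosing a local file name, which is a simplification.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of any library.
final class ResourceDownloader {
    /** Maps a URL to a file under outDir/<host>/<path>. */
    static Path localPathFor(URI uri, Path outDir) {
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("URI has no host: " + uri);
        }
        String path = uri.getPath() == null ? "" : uri.getPath();
        if (path.isEmpty() || path.endsWith("/")) {
            path = path + "index.html"; // directory-style URL
        }
        String relative = path.startsWith("/") ? path.substring(1) : path;
        Path root = outDir.resolve(host).normalize();
        Path target = root.resolve(relative).normalize();
        // Refuse paths such as /../../x that would leave the output directory.
        if (!target.startsWith(root)) {
            throw new IllegalArgumentException("Path escapes output directory: " + uri);
        }
        return target;
    }

    /** Fetches one URL and writes the response body to its local path. */
    static void download(HttpClient client, URI uri, Path outDir)
            throws IOException, InterruptedException {
        Path target = localPathFor(uri, outDir);
        Files.createDirectories(target.getParent());
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        // Note: the body is written even for error responses such as 404.
        HttpResponse<Path> response = client.send(request,
                HttpResponse.BodyHandlers.ofFile(target,
                        StandardOpenOption.CREATE,
                        StandardOpenOption.WRITE,
                        StandardOpenOption.TRUNCATE_EXISTING));
        System.out.println(response.statusCode() + " " + uri + " -> " + response.body());
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        Path outDir = Path.of("saved-page");
        // Each command-line argument is treated as an http(s) URL to save.
        for (String arg : args) {
            download(client, URI.create(arg), outDir);
        }
    }
}
```

Run it with the page URL followed by the resource URLs found during parsing. To make the saved page work offline, the references inside the HTML would also need to be rewritten to point at the local copies, which this sketch does not attempt.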

Answer 1
0 2010-01-22 06:43:39

Answer 2
0 2010-01-22 06:43:45

Answer 3
0 2010-01-22 10:03:05
