How to download a complete web page (with all its contents) in Java?
Using Java, I need to save a complete web page (with all its contents, such as images, CSS, JavaScript, etc.), just like a browser's Save As → "Webpage, Complete" option, using the HttpClient lib. How can I do this?
You can try libcurl for Java: http://curl.haxx.se/libcurl/java/
And you can also refer to this discussion: curl-equivalent-in-java
You have to write an application that fetches the HTML file, parses it to extract all the references, and then fetches every file found during parsing.
It's not so easy, because some CSS/JS/image file paths might be "hidden". Just consider the following example:
<script type="...">
document.write("<scr" + "ipt type='...' src='" + blahBlah() + "'>" + "</scr" + "ipt>");
</script>
However, fetching the page source, parsing it for URLs, and downloading the URLs you find is pretty much everything you'll need.
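The fetch/parse/download loop described above can be sketched with the JDK's built-in `java.net.http.HttpClient` (Java 11+) and a naive regex for `src`/`href` attributes. This is a minimal sketch, not a full "Save As → Complete" implementation: the class name, regex, and file-naming scheme are illustrative assumptions, a real HTML parser (e.g. jsoup) would handle references far more robustly, and as noted above it cannot see URLs that scripts build at runtime.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageSaver {

    // Naive pattern for src="..." / href="..." attributes; enough for a
    // sketch, but a real HTML parser is more reliable than a regex.
    private static final Pattern REF =
            Pattern.compile("(?:src|href)\\s*=\\s*[\"']([^\"']+)[\"']",
                            Pattern.CASE_INSENSITIVE);

    /** Extracts all src/href references, resolved against the page URL. */
    static Set<String> extractReferences(String html, URI base) {
        Set<String> refs = new LinkedHashSet<>();
        Matcher m = REF.matcher(html);
        while (m.find()) {
            // resolve() turns relative paths like "css/site.css" into
            // absolute URLs using the page's own URL as the base.
            refs.add(base.resolve(m.group(1)).toString());
        }
        return refs;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        URI page = URI.create(args[0]);                       // page to save
        Path outDir = Paths.get(args.length > 1 ? args[1] : "saved-page");
        Files.createDirectories(outDir);

        HttpClient client = HttpClient.newHttpClient();

        // Step 1: fetch the HTML itself and save it.
        HttpResponse<String> resp = client.send(
                HttpRequest.newBuilder(page).build(),
                HttpResponse.BodyHandlers.ofString());
        Files.writeString(outDir.resolve("index.html"), resp.body());

        // Step 2: fetch every referenced resource next to the saved HTML.
        for (String ref : extractReferences(resp.body(), page)) {
            URI uri = URI.create(ref);
            Path p = Paths.get(uri.getPath() == null ? "" : uri.getPath());
            String name = (p.getFileName() == null || p.getFileName().toString().isEmpty())
                    ? "resource" : p.getFileName().toString();
            client.send(HttpRequest.newBuilder(uri).build(),
                        HttpResponse.BodyHandlers.ofFile(outDir.resolve(name)));
        }
    }
}
```

Note that this flat layout overwrites resources that share a file name; a complete solution would mirror the URL path structure on disk and rewrite the references in the saved HTML to point at the local copies.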