

How can wget save only certain file types linked to from pages linked to by the target page?

How can wget save only certain file types that are linked from pages linked to by the target page, regardless of which domain those files are on?

I'm trying to speed up a task I have to do often.

I've been digging through the wget docs and googling, but nothing seems to work. I keep getting either just the target page or the subpages without the files (even with -H), so I'm obviously doing something wrong.

Essentially, example.com/index1/ contains links to example.com/subpage1/ and example.com/subpage2/, and the subpages contain links to example2.com/file.ext, example2.com/file2.ext, etc. However, example.com/index1.html may also link to example.com/index2/, which links to more subpages I don't want.

Can wget even do this? If not, what would you suggest I use instead? Thanks.

The following command worked for me:

wget -r --accept "*.ext" --level 2 "example.com/index1/"

The retrieval has to be recursive, so -r needs to be added.

Something like this should work:

wget --accept "*.ext" --level 2 "example.com/index1/"
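Neither command above passes -H, and wget's recursive retrieval does not follow links onto other hosts by default, which matters here because the files live on example2.com rather than example.com. The following is a minimal, untested sketch using the question's placeholder names (example.com, example2.com, the .ext suffix); the helper name fetch_linked_files is hypothetical. It combines -r and a depth of 2 with -H (span hosts) and -D to whitelist the two domains, so the crawl can reach example2.com without wandering across the web:

```shell
# Hypothetical helper (not part of wget): download files matching a pattern
# that are linked from pages one link away from a start page.
#   $1  start URL, e.g. http://example.com/index1/
#   $2  comma-separated domains allowed once -H is on, e.g. example.com,example2.com
#   $3  accept pattern, e.g. '*.ext'
fetch_linked_files() {
  # -r      recursive retrieval (without it, --level has no effect)
  # -l 2    start page (0) -> subpages (1) -> files (2)
  # -H      span hosts: the files are on example2.com, not example.com
  # -D      ...but only follow links into the listed domains
  # -A      keep only matching files; HTML pages are still fetched so their
  #         links can be followed, then deleted because they do not match
  wget -r -l 2 -H -D "$2" -A "$3" "$1"
}
```

Usage: fetch_linked_files 'http://example.com/index1/' example.com,example2.com '*.ext'. With a depth of 2, example.com/index2/ is still visited because it is one link away, but files linked only from its subpages sit three links from the start page and are not downloaded.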
