Using wget to download all zip files on an shtml page

I've been trying to download all the zip files on this website to an EC2 server. However, wget doesn't recognize the links, so nothing gets downloaded. I think it's because the .shtml file requires SSI (Server Side Includes) to be enabled, and that is somehow causing a problem for wget. But I don't really understand that stuff.

This is the command I've been using, without success:

wget -r -l1 -H -t1 -nd -N -np -A.zip -erobots=off http://www.fec.gov/finance/disclosure/ftpdet.shtml#a2015_2016

Thanks for any help you can provide!

The zip links aren't present in the page source; they're generated by JavaScript, which is why wget can't download them. The file list is "located" in http://fec.gov//finance/disclosure/tables/foia_files_summary.xml under the node <fec_file status="Archive"></fec_file>.
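
wget's recursive mode only follows links that appear in the HTML it actually downloads; it does not execute JavaScript. A quick way to confirm that the raw page ships no zip links is to count them in the source. This is a sketch: `zip_link_count` is a hypothetical helper name, and it assumes links are written as double-quoted `href` attributes.

```shell
# Hypothetical helper: count href="...zip" links in HTML read from stdin.
# Assumes double-quoted href attributes; links built by JavaScript at runtime
# never appear in the raw HTML, so they are (correctly) not counted.
zip_link_count() {
  # -o prints each match on its own line, -i makes ".zip" case-insensitive.
  grep -oiE 'href="[^"]*\.zip"' | wc -l | tr -d ' '
}

# Usage (needs network access):
# curl -s 'http://www.fec.gov/finance/disclosure/ftpdet.shtml' | zip_link_count
```

A count of 0 on the raw page, while a browser shows zip links, is consistent with the links being injected by JavaScript.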

You can write a script that parses the XML file and converts the nodes into the actual links, since the links follow a pattern.
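
A minimal shell sketch of such a script, under loudly stated assumptions: each `<fec_file status="Archive">` node sits on a single line, its text content is a file path, and that path is relative to a base URL. The actual node contents and URL pattern aren't shown here, so `fec_links` and `BASE_URL` are hypothetical (the default base is only a guess borrowed from the FTP location in the update); inspect the XML and adjust the `sed` expressions before relying on it.

```shell
# Hypothetical sketch: turn archive nodes from foia_files_summary.xml into URLs.
# ASSUMPTIONS (unverified): one <fec_file status="Archive">...</fec_file> node per
# line, and the node text is a path relative to BASE_URL.
BASE_URL="${BASE_URL:-ftp://ftp.fec.gov/FEC/}"  # hypothetical base URL

fec_links() {
  # 1. Keep only Archive nodes (grep -o prints just the matched node).
  # 2. Strip the opening/closing tags and surrounding whitespace.
  # 3. Drop empty entries, then prefix each path with BASE_URL.
  grep -o '<fec_file status="Archive">[^<]*</fec_file>' \
    | sed -e 's|<fec_file status="Archive">||' -e 's|</fec_file>||' \
          -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | grep -v '^$' \
    | sed "s|^|${BASE_URL}|"
}

# Usage (needs network access):
# curl -s 'http://fec.gov//finance/disclosure/tables/foia_files_summary.xml' | fec_links > urls.txt
# wget -N -i urls.txt
```

Using `grep`/`sed` avoids extra dependencies but is brittle; if a node spans several lines, an XML-aware tool such as `xmllint --xpath` is the safer choice.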


UPDATE:

As @cyrus mentioned, the files are also on ftp.fec.gov/FEC/. You can use wget -m to mirror the FTP site and -A zip to restrict the download to zip files, i.e.:

wget -A zip -m --user=anonymous --password=test@test.com ftp://ftp.fec.gov/FEC/

Or use wget -r:

wget -A zip --ftp-user=anonymous --ftp-password=test@test.com -r 'ftp://ftp.fec.gov/FEC/*'
