
Using wget to download all zip files on an .shtml page

I've been trying to download all the zip files on this website to an EC2 server, but wget isn't recognizing the links and so doesn't download anything. I think it's because the .shtml page requires SSI to be enabled, and that somehow causes a problem for wget, but I don't really understand that part.

This is the code I've been using unsuccessfully.

wget -r -l1 -H -t1 -nd -N -np -A.zip -erobots=off http://www.fec.gov/finance/disclosure/ftpdet.shtml#a2015_2016

Thanks for any help you can provide!

The zip links aren't present in the page source, which is why you can't download them with wget; they're generated via JavaScript. The file list is located inside http://fec.gov//finance/disclosure/tables/foia_files_summary.xml under the node <fec_file status="Archive"></fec_file>

You can write a script to parse the XML file and convert those nodes to the actual download links, since they follow a pattern.
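
For example, here is a minimal Python sketch of that idea. It only dumps the <fec_file status="Archive"> entries so you can see which field holds the file name; the child element names and the final download-URL pattern are not shown in this answer, so treat them as assumptions to verify against the real feed.

# Minimal sketch: list the <fec_file status="Archive"> entries from the FEC summary XML.
# The child element names and the download-URL pattern are assumptions to verify.
import urllib.request
import xml.etree.ElementTree as ET

XML_URL = "http://fec.gov//finance/disclosure/tables/foia_files_summary.xml"

with urllib.request.urlopen(XML_URL) as resp:
    tree = ET.parse(resp)

for node in tree.getroot().iter("fec_file"):
    if node.get("status") == "Archive":
        # Print every child tag/value so you can spot the field that holds the file name.
        for child in node:
            print(child.tag, "=", (child.text or "").strip())

Once you know which child carries the file name, you can append it to the matching download base URL to build the actual links.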


UPDATE:

As @cyrus mentioned, the files are also available on ftp.fec.gov/FEC/. You can use wget -m to mirror the FTP site and -A zip to restrict the download to zip files, i.e.:

wget -A zip -m --user=anonymous --password=test@test.com ftp://ftp.fec.gov/FEC/

Or, with wget -r:

wget -A zip --ftp-user=anonymous --ftp-password=test@test.com -r ftp://ftp.fec.gov/FEC/*

