I've been trying to download all the zip files on this website to an EC2 server. However, it is not recognizing the links and thus not downloading anything. I think it's because the shtml file requires that SSI be enabled and that's somehow causing a problem with wget. But I don't really understand that stuff.
This is the code I've been using unsuccessfully.
wget -r -l1 -H -t1 -nd -N -np -A.zip -erobots=off http://www.fec.gov/finance/disclosure/ftpdet.shtml#a2015_2016
Thanks for any help you can provide!
The zip links aren't present in the page source, which is why wget can't find them; they're generated via JavaScript. The file list is in http://fec.gov//finance/disclosure/tables/foia_files_summary.xml, under the <fec_file status="Archive"></fec_file> nodes.
You can write a script that parses the XML file and converts the nodes to the actual links, since the links follow a pattern.
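A minimal sketch of such a script, in Python. The only thing taken from the XML above is the `<fec_file status="Archive">` node. The child elements (`name`, `year`), the sample data, the base URL and the link pattern are all hypothetical placeholders, so check the real XML and the page's JavaScript and adjust them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real children of <fec_file> are not shown above,
# so <name> and <year> are assumptions made for illustration only.
SAMPLE_XML = """
<fec_files>
  <fec_file status="Archive"><name>cm16.zip</name><year>2016</year></fec_file>
  <fec_file status="Current"><name>cm18.zip</name><year>2018</year></fec_file>
  <fec_file status="Archive"><name>cn16.zip</name><year>2016</year></fec_file>
</fec_files>
"""

# Placeholder base URL, not the real FEC location.
BASE_URL = "https://example.invalid/files/"


def archive_links(xml_text, base_url=BASE_URL):
    """Return one link per <fec_file status="Archive"> node."""
    root = ET.fromstring(xml_text)
    links = []
    for node in root.iter("fec_file"):
        # Skip nodes that are not marked as archived.
        if node.get("status") != "Archive":
            continue
        name = node.findtext("name")
        year = node.findtext("year")
        if name and year:
            # Assumed pattern: <base>/<year>/<name>
            links.append(f"{base_url}{year}/{name}")
    return links


if __name__ == "__main__":
    for link in archive_links(SAMPLE_XML):
        print(link)
```

Saving the printed links to a file lets you download them with wget -i links.txt.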
UPDATE:
As @cyrus mentioned, the files are also on ftp.fec.gov/FEC/. You can use wget -m to mirror the FTP site and -A zip to restrict the download to zip files, i.e.:
wget -A zip -m --user=anonymous --password=test@test.com ftp://ftp.fec.gov/FEC/
Or, with wget -r:
wget -A zip --ftp-user=anonymous --ftp-password=test@test.com -r 'ftp://ftp.fec.gov/FEC/*'