
Download all .tar.gz files from website/directory using WGET

I'm attempting to create an alias/script to download all files with a specific extension from a website/directory using wget, but I feel like there must be an easier way than what I've come up with.

Right now, the command I've come up with from searching Google and the man pages is:

wget -r -l1 -nH --cut-dirs=2 --no-parent -A.tar.gz --no-directories http://download.openvz.org/template/precreated/

In the example above, I'm trying to download all of the .tar.gz files from the OpenVZ precreated templates directory.

The above command works correctly, but I have to manually specify --cut-dirs=2 to cut out the /template/precreated/ directory structure that would otherwise be created, and it also downloads the robots.txt file.
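To make clear what I mean by the directory structure, here is roughly where a file would end up under each variant (the filename is made up purely for illustration):

# -r                      -> download.openvz.org/template/precreated/some-template.tar.gz
# -r -nH                  -> template/precreated/some-template.tar.gz
# -r -nH --cut-dirs=2     -> some-template.tar.gz (in the current directory)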

This isn't necessarily a problem, and it's easy enough to just remove the robots.txt file afterwards, but I was hoping I had simply missed something in the man pages that would let me do the same thing without having to specify how much of the directory structure to cut out...

Thanks for any help ahead of time; it's greatly appreciated!

Use the -R option

-R robots.txt,unwanted-file.txt

as a reject list of files you don't want (comma-separated).
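For example, added onto the command from the question, that looks like:

wget -r -l1 -nH --cut-dirs=2 --no-parent -A.tar.gz --no-directories -R robots.txt http://download.openvz.org/template/precreated/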

As for scripting this:

URL=http://download.openvz.org/template/precreated/
# Count the directory components after the hostname so --cut-dirs strips them all
CUTS=`echo ${URL#http://} | awk -F '/' '{print NF - 2}'`
wget -r -l1 -nH --cut-dirs=${CUTS} --no-parent -A.tar.gz --no-directories -R robots.txt ${URL}

That should work based on the subdirectories in your URL.
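If you want this as a reusable helper rather than a one-off, a minimal sketch could look like the following (the function name fetch_tarballs is just a placeholder, not anything standard):

# Download all .tar.gz files from a single remote directory into the current directory.
fetch_tarballs() {
    local url="$1"
    # Work out how many path components sit between the hostname and the files
    local cuts
    cuts=$(echo "${url#http://}" | awk -F '/' '{print NF - 2}')
    wget -r -l1 -nH --cut-dirs="${cuts}" --no-parent -A.tar.gz --no-directories -R robots.txt "${url}"
}

# Usage:
# fetch_tarballs http://download.openvz.org/template/precreated/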

If this is really annoying and you have to do it a lot, I would suggest just writing a really short two-line script that deletes it for you:

wget -r -l1 -nH --cut-dirs=2 --no-parent -A.tar.gz --no-directories http://download.openvz.org/template/precreated/
rm robots.txt
