
Bash script to wget url starting with a specific character

I have a URL http://example.com/dir that has many subdirectories with files that I want to save. Because its total size is very big, I want to break this operation into parts,

e.g. download everything from the subdirectories starting with A, like

http://example.com/A
http://example.com/Aa
http://example.com/Ab
etc.

I have created the following script:

#!/bin/bash

for g in A B C
do
  wget -e robots=off -r -nc -np -R "index.html*" "http://example.com/$g"
done

but it only tries to download http://example.com/A, not everything matching http://example.com/A*.

Look at this page; it has all you need to know:

https://www.gnu.org/software/wget/manual/wget.html

1) You could use:

--spider -nd -r -o outputfile <domain>

which does not download the files but just checks whether they exist:
- -nd prevents wget from creating directories locally
- -r recurses through the entire site
- -o outputfile sends the log output to a file

to get a list of URLs to download.
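As a concrete sketch of step 1 (the domain http://example.com/dir and the log file name outputfile are only placeholders taken from this thread):

# Crawl the site without downloading anything; log every URL checked into outputfile.
# (-e robots=off and -np from your original script can be kept here as well.)
wget --spider -nd -r -o outputfile http://example.com/dir

The outputfile log should then list every URL wget checked.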

2) Then parse the outputfile to extract the URLs, and create smaller lists of the links you want to download. For example:
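One way the parsing could look (only a sketch: the exact log format depends on your wget version, and the file names all_urls.txt and list_A.txt etc. are made up here):

# Extract the URLs from the spider log and de-duplicate them.
grep -Eo 'https?://[^ ]+' outputfile | sort -u > all_urls.txt

# Split them into one list per starting letter.
for g in A B C
do
  grep "^http://example.com/$g" all_urls.txt > "list_$g.txt"
done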

3) Then use -i file (i.e. --input-file=file) to download each list, limiting how many files you fetch in one execution of wget. For example:
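Continuing the sketch with the hypothetical list_A.txt from step 2:

# Download one list per wget run; -nc skips files that are already present.
wget -e robots=off -nc -i list_A.txt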

Notes:
- --limit-rate=amount can be used to slow down downloads, to spare your Internet link, as in the example below.
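For instance, combined with one of the hypothetical lists from the sketch above:

# Cap the transfer rate at 200 KB/s while working through a list.
wget --limit-rate=200k -nc -i list_A.txt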
