I have a file of tab-separated target file names and URLs, urls_to_download.txt, for example:
first_file.jpg\thttps://www.google.co.il/images/srpr/logo11w.png
/subdir_1/second_file.jpg\thttps://www.google.co.il/images/srpr/logo12w.png
...
last_file.jpg\thttps://www.google.co.il/images/srpr/logo99w.png
which I want to download using several connections.
This I can do, for example, by:
cat urls_to_download.txt | xargs -n 1 -P 10 wget -nc
My question is: how do I get the downloaded files to have the names I want for them, so that the output directory contains:
first_file.jpg
/subdir_1/second_file.jpg
...
last_file.jpg
I am guessing that something like this should work for you:
#!/bin/bash
while IFS=$'\t' read -r FILENAME URL; do
wget -nc -O "$FILENAME" "$URL"
done <input.txt
where input.txt is a file which contains tab separated file/url pairs, one per line.
Note that the file names in your list use absolute paths, so you should rewrite them as relative paths.
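For instance, one way to sketch this (assuming the same filename<TAB>url layout in input.txt as above) is to strip the leading slash and create the target subdirectory before downloading:

```shell
#!/bin/bash
# Sketch: make absolute target names relative and create their
# directories before wget writes the files (input.txt layout assumed).
while IFS=$'\t' read -r FILENAME URL; do
    FILENAME="${FILENAME#/}"              # "/subdir_1/a.jpg" -> "subdir_1/a.jpg"
    mkdir -p "$(dirname "$FILENAME")"     # ensure the subdirectory exists
    wget -nc -O "$FILENAME" "$URL"
done < input.txt
```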
In shell, simply appending & to a command puts the process in the background, which is enough to make the downloads run in parallel. For example:
#!/bin/bash
while IFS=$'\t' read -r FILENAME URL
do
wget -nc -O "./$FILENAME" "$URL" & # So `wget` runs in background
done < input.txt
NOTE: the above script is just a hint, and will create too many parallel wget processes if input.txt has many lines. There are ways to control the number of parallel tasks, but they are all more or less complicated for a shell script.
Here is a very simple way to control the number of parallel tasks, ensuring that at most 20 wget processes run at once:
#!/bin/bash
NUMBER=0
while IFS=$'\t' read -r FILENAME URL
do
wget -nc -O "./$FILENAME" "$URL" & # So `wget` runs in background
NUMBER=$((NUMBER + 1))
if [ "$NUMBER" -ge 20 ]
then
wait # wait for all background processes to finish
NUMBER=0
fi
done < input.txt
wait
However, this method is very crude: it waits for a whole batch of 20 to finish before starting the next, so it is neither the most efficient nor the most accurate way to control the number of parallel tasks.
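If you have bash 4.3 or newer, a tighter sketch keeps the pool full by waiting for any single job to finish instead of the whole batch (the limit of 20 and the input.txt name are carried over from above):

```shell
#!/bin/bash
# Sketch: keep up to 20 wget jobs running at once; as soon as one
# finishes, start the next. Requires bash >= 4.3 for `wait -n`.
MAX_JOBS=20
while IFS=$'\t' read -r FILENAME URL; do
    while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
        wait -n                       # block until any one background job exits
    done
    wget -nc -O "./$FILENAME" "$URL" &
done < input.txt
wait                                  # wait for the remaining jobs
```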
Try this to download your files, then rename them:
`cut -f 2 urls_to_download.txt | wget -i -;`
`while IFS=$'\t' read -r name url; do mv "$(basename "$url")" "$name"; done < urls_to_download.txt`
I can't find a way to rename the files properly with a wget
option alone, and you will need to make sure each target directory exists before running the mv
command.
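To cover that directory caveat, here is a sketch of the rename step that creates each directory first (it assumes the URL basenames are unique, since that is what wget -i saves them as):

```shell
#!/bin/bash
# Sketch: after `wget -i` has downloaded everything under its URL
# basename, pair each line back up and move the file into place.
# Assumes unique basenames across the URLs.
while IFS=$'\t' read -r name url; do
    name="${name#/}"                   # make absolute target names relative
    mkdir -p "$(dirname "$name")"      # ensure the target directory exists
    mv "$(basename "$url")" "$name"
done < urls_to_download.txt
```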
Simply use wget
's -x
option:
-x
--force-directories
The opposite of -nd: create a hierarchy of directories, even if one would not have been created
otherwise. E.g. wget -x http://fly.srk.fer.hr/robots.txt will save the downloaded file to
fly.srk.fer.hr/robots.txt.
Since your file is tab-delimited, extract the URL column with cut and pipe it to xargs:
cut -f 2 urls_to_download.txt | xargs -n 1 -P 10 wget -nc -x
Or do the same extraction with sed:
sed -e 's|^.*\t||' urls_to_download.txt | xargs -n 1 -P 10 wget -nc -x
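If you want both the parallelism of xargs and the renaming from the question, one sketch (assuming file names and URLs contain no spaces) hands each filename/URL pair to a tiny sh -c script:

```shell
# Sketch: xargs splits its input on whitespace (tabs and newlines
# included), so -n 2 passes each line's filename and URL to sh as
# $0 and $1. Assumes no spaces in file names or URLs.
xargs -n 2 -P 10 sh -c 'mkdir -p "$(dirname "./$0")" && wget -nc -O "./$0" "$1"' \
    < urls_to_download.txt
```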