I have a list of URLs in a text file:
http://host/index.html
http://host/js/test.js
http://host/js/sub/test_sub.js
http://host/css/test.css
I would like to download these files while replicating the same directory tree on my filesystem. For example, I would like to end up with the following tree when I'm done:
wd/
|_index.html
|_js/
| |_test.js
| |_sub/
|   |_test_sub.js
|_css/
  |_test.css
Here's what I've tried:
Adding the target file as a second column in the list:
http://host/index.html
http://host/js/test.js js/test.js
http://host/js/sub/test_sub.js js/sub/test_sub.js
http://host/css/test.css css/test.css
Then using a while loop to tell wget where to save each file:
while read url target; do
    wget "$url" -P "$target"
done < site_media_list.txt
This didn't work: all the files ended up in the same directory, and no new directories were created.
Make a file containing only the links (no paths), one per line. Then wget -nH -x -i links_list.txt downloads the files into the working directory while keeping the directory structure intact. A more readable version of the same command is given below.
wget --no-host-directories --force-directories --input-file=links_list.txt
Wget has many flexible options for directories. See the directory options in man wget for more.
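To check what this will produce before downloading anything, the mapping from URL to local path can be previewed. The following is only a sketch: local_path_for and preview_list are hypothetical helper names, and the mapping assumes simple URLs with no query string and no trailing slash.

```shell
#!/bin/bash
# Hypothetical helper: print the local path expected from `wget -nH -x`
# for a simple URL (assumes no query string and no trailing slash).
local_path_for() {
    local url=$1
    local no_scheme=${url#*://}       # e.g. host/js/test.js
    printf '%s\n' "${no_scheme#*/}"   # e.g. js/test.js
}

# Preview every URL in the list file given as $1 without downloading.
preview_list() {
    local url
    while read -r url; do
        if [ -n "$url" ]; then
            printf '%s -> %s\n' "$url" "$(local_path_for "$url")"
        fi
    done < "$1"
}
```

Running preview_list links_list.txt prints one "URL -> path" line per entry. To place the tree under wd/ rather than the current directory, adding -P wd (--directory-prefix) to the wget command should do it.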
Assuming your file site_media_list.txt contains only the list of URLs (and not target directories), you should be able to parse the directory names out of the URL:
while read -r url ; do
    # Keep only the directory part of the path (empty for top-level files).
    s=$(echo "$url" | sed -E 's#http://host/(.*/)?.*$#\1#')
    if [[ -z "$s" ]]; then
        echo "working dir"
        wget "$url"
    else
        echo "subdir"
        mkdir -p "$s"
        wget "$url" -P "$s"
    fi
done < site_media_list.txt
It looks like the main problem you were having is that you were passing both the directory name and the filename to wget -P. You only need to pass the directory name; wget will work out the filename from the URL.
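If you would rather keep the two-column list from the question, the same idea applies: strip the filename from the target and pass only the directory to -P. This is a minimal sketch, not a tested drop-in; target_dir and fetch_pairs are hypothetical names, and the file is still saved under the name taken from the URL, not the one in the second column.

```shell
#!/bin/bash
# Hypothetical helper: directory part of a target path, or "." when the
# target has no directory component.
target_dir() {
    dirname -- "$1"
}

# Download each "URL target" pair from the list file given as $1.
# Lines without a target column are saved in the current directory.
fetch_pairs() {
    local url target dir
    while read -r url target; do
        if [ -z "$url" ]; then
            continue
        fi
        dir=$(target_dir "${target:-.}")
        mkdir -p "$dir"
        wget "$url" -P "$dir"
    done < "$1"
}
```

Calling fetch_pairs site_media_list.txt would then run the downloads.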
Split the path on / into an array, and use only the relevant elements to create the path.
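#!/bin/bash
# Reads URLs from standard input, one per line.
while read -r url ; do
    # Split on "/": elements 0-2 are "http:", "" and the host.
    IFS=/ read -r -a parts <<< "$url"
    # Re-join everything after the host, e.g. js/sub/test_sub.js
    rel=$(IFS=/; echo "${parts[*]:3}")
    if (( ${#parts[@]} > 4 )) ; then
        mkdir -p "${rel%/*}"    # create the directory part only
    fi
    wget -O "$rel" "$url"
done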
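The index arithmetic is easy to get wrong, so it can help to check the split on a sample URL without downloading anything. A small sketch, assuming bash and URLs of the form scheme://host/path; rel_path and rel_dir are hypothetical names:

```shell
#!/bin/bash
# Hypothetical helper: path of a URL relative to its host, built by
# splitting on "/" (elements 0-2 are "http:", "" and the host).
rel_path() {
    local parts
    IFS=/ read -r -a parts <<< "$1"
    local IFS=/
    printf '%s\n' "${parts[*]:3}"
}

# Hypothetical helper: directory portion of that path, empty for
# files that sit directly under the host.
rel_dir() {
    local rel
    rel=$(rel_path "$1")
    case $rel in
        */*) printf '%s\n' "${rel%/*}" ;;
        *)   printf '\n' ;;
    esac
}
```

For instance, rel_dir http://host/js/sub/test_sub.js gives the directory to create, and rel_path gives the value to pass to wget -O.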