Download from list of URLs and output to relative directories

I have a list of URLs in a text file:

http://host/index.html
http://host/js/test.js
http://host/js/sub/test_sub.js
http://host/css/test.css

I would like to download these files, replicating the same directory tree on my filesystem. For example, when I'm done I want the following tree:

wd/
 |_index.html
 |_js/
 |  |_test.js
 |  |_sub/
 |     |_test_sub.js
 |_css/
    |_test.css

Here's what I've tried:

Add the target path as a second column in the list:

http://host/index.html 
http://host/js/test.js js/test.js
http://host/js/sub/test_sub.js js/sub/test_sub.js
http://host/css/test.css css/test.css

Use a while loop to tell wget where to save these:

 while read url target; do
   wget "$url" -P "$target";
 done < site_media_list.txt 

This didn't work; all the files ended up in the same directory, and no new directories were created.

Make a file listing only the links (no target paths), one per line. Then wget -nH -x -i links_list.txt downloads the files into the working directory, keeping the directory structure intact. A more readable version of the same command is given below.

wget --no-host-directories --force-directories --input-file=links_list.txt

Wget has many flexible options for directories; see the Directory Options section of man wget for more.
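
For instance, if you also wanted to drop a leading directory level from the saved paths, the --cut-dirs option would do it (a variant of the command above, assuming one unwanted leading path component):

wget --no-host-directories --cut-dirs=1 --force-directories --input-file=links_list.txt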

Assuming your file site_media_list.txt contains only the list of URLs (and not target directories), you can parse the directory names out of each URL:

while read -r url ; do
  # extract the directory portion of the URL (empty for top-level files)
  s=$(echo "$url" | sed -E 's#http://host/(.*/)?.*$#\1#')
  if [[ -z "$s" ]]; then
    echo "working dir"
    wget "$url"
  else
    echo "subdir"
    mkdir -p "$s"
    wget "$url" -P "$s"
  fi
done < site_media_list.txt
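
To see what the sed expression extracts, you can run it on a single URL by hand; it keeps only the directory portion and prints an empty string for top-level files:

echo "http://host/js/sub/test_sub.js" | sed -E 's#http://host/(.*/)?.*$#\1#'
# prints: js/sub/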

It looks like the main problem you were having is that you were passing both the directory name and the filename to wget. You only need to pass the directory name; wget derives the filename from the URL.
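
So a minimal fix for your original two-column list is to pass only the directory part of the target to -P, for example with dirname (a sketch, assuming site_media_list.txt keeps the second column; dirname of the empty target on the first line yields ".", the working directory):

while read -r url target; do
  wget "$url" -P "$(dirname "$target")"
done < site_media_list.txt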

Split the path on / into an array, then use only the relevant elements to build the path. Splitting a URL on / puts the scheme, an empty field (from the double slash), and the host in elements 0-2, so the path components start at index 3.

#!/bin/bash
while read -r url ; do
    IFS=/ read -r -a parts <<< "$url"    # split the URL on "/"
    if (( ${#parts[@]} > 4 )) ; then     # the URL contains at least one subdirectory
        path=$(IFS=/; echo "${parts[*]:3:${#parts[@]}-4}")
        mkdir -p "$path"
    fi
    wget -O "$(IFS=/; echo "${parts[*]:3}")" "$url"
done < site_media_list.txt
