Download from a list of URLs and output to the corresponding directories

Question

I have a list of URLs in a text file:

http://host/index.html
http://host/js/test.js
http://host/js/sub/test_sub.js
http://host/css/test.css

I would like to download these files and replicate the same tree on my filesystem. For example, I would like to end up with the following tree when I'm done:

wd/
 |_index.html
 |_js/
 |  |_test.js
 |  |_sub/
 |     |_test_sub.js
 |_css/
    |_test.css

Here's what I've tried:

Add the target file as a second field on each line of the list:

http://host/index.html 
http://host/js/test.js js/test.js
http://host/js/sub/test_sub.js js/sub/test_sub.js
http://host/css/test.css css/test.css

Use a while loop to tell wget where to save each file:

 while read url target; do
   wget "$url" -P "$target";
 done < site_media_list.txt

This didn't work: all the files ended up in the same directory, and no new directories were created.

Answer 1

Make a file containing only the links (no target paths), one per line. Then wget -nH -x -i links_list.txt downloads the files into the working directory while keeping the directory structure intact. A more readable version of the same command is given below.

wget --no-host-directories --force-directories --input-file=links_list.txt

Wget has many flexible options for directories. See the "Directory Options" section of man wget for more.
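
The tree in the question is rooted at wd/, so one possible variation is to add wget's --directory-prefix (-P) option to the same command. The following is only a sketch: the directory name wd comes from the question, links_list.txt from this answer, and the DRY_RUN variable is an assumption added here so that the command can be reviewed before anything is downloaded.

```shell
#!/bin/bash
# Build the command as an array so each option stays a separate word.
# Assumed names: links_list.txt (this answer) and wd (the question's tree).
cmd=(wget --no-host-directories --force-directories
     --directory-prefix=wd --input-file=links_list.txt)

# DRY_RUN is a hypothetical switch for this sketch; it defaults to 1,
# which only prints the command. Set DRY_RUN=0 to actually run wget.
if [[ ${DRY_RUN:-1} == 1 ]]; then
    printf '%s ' "${cmd[@]}"
    echo
else
    "${cmd[@]}"
fi
```

With these options, http://host/js/test.js should end up at wd/js/test.js.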

Answer 2

Assuming your file site_media_list.txt contains only the list of URLs (and not the target directories), you should be able to parse the directory names out of each URL:

while read -r url ; do
  s=$(echo "$url" | sed -E 's#http://host/(.*/)?.*$#\1#')
  if [[ -z "$s" ]]; then
    echo "working dir"
    wget "$url"
  else
    echo "subdir"
    mkdir -p "$s"
    wget "$url" -P "$s"
  fi
done < site_media_list.txt

It looks like the main problem was that you were passing both the directory name and the filename to wget's -P option. You only need to pass the directory name; wget works out the filename from the URL.
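
The sed expression above hard-codes http://host/. If the list mixes hosts or schemes, the directory part can also be derived with bash parameter expansion. This is a minimal sketch, not part of the original answer: the function name url_dir is made up for illustration, and it assumes plain URLs without query strings.

```shell
#!/bin/bash
# Hypothetical helper: print the directory part of a URL's path,
# or nothing when the file sits directly under the host.
url_dir() {
    local rest=${1#*://}     # strip the scheme:   host/js/sub/test_sub.js
    rest=${rest#*/}          # strip the host:     js/sub/test_sub.js
    if [[ $rest == */* ]]; then
        echo "${rest%/*}"    # strip the filename: js/sub
    fi
}

url_dir "http://host/js/sub/test_sub.js"   # prints js/sub
url_dir "http://host/index.html"           # prints nothing
```

The result can stand in for the sed output: when it is non-empty, pass it to mkdir -p and to wget -P as in the loop above.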

Answer 3

Split the URL on / into an array, then use only the relevant elements to build the path. The script reads the URLs from standard input.

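#!/bin/bash
# Reads URLs from standard input, one per line.
while read -r url ; do
    # Split on "/": http://host/js/test.js -> http: "" host js test.js
    # (IFS stays set to "/" for the rest of the script.)
    IFS=/ parts=($url)
    if (( ${#parts[@]} > 4 )) ; then
        # Join the directory elements (after the host, before the filename).
        IFS=/ path="${parts[*]:3:${#parts[@]}-4}"
        mkdir -p "$path"
    fi
    # Save under the path that follows the host.
    IFS=/ wget -O "${parts[*]:3}" "$url"
done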
