Download from list of URLs and output to relative directories
I have a list of URLs in a text file:
http://host/index.html
http://host/js/test.js
http://host/js/sub/test_sub.js
http://host/css/test.css
I would like to download these files and replicate the same tree on my filesystem. For example, I would like to end up with the following tree when I'm done:
wd/
|_index.html
|_js/
| |_test.js
| |_sub/
|   |_test_sub.js
|_css/
  |_test.css
Here's what I've tried:
Add the target file as a second field on each line of the list:
http://host/index.html
http://host/js/test.js js/test.js
http://host/js/sub/test_sub.js js/sub/test_sub.js
http://host/css/test.css css/test.css
Use a while loop to tell wget where to save these:
while read url target; do
    wget "$url" -P "$target";
done < site_media_list.txt
This didn't work: all the files ended up in the same directory, and no new directories were created.
Make a file containing only the links (no paths), one per line. Then
wget -nH -x -i links_list.txt
downloads the files into the working directory while keeping the directory structure intact. A more readable version of the same command is given below.
wget --no-host-directories --force-directories --input-file=links_list.txt
Wget has many flexible options for directories. See the "Directory Options" section of
man wget
for more.
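If the tree should land under a separate directory such as wd/ (as in the question) rather than in the current directory, wget's -P / --directory-prefix option can probably be combined with the flags above. The following is only a sketch: fetch_tree, the wd destination, and the WGET override (a hypothetical hook so the command line can be previewed with echo) are my own assumptions, not part of the original answer.

```shell
#!/bin/bash
# fetch_tree LIST_FILE DEST_DIR
# Hypothetical wrapper: downloads every URL in LIST_FILE under DEST_DIR,
# recreating the URL path but not the host name.
fetch_tree() {
    local list_file=$1 dest_dir=$2
    # --no-host-directories (-nH): do not create a "host/" directory
    # --force-directories   (-x):  recreate the URL's directory path
    # --directory-prefix    (-P):  save everything under DEST_DIR
    # WGET is an assumed override; it defaults to the real wget.
    "${WGET:-wget}" --no-host-directories --force-directories \
        --directory-prefix="$dest_dir" --input-file="$list_file"
}

# Usage (not run here):
#   fetch_tree links_list.txt wd
# Preview the command line without downloading anything:
#   WGET=echo fetch_tree links_list.txt wd
```

With WGET=echo the function only prints the arguments it would pass, which is a cheap way to check the options before hitting the server.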
Assuming your file site_media_list.txt contains only the list of URLs (and not the target directories), you should be able to parse the directory names out of the URLs:
while read -r url ; do
    # Keep only the directory part of the path (empty for top-level files).
    s=$(echo "$url" | sed -E 's#http://host/(.*/)?.*$#\1#')
    if [[ -z "$s" ]]; then
        echo "working dir"
        wget "$url"
    else
        echo "subdir"
        mkdir -p "$s"
        wget "$url" -P "$s"
    fi
done < site_media_list.txt
It looks like the main problem was that you were passing both the directory name and the file name to wget -P. You only need to pass the directory name; wget derives the file name from the URL.
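The sed expression above hard-codes http://host/. As a possible alternative, the directory part can be extracted with shell parameter expansion alone, which should work for any scheme and host. This is a minimal sketch; url_dir and download_list are hypothetical helper names, and the list file name is taken from the question.

```shell
#!/bin/bash
# url_dir URL
# Hypothetical helper: prints the directory part of the URL's path,
# or an empty line when the file sits directly under the host.
url_dir() {
    local url=$1
    local path=${url#*://}       # strip the scheme: host/js/sub/test_sub.js
    case $path in
        */*) path=${path#*/} ;;  # strip the host:   js/sub/test_sub.js
        *)   path= ;;            # URL has no path at all
    esac
    case $path in
        */*) printf '%s\n' "${path%/*}" ;;  # drop the file name: js/sub
        *)   printf '\n' ;;                 # top-level file: no directory
    esac
}

# download_list LIST_FILE
# Hypothetical driver: same logic as the loop above, using url_dir.
download_list() {
    local url dir
    while read -r url ; do
        dir=$(url_dir "$url")
        if [[ -z "$dir" ]]; then
            wget "$url"
        else
            mkdir -p "$dir"
            wget "$url" -P "$dir"
        fi
    done < "$1"
}

# Usage (not run here):
#   download_list site_media_list.txt
```

Calling url_dir on a few URLs from the list before running download_list is an easy way to confirm the directories come out as intended.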
Split the URL on / into an array, and use only the relevant elements to build the path.
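#!/bin/bash
# Reads URLs from standard input, one per line.
while IFS= read -r url ; do
    # Split on "/": http://host/js/test.js -> (http: "" host js test.js)
    IFS=/ read -r -a parts <<< "$url"
    # Join everything after the host with "/" to get the output file path.
    target=$(IFS=/ ; echo "${parts[*]:3}")
    # More than four elements means the file is inside a subdirectory.
    if (( ${#parts[@]} > 4 )) ; then
        mkdir -p "$(dirname "$target")"
    fi
    wget -O "$target" "$url"
done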