wget: save with .jpg extension
I made this script to download .jpg files from a database:
for (( i = 1; i <= 9; i +=1))
do
wget http://archives.cg66.fr/mdr/index.php/docnumserv/getSubImage/0/0/0/-archives-009NUM_Etat_civil-Images---LLUPIA-2E1700_1702-FRAD066_2E1700_1702_000$i.jpg/0/100/0/100/100/100/100/100/2300/1500/0/100
done
Because of the "/0/100/0/100/100..." after the .jpg extension, the result is:
9 files named: 100, 100.1, 100.2, 100.3 ... 100.9
I would like to find a way to get 9 .jpg files named 0001.jpg, 0002.jpg, 0003.jpg ... 0009.jpg.
Could you give me some help or advice?
You could try this way:
~$ URL1="http://archives.cg66.fr/mdr/index.php/docnumserv/getSubImage/0/0/0/-archives-009NUM_Etat_civil-Images---LLUPIA-2E1700_1702-FRAD066_2E1700_1702"
~$ URL2="0/100/0/100/100/100/100/100/2300/1500/0/100"
~$ for I in $(seq -w 0001 0009)
do
wget -O "${I}.jpg" "${URL1}_${I}.jpg/${URL2}"
done
To populate the I variable with three leading zeros I use seq -w 0001 0009. To download the images with the right filename I use wget -O "${I}.jpg" "${URL1}_${I}.jpg/${URL2}". This also works with more than 9 images, e.g. to produce a sequence of numbers from 1 to 999 with leading zeros (0001 ... 0099 ... 0999) the command becomes seq -w 0001 0999.
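A quick way to check the padding behaviour (assuming GNU coreutils seq):

```shell
# GNU seq -w pads every number to the width of the widest operand,
# so writing the bounds as 0001/0003 already yields 4-digit values.
seq -w 0001 0003
# prints 0001, 0002, 0003, one per line
```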
See man seq and man wget for the documentation.
Of course the URL can't keep the extra leading zeros before the variable ${I}, otherwise the wget command will return an error page. For this reason I changed the URL from this: ..._1702_000$i.jpg/0/100/... to this: ..._1702_${I}.jpg/0/100/...
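A quick sketch of why the extra zeros break the URL: seq -w already emits the padding, so keeping the original 000 prefix would double it (the _1702_ fragment below stands in for the full URL):

```shell
# The loop variable produced by seq -w already contains the leading zeros.
I=$(seq -w 0001 0009 | head -n 1)   # I=0001
echo "_1702_${I}.jpg"       # _1702_0001.jpg     -> the expected image name
echo "_1702_000${I}.jpg"    # _1702_0000001.jpg  -> double-padded, error page
```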
The downloaded files:
~$ ls -l
total 20404
-rw-r--r-- 1 ale ale 2408227 Oct 9 22:38 0001.jpg
-rw-r--r-- 1 ale ale 2422199 Oct 9 22:38 0002.jpg
-rw-r--r-- 1 ale ale 2330667 Oct 9 22:38 0003.jpg
-rw-r--r-- 1 ale ale 2162542 Oct 9 22:38 0004.jpg
-rw-r--r-- 1 ale ale 2579155 Oct 9 22:38 0005.jpg
-rw-r--r-- 1 ale ale 2175118 Oct 9 22:38 0006.jpg
-rw-r--r-- 1 ale ale 2174325 Oct 9 22:38 0007.jpg
-rw-r--r-- 1 ale ale 2421311 Oct 9 22:38 0008.jpg
-rw-r--r-- 1 ale ale 2202587 Oct 9 22:38 0009.jpg
EDIT: Another alternative. First I create a file with a list of URLs:
~$ URL1="http://archives.cg66.fr/mdr/index.php/docnumserv/getSubImage/0/0/0/-archives-009NUM_Etat_civil-Images---LLUPIA-2E1700_1702-FRAD066_2E1700_1702"
~$ URL2="0/100/0/100/100/100/100/100/2300/1500/0/100"
~$ for I in $(seq -w 0001 0009)
do
echo "${URL1}_{${I}}.jpg/${URL2}" >> url_list.txt
done
The loop outputs URLs formatted this way: ..._1702_{${I}}.jpg/0/100... so that curl can save the files with the output format '#1.jpg'.
~$ xargs -P 10 -n 1 curl -o '#1.jpg' < url_list.txt
However, this solution may overload the webserver. In case of trouble, it might be helpful to use the wget solution with the option --limit-rate=amount added, to limit the download speed to amount bytes per second. Add k for kilobytes, M for megabytes.
References:
xargs -n 1 -P {number_files}: https://serverfault.com/a/722874
curl -o '#1.jpg': https://unix.stackexchange.com/a/91574