Bash script: downloading sequentially numbered files with wget

Question

I have a web server that stores the numbered log files of a web application. Example file names:

dbsclog01s001.log
dbsclog01s002.log
dbsclog01s003.log

The last 3 digits are a counter, and it can sometimes reach 100.

I usually open a web browser, browse to a file such as:

http://someaddress.com/logs/dbsclog01s001.log

and save the file. This of course gets a bit annoying when you have 50 logs. I tried to come up with a Bash script that uses wget and passes it

http://someaddress.com/logs/dbsclog01s*.log

but I am having problems with my script. Anyway, does anyone have a sample of how to do this?

Thanks!

Answer 1

#!/bin/sh

if [ $# -lt 3 ]; then
        echo "Usage: $0 url_format seq_start seq_end [wget_args]" >&2
        exit 1
fi

url_format=$1
seq_start=$2
seq_end=$3
shift 3

# printf reuses the format for each number; wget reads the URL list from stdin
printf "$url_format\\n" $(seq "$seq_start" "$seq_end") | wget -i- "$@"

Save the above as seq_wget, give it execute permission (chmod +x seq_wget), and then run it, for example:

$ ./seq_wget http://someaddress.com/logs/dbsclog01s%03d.log 1 50

Or, if you have Bash 4.0 or later (needed for zero-padded brace expansion), you can type

$ wget http://someaddress.com/logs/dbsclog01s{001..050}.log

Or, if you have curl instead of wget, you can follow Dennis Williamson's answer.
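
Before piping the list into wget, it can help to preview the URLs the script would generate. A minimal sketch, assuming seq is available; list_urls is a hypothetical helper name, not part of the script above:

```shell
#!/bin/sh
# list_urls URL_FORMAT START END   (hypothetical helper, for illustration)
# Prints one URL per line. printf reuses the format string for every
# number that seq produces, which is the same trick seq_wget relies on.
list_urls() {
    printf "$1\n" $(seq "$2" "$3")
}

list_urls 'http://someaddress.com/logs/dbsclog01s%03d.log' 1 3
```

This prints the URLs for 001 through 003. Appending `| wget -i-` to the last line would download them instead of just listing them.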

Answer 2

curl seems to support ranges. From the man page:

URL
       The URL syntax is protocol dependent. You’ll find a detailed description
       in RFC 3986.

       You  can  specify  multiple  URLs or parts of URLs by writing part sets
       within braces as in:

        http://site.{one,two,three}.com

       or you can get sequences of alphanumeric series by using [] as in:

        ftp://ftp.numericals.com/file[1-100].txt
        ftp://ftp.numericals.com/file[001-100].txt    (with leading zeros)
        ftp://ftp.letters.com/file[a-z].txt

       No nesting of the sequences is supported at the moment, but you can use
       several ones next to each other:

        http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html

       You  can  specify  any amount of URLs on the command line. They will be
       fetched in a sequential manner in the specified order.

       Since curl 7.15.1 you can also specify step counter for the ranges,  so
       that you can get every Nth number or letter:

        http://www.numericals.com/file[1-100:10].txt
        http://www.letters.com/file[a-z:2].txt

You may have noticed that it says "with leading zeros"!
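
Applied to the log files from the question, the range syntax might be used roughly like this. This is only a sketch: run and fetch_range are hypothetical helper names, and the last line is a dry run that prints the curl command rather than contacting the server.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# fetch_range PREFIX FIRST LAST SUFFIX   (hypothetical helper)
# curl expands [FIRST-LAST] itself, so the URL is passed quoted and the
# shell never sees the brackets as a glob. In the -o file name, "#1" is
# replaced by the current value of the first range.
fetch_range() {
    run curl --fail -o "${1##*/}#1$4" "$1[$2-$3]$4"
}

DRY_RUN=1 fetch_range 'http://someaddress.com/logs/dbsclog01s' 001 050 .log
```

Dropping DRY_RUN=1 would perform the actual downloads, saving each file under its own numbered name.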

Answer 3

You can use an echo-style brace sequence in the wget URL to download a run of numbers...

wget http://someaddress.com/logs/dbsclog01s00{1..3}.log

This also works with letters

{a..z} {A..Z}

Answer 4

Not sure exactly what problems you were running into, but it sounds like a simple for loop in bash would do it for you.

for i in {1..999}; do
    n=$(printf '%03d' "$i")   # zero-pad to match the 3-digit counter in the file names
    wget -k "http://someaddress.com/logs/dbsclog01s$n.log" -O "your_local_output_dir_$n"
done

Answer 5

You can use a combination of a for loop and the printf command (changing echo to wget as needed, of course):

$ for i in {1..10}; do echo "http://www.com/myurl`printf "%03d" $i`.html"; done
http://www.com/myurl001.html
http://www.com/myurl002.html
http://www.com/myurl003.html
http://www.com/myurl004.html
http://www.com/myurl005.html
http://www.com/myurl006.html
http://www.com/myurl007.html
http://www.com/myurl008.html
http://www.com/myurl009.html
http://www.com/myurl010.html

Answer 6

Interesting task, so I wrote a complete script for you (combining several answers and more). Here it is:

#!/bin/bash
# fixed vars
URL=http://domain.com/logs/     # URL up to the logfile name
PREF=logprefix                  # logfile prefix (before number)
POSTF=.log                      # logfile suffix (after number)
DIGITS=3                        # number of digits in the logfile number
DLDIR=~/Downloads               # download directory
TOUT=5                          # wget timeout in seconds
# code
for((i=1;i<10**DIGITS;++i))
do
        file=$PREF$(printf "%0${DIGITS}d" $i)$POSTF  # local file name
        dl=$URL$file                                 # full URL to download
        echo "$dl -> $DLDIR/$file"                   # monitoring, can be commented out
        wget -T $TOUT -q "$dl" -O "$DLDIR/$file"     # save into the download directory
        if [ "$?" -ne 0 ]                            # test if we finished
        then
                exit
        fi
done

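At the beginning of the script you can set the URL, the log file prefix and suffix, how many digits the numbered part has, and the download directory. The loop downloads every log file it finds and exits automatically at the first one that does not exist (wget returns a non-zero status, and the timeout keeps it from hanging).

Note that this script assumes the log file index starts at 1, not zero, as in your example.

Hope this helps.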
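
The "stop at the first missing file" logic can also be isolated into a small function, which makes it easy to try out without touching the network. A rough sketch, not a replacement for the script above: download_until_missing is a hypothetical name, and the optional second argument is an assumed hook that lets you swap in any command that takes a URL and returns non-zero on failure.

```shell
#!/bin/sh
# download_until_missing URL_FORMAT [FETCH_CMD]   (hypothetical helper)
# URL_FORMAT is a printf format such as 'http://domain.com/logs/logprefix%03d.log'.
# FETCH_CMD defaults to "wget -q"; it is deliberately left unquoted below
# so that the command and its options are split into separate words.
download_until_missing() {
    fmt=$1
    fetch=${2:-wget -q}
    i=1
    # Keep going while the URL can be built and the fetch succeeds.
    while url=$(printf "$fmt" "$i") && $fetch "$url"; do
        i=$((i + 1))
    done
    echo "fetched $((i - 1)) file(s)"
}
```

For example, `download_until_missing 'http://domain.com/logs/logprefix%03d.log'` would fetch logprefix001.log, logprefix002.log and so on into the current directory until wget reports a failure.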

Answer 7

Here you can find a Perl script that looks like what you want

http://osix.net/modules/article/?id=677

#!/usr/bin/perl
$program="wget"; #change this to proz if you have it ;-)
my $count=1; #the lesson number starts from 1
my $base_url= "http://www.und.nodak.edu/org/crypto/crypto/lanaki.crypt.class/lessons/lesson";
my $format=".zip"; #the format of the file to download
my $max=24; #the total number of files to download
my $url;

for($count=1;$count<=$max;$count++) {
    if($count<10) {
    $url=$base_url."0".$count.$format; #insert a '0' and form the URL
    }
    else {
    $url=$base_url.$count.$format; #no need to insert a zero
    }
    system("$program $url");
}

Answer 8

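I just had a look at the wget man page's discussion of 'globbing':

By default, globbing will be turned on if the URL contains a globbing character. This option may be used to turn globbing on or off permanently. You may have to quote the URL to protect it from being expanded by your shell. Globbing makes Wget look for a directory listing, which is system-specific. This is why it currently works only with Unix FTP servers (and the ones emulating Unix "ls" output).

So wget http://... won't work with globbing.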

Answer 9

Check whether your system has seq; then it is easy:

for i in $(seq -f "%03g" 1 10); do wget "http://.../dbsclog${i}.log"; done

If your system has the jot command instead of seq:

for i in $(jot -w "http://.../dbsclog%03d.log" 10); do wget $i; done
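
To see what the seq format produces before handing it to wget, something like the following can be run first. It is a small sketch using the URL from the question; padded_range is a hypothetical helper name.

```shell
#!/bin/sh
# padded_range FIRST LAST [WIDTH]   (hypothetical helper)
# Prints zero-padded numbers, one per line; WIDTH defaults to 3.
padded_range() {
    seq -f "%0${3:-3}g" "$1" "$2"
}

# Preview the URLs; swap echo for wget to download them.
for n in $(padded_range 1 3); do
    echo "http://someaddress.com/logs/dbsclog01s${n}.log"
done
```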

Answer 10

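Late to the party, but a really simple solution that requires no coding is the DownThemAll Firefox add-on, which can retrieve ranges of files. That was my solution when I needed to download 800 consecutively numbered files.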

Answer 11

Oh! This is similar to a problem I ran into when learning bash to automate manga downloads.

Something like this should work:

for a in `seq 1 999`; do
if [ ${#a} -eq 1 ]; then
    b="00"
elif [ ${#a} -eq 2 ]; then
    b="0"
else
    b=""    # three digits: no padding needed
fi
echo "$a of 999"
wget -q http://site.com/path/fileprefix$b$a.jpg
done
