
Download directories using wget command

I need to download only rpm files from several different directories. Here is my code:

    #!/usr/bin/env bash

    # Download Only rpm file from certain directory.
    # wget
    #   -4 = IPv4 only
    #   -A = accept list (file suffixes to keep)
    #   -r = recursive
    #   -R = reject list
    #   -c = continue partial downloads
    #   -e = execute a command (here: robots=off)
    #   --exclude-directories = list of directories to skip

    # Create a directory
    mkdir mrepo

    # Enter into the directory
    cd mrepo

    # RPM URL
    repo_url="http://download.virtualbox.org/virtualbox/"       

    # Repo rpm
    repo_download=('5.2.20' '5.2.22' '6.0.0')
    # Exclude directories
    exclude_dir=('*_Beta')
    # Download all rpm packages                     
    for i in "${repo_download[@]}"; do
      echo $i/
      echo ${repo_url}/$i/
      # wget -A rpm -rc -e robots=off --reject "index.html*" ${repo_url}/$i/
      wget -A zip -rc -e robots=off --reject "index.html*" ${repo_url}/$i/
    done

    # Tar the downloaded rpm 
    tar -cvzf missingrepo.tgz --exclude=./*.sh .

My targets are to:

  1. Download only rpm files.
  2. Download from specific directories only; I created a list of those directories and pass it to a for loop. It appears to work, but it actually does not.
  3. When executed, it enters the desired directory (here 5.2.20), ignoring all directories before this version, and downloads the rpm files from it. But once that directory is finished, wget runs against all the other directories and starts downloading rpm files from every one of them. :-(
  4. I tried --exclude-directories= to exclude the unnecessary subdirectories, but the argument is not working. PS: For quicker test runs, I download zip files instead of rpm.

    wget -A zip -rc -e robots=off --reject "index.html*" --exclude-directories=exclude_dir ${repo_url}/$i/
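One likely reason the option has no effect: the command passes the literal string exclude_dir instead of the array's contents. A minimal sketch (the '*_RC' entry is an assumed example) of joining a bash array into the comma-separated list that wget's -X/--exclude-directories expects:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: join the bash array into a comma-separated list,
# which is the format wget's -X/--exclude-directories expects.
exclude_dir=('*_Beta' '*_RC')   # '*_RC' is an assumed extra entry for illustration
excl=$(IFS=,; printf '%s' "${exclude_dir[*]}")   # join with commas in a subshell
echo "$excl"                                     # prints: *_Beta,*_RC
# wget -A rpm -rc -e robots=off -X "$excl" "${repo_url}/${i}/"
```

Setting IFS inside the command substitution keeps the comma-joining local, so the caller's word splitting is unaffected.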

Any help would be much appreciated!

Use the -np|--no-parent and -l|--level command line options of wget.

Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.

Specify recursion maximum depth level. If you want to download all the files from one directory, use '-l 1' to make sure the recursion depth never exceeds one.

So the command should look like this: wget -A zip -np -r -l 1 -c -e robots=off --reject "index.html*" ${repo_url}/${i}/. The --reject "index.html*" part is, in my opinion, unnecessary here. And you should correct repo_url in your script to "http://download.virtualbox.org/virtualbox", without the trailing slash. So you get

wget -A zip -np -r -l 1 -c -e robots=off ${repo_url}/${i}/

The result is then

mrepo/download.virtualbox.org/virtualbox/5.2.20/VirtualBoxSDK-5.2.20-125813.zip
mrepo/download.virtualbox.org/virtualbox/5.2.22/VirtualBoxSDK-5.2.22-126460.zip
mrepo/download.virtualbox.org/virtualbox/6.0.0/VirtualBoxSDK-6.0.0-127566.zip
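Incidentally, the trailing-slash problem mentioned above is visible without running wget at all; it is plain string concatenation:

```shell
#!/usr/bin/env bash
# The trailing slash in repo_url, plus the "/" appended in the loop,
# produces a double slash in the request URL.
repo_url="http://download.virtualbox.org/virtualbox/"   # as in the question
i="5.2.20"
url="${repo_url}/${i}/"
echo "$url"   # prints: http://download.virtualbox.org/virtualbox//5.2.20/
```

Dropping the trailing slash from repo_url keeps the URLs clean.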

Just to be complete, a short version of the script is as follows:

#!/usr/bin/env bash

repo_url="https://download.virtualbox.org/virtualbox"
repo_download=('5.2.20' '5.2.22' '6.0.0')

for i in "${repo_download[@]}"; do
  wget -A zip -np -r -l 1 -c -e robots=off ${repo_url}/${i}/
done
