
How to wget the most recent file in a directory

I would like to write a bash script that downloads and installs the latest daily build of a program (RStudio). Is it possible to make wget download only the most recent file in the directory http://www.rstudio.org/download/daily/desktop/ ?

The files appear to be sorted by release date, and each new release is a new entry whose name reflects the new version number, so checking the timestamp of any particular file is unnecessary.

Also, you have provided a link to a "directory", which is essentially a web page. AFAIK, there is no such thing as a directory in HTTP (which is a communication protocol that serves you data at a given address). What you see is a listing generated by the server that resembles Windows folders for ease of use, but it is still a web page.
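For illustration, a quick way to see this for yourself (using the URL from the question):

# Fetch the "directory" and look at what actually comes back:
# the first few lines are ordinary HTML markup, not a filesystem listing.
wget -q -O - http://www.rstudio.org/download/daily/desktop/ | head -n 5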

That said, you can scrape that web page. The following code downloads the file in the first position of the listing (assuming the first one is the most recent):

#!/bin/bash

# Fetch the listing page and extract the first .deb link on it.
wget -q -O tmp.html http://www.rstudio.org/download/daily/desktop/ubuntu64/
RELEASE_URL=$(grep -o -E "https[^<>]*amd64\.deb" tmp.html | head -n 1)
rm tmp.html

# TODO Check if the old package name is the same as in RELEASE_URL.

# If not, then get the new version.
wget -q "$RELEASE_URL"

Now you can check it against your most recent local version and install if necessary.

EDIT: Updated version, which does simple version checking and installs the package.

#!/bin/bash

MY_PATH=$(dirname "$0")
RES_DIR="$MY_PATH/res"

# Piping from stdout suggested by Chirlo.
RELEASE_URL=$(wget -q -O - http://www.rstudio.org/download/daily/desktop/ubuntu64/ | grep -m 1 -o "https[^']*")

if [ "$RELEASE_URL" == "" ]; then
    echo "Package index not found. Maybe the server is down?"
    exit 1
fi

mkdir -p "$RES_DIR"
NEW_PACKAGE=${RELEASE_URL##*/}   # strip everything up to the last slash
OLD_PACKAGE=$(ls "$RES_DIR")

if [ "$OLD_PACKAGE" == "" ] || [ "$OLD_PACKAGE" != "$NEW_PACKAGE" ]; then

    cd "$RES_DIR"
    rm -f $OLD_PACKAGE

    echo "New version found. Downloading..."
    wget -q "$RELEASE_URL"

    if [ ! -e "$NEW_PACKAGE" ]; then
        echo "Package not found."
        exit 1
    fi

    echo "Installing..."
    sudo dpkg -i "$NEW_PACKAGE"

else
    echo "rstudio up to date."
fi

And a couple of comments:

  • The script keeps a local res/ dir with the latest version (exactly one file) and compares its name with the newly scraped package name. This is dirty (having the file doesn't mean it was successfully installed in the past). It would be better to parse the output of dpkg -l, but the package name might differ slightly from the scraped one; one way to do that is sketched after this list.
  • You will still need to enter the password for sudo, so it won't be 100% automatic. There are a few ways around this, though without supervision you might run into the problem stated above.
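
For reference, a minimal sketch of the dpkg route mentioned in the first point, assuming the daily build registers under the package name rstudio (which may differ from the scraped file name):

# Hypothetical sketch: ask dpkg for the installed version instead of trusting res/.
# "rstudio" is an assumed package name; the daily builds may register differently.
INSTALLED=$(dpkg-query -W -f='${Version}' rstudio 2>/dev/null)

if [ -z "$INSTALLED" ]; then
    echo "rstudio is not installed."
else
    echo "Installed version: $INSTALLED"
fi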

A slightly cleaner variation of @Richard Pumps' answer:

RELEASE_URL=$(wget -q -O - http://www.rstudio.org/download/daily/desktop/ubuntu64 | grep -o -m 1 "https[^']*")

# check version from name ...

wget "${RELEASE_URL}"

This avoids creating a temporary file by writing the HTML to stdout and filtering it there.
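
To flesh out the "check version from name" step above, here is one possible sketch. It assumes the version sits between "rstudio-" and "-amd64.deb" in the file name (e.g. rstudio-0.98.123-amd64.deb, a made-up name) and that the package registers as rstudio; adjust both to the real naming scheme:

# Hypothetical sketch: compare the scraped version against the installed one.
# The file-name pattern and the "rstudio" package name are assumptions.
NEW_PACKAGE=${RELEASE_URL##*/}
NEW_VERSION=$(echo "$NEW_PACKAGE" | sed 's/^rstudio-\(.*\)-amd64\.deb$/\1/')
OLD_VERSION=$(dpkg-query -W -f='${Version}' rstudio 2>/dev/null)

# dpkg understands Debian version ordering, so let it do the comparison.
if [ -z "$OLD_VERSION" ] || dpkg --compare-versions "$OLD_VERSION" lt "$NEW_VERSION"; then
    wget -q "$RELEASE_URL"
fi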

The -N option tells wget to download a file only if it is newer than the copy you already have. However, with wget alone you cannot do something as broad as downloading the newest of all the files in a remote directory. You'll need a bash script (or similar) that does the checking and then calls wget to grab the right file.
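
For illustration, a minimal use of -N; the file name below is hypothetical, and a stable name is exactly what the RStudio dailies don't give you, which is why -N alone doesn't solve this:

# Timestamping: re-download only if the remote copy is newer than the local one.
# (Hypothetical URL; -N needs a fixed file name to be useful.)
wget -N http://www.rstudio.org/download/daily/desktop/ubuntu64/rstudio-daily-amd64.deb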
