Downloading Most Recent File On Server Via wget

Good Afternoon All,

I'm trying to figure out how to download the most recent file from a server using wget on my Linux system. The files are 5-minute radar data, so the timestamps increase in 5-minute steps up to the most recent, e.g. 1930.grib2, 1935.grib2, 1940.grib2, etc.

Currently, my bash script uses the code below to download every file starting from the top of the hour, but this is not an efficient way to get the most recent file:

HR=$(date +%H)
padtowidth=2
START=0
END=55
i=${START}

while [[ ${i} -le ${END} ]]
do
    tau=$(printf "%0*d\n" $padtowidth ${i})

    URL1=http://thredds.ucar.edu/thredds/fileServer/grib/nexrad/composite/unidata/files/${YMD}/Level_3_Composite_N0R_${YMD}_${HR}${tau}.grib2

    wget -P ${HOMEDIR}${PATH1}${YMD}/${HR}Z/ -N ${URL1}

    ((i = i + 5))
done

If there is an index of all the files, you could first download it and then parse it to find the most recent file.

If that is not possible, you could count backwards from the current time (use date +%M in addition to date +%H ) and stop as soon as wget is able to get the file (e.g. when wget exits with 0), as sketched below.
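A minimal sketch of that count-backwards idea, assuming the same URL pattern as in the question and GNU date (you can add the question's -P destination option to the wget call as needed):

# Step back in 5-minute increments from "now" until wget succeeds.
# Searches at most one hour back; extend the loop bound if needed.
now=$(date +%s)
mins=$((10#$(date +%M)))                # force base 10 so "08"/"09" aren't read as octal
start=$(( now - (mins % 5) * 60 ))      # align to a 5-minute boundary

for (( offset = 0; offset <= 3600; offset += 300 )); do
    stamp=$(( start - offset ))
    YMD=$(date -d "@${stamp}" +%Y%m%d)
    HHMM=$(date -d "@${stamp}" +%H%M)
    URL=http://thredds.ucar.edu/thredds/fileServer/grib/nexrad/composite/unidata/files/${YMD}/Level_3_Composite_N0R_${YMD}_${HHMM}.grib2
    if wget -N "${URL}"; then           # exit status 0 means the file was downloaded
        break
    fi
done

Stepping through epoch seconds rather than decrementing HR/MIN directly keeps the hour and day rollovers correct for free.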

Hope it helps!


Example to parse the index:

filename=$(wget -q -O - http://thredds.ucar.edu/thredds/catalog/grib/nexrad/composite/unidata/NEXRAD_Unidata_Reflectivity-20140501/files/catalog.html \
    | grep '<a href=' \
    | head -1 \
    | sed -e 's/.*\(Level3_Composite_N0R_[0-9]*_[0-9]*\.grib2\).*/\1/')

This fetches the page, takes the first line containing an <a href= , and runs it through a quick sed to extract the filename.
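Once filename is set, you can hand it straight back to wget. A minimal sketch, assuming the usual THREDDS convention that the /catalog/.../catalog.html path maps to a matching /fileServer/.../ download path (verify against the dataset's actual layout):

base=http://thredds.ucar.edu/thredds/fileServer/grib/nexrad/composite/unidata/NEXRAD_Unidata_Reflectivity-20140501/files
wget -N "${base}/${filename}"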

I made a C++ console program to do this automatically; the entire code is below. Just use wget to capture the catalog file, then run this program in the same directory: it will create a BAT file that you can launch at will to download the latest file (see the usage sketch after the code). I wrote this specifically for the Unidata THREDDS server, so I know this is a good answer. Edit and important note: this is for the latest GOES-16 data, so you will have to play around with the substring values for different products.

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
using namespace std;

int main()
{
    // First, open catalog.html (downloaded beforehand with wget) and read the
    // entire file into a string.
    ifstream inFile;
    inFile.open("catalog.html");
    stringstream strStream;
    strStream << inFile.rdbuf();       // read the whole file
    string str = strStream.str();      // str holds the content of the file

    cout << str << endl;               // the string contains the entire catalog

    // Build the full URL automatically, starting from the known base URL.
    string first = "http://thredds-test.unidata.ucar.edu/thredds/fileServer/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/";

    // "second" is the actual filename. To my knowledge its position in the HTML
    // never changes, but watch this in case it DOES change in the future.
    // substr extracts it from a fixed offset in the catalog page.
    string second = str.substr(252784, 76);

    // Write a batch file containing "wget <base url><filename>", which can then
    // be launched to download the latest GRIB2 file.
    ofstream myfile2;
    myfile2.open("downloadGOESLatest.bat");
    myfile2 << "wget ";
    myfile2 << first;
    myfile2 << second;
    myfile2.close();

    return 0;
}
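For completeness, a hedged usage sketch of the workflow described above. The catalog URL is an assumption derived from the base URL hard-coded in the program, and makeLatest.cpp is a placeholder name for the source file:

# 1. Grab the catalog page into the program's directory
#    (assumed URL; adjust for your product and channel).
wget -O catalog.html http://thredds-test.unidata.ucar.edu/thredds/catalog/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/catalog.html

# 2. Compile and run; this writes downloadGOESLatest.bat alongside it.
g++ -o makeLatest makeLatest.cpp
./makeLatest

# 3. Launch the generated file (on Windows run the .bat directly;
#    on Linux its single wget line is valid shell).
sh downloadGOESLatest.bat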
