Downloading the latest file from a server with wget

Question

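Good afternoon,

I'm trying to figure out how to use wget on a Linux system to download the latest file from a server. The files are 5-minute radar data, so the filenames advance in 5-minute increments up to the most recent one, e.g. 1930.grib2, 1935.grib2, 1940.grib2, and so on.

Currently I have the following code in a bash script. It downloads every file from the top of the hour onward, which is not an efficient way to get just the latest file: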

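# YMD, HOMEDIR and PATH1 are assumed to be set earlier in the script.
HR=$(date +%H)
padtowidth=2
START=0
END=55
i=${START}

while [[ ${i} -le ${END} ]]
do
    # Zero-pad the minute to two digits (0 -> 00, 5 -> 05).
    tau=$(printf "%0*d" "${padtowidth}" "${i}")

    URL1="http://thredds.ucar.edu/thredds/fileServer/grib/nexrad/composite/unidata/files/${YMD}/Level_3_Composite_N0R_${YMD}_${HR}${tau}.grib2"

    # -N only re-downloads a file if the server copy is newer than the local one.
    wget -P "${HOMEDIR}${PATH1}${YMD}/${HR}Z/" -N "${URL1}"

    ((i = i + 5))
done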

Answer 1

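If there is an index of all the files, you can download that index first and then parse it to find the latest file.

If that isn't possible, you can count backwards from the current time (using date +%M in addition to date +%H) and stop as soon as wget manages to fetch a file (i.e. when wget exits with status 0).

Hope that helps!

Example of parsing the index: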
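The count-backwards idea could be sketched roughly as follows. This is only an illustration under some assumptions: the function names `candidate_stamps` and `fetch_latest` are made up for this example, the URL layout is copied from the question, the server's timestamps are assumed to be UTC, and the search deliberately stops at 0000 instead of crossing into the previous day.

```shell
# Sketch only: count back from the current time in 5-minute steps.

# Print up to COUNT candidate HHMM stamps, newest first, starting from the
# latest multiple of 5 minutes at or before HH:MM. Stops at 0000 rather than
# crossing into the previous day.
candidate_stamps() (
    hh=${1#0}; hh=${hh:-0}   # drop a leading zero so 08 is not read as octal
    mm=${2#0}; mm=${mm:-0}
    count=$3
    total=$(( hh * 60 + mm ))
    total=$(( total - total % 5 ))
    k=0
    while [ "$k" -lt "$count" ] && [ "$total" -ge 0 ]; do
        printf '%02d%02d\n' $(( total / 60 )) $(( total % 60 ))
        total=$(( total - 5 ))
        k=$(( k + 1 ))
    done
)

# Try each candidate from the last hour until wget succeeds (exit status 0).
# Arguments (all supplied by the caller): base URL, YYYYMMDD date, target directory.
# Assumes the filenames use UTC times.
fetch_latest() (
    base_url=$1 ymd=$2 dest_dir=$3
    for stamp in $(candidate_stamps "$(date -u +%H)" "$(date -u +%M)" 12); do
        url="${base_url}/${ymd}/Level_3_Composite_N0R_${ymd}_${stamp}.grib2"
        if wget -q -N -P "$dest_dir" "$url"; then
            echo "$url"      # report which file was fetched
            return 0
        fi
    done
    return 1                 # nothing found in the last hour
)
```

It might be called as fetch_latest "http://thredds.ucar.edu/thredds/fileServer/grib/nexrad/composite/unidata/files" "$(date -u +%Y%m%d)" "${HOMEDIR}${PATH1}", reusing the variables from the question. Each miss costs one failed request, so in the worst case this makes twelve requests per run.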

filename=$(wget -q -O - http://thredds.ucar.edu/thredds/catalog/grib/nexrad/composite/unidata/NEXRAD_Unidata_Reflectivity-20140501/files/catalog.html | grep '<a href=' | head -1 | sed -e 's/.*\(Level3_Composite_N0R_[0-9]*_[0-9]*\.grib2\).*/\1/')

This fetches the page and pipes the first line containing <a href= through a quick sed to extract the filename.
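If the first link on the page turns out not to be the newest file (the catalog's ordering is not confirmed here), a variation is to collect every matching filename and take the one that sorts last. The sketch below assumes the filename pattern from the sed expression above, with a fixed-width YYYYMMDD_HHMM stamp so that lexical order matches chronological order; `latest_from_catalog` is a name invented for this example, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
# Read catalog HTML on stdin and print the newest matching filename.
# Prints nothing if no filename matches, so the caller should check for
# an empty result before building a download URL.
latest_from_catalog() {
    grep -o 'Level3_Composite_N0R_[0-9]\{8\}_[0-9]\{4\}\.grib2' \
        | LC_ALL=C sort -u \
        | tail -n 1
}
```

For instance, the catalog could be piped in with wget -q -O - followed by the catalog URL, and the result stored in a variable such as filename.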

Answer 2

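I wrote a C++ console program to automate this; the full code is posted below. Just use wget to grab the catalog file, then run the program in the same directory. It automatically creates a BAT file that you can launch whenever you like to download the latest file. I wrote it specifically for the Unidata THREDDS server, so I know it works there. Edit and important note: this is for the latest GOES-16 data, so you will have to use different substring values for other products.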

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main()
{
    // First, open the catalog.html that was downloaded with wget and read the entire file into a string.
    ifstream inFile("catalog.html");
    if (!inFile) {
        cerr << "Could not open catalog.html" << endl;
        return 1;
    }
    stringstream strStream;
    strStream << inFile.rdbuf();   // read the file
    string str = strStream.str();  // str holds the content of the file

    cout << str << endl;  // The string contains the entire catalog ... you can do anything with the string

    // Next, build the full URL automatically, starting from the known base URL (step 1: string "first").
    string first = "http://thredds-test.unidata.ucar.edu/thredds/fileServer/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/";

    // The string "second" is the actual filename. To my knowledge its location in the HTML file never
    // changes, but this must be watched in case it DOES change in the future. The offset and length
    // below are specific to this GOES-16 product.
    const string::size_type offset = 252784;
    const string::size_type length = 76;
    if (str.size() < offset + length) {
        cerr << "catalog.html is shorter than expected; adjust the substring values" << endl;
        return 1;
    }
    string second = str.substr(offset, length);

    // Finally, create a batch file containing "wget (base url + filename)", which can then be
    // launched to download the latest GRIB2 file.
    ofstream myfile2("downloadGOESLatest.bat");
    myfile2 << "wget " << first << second;
    myfile2.close();

    return 0;
}

Answer 1: score 2, accepted, 2014-05-01 23:23:25
Answer 2: score 0, 2018-07-24 12:12:39