Downloading the latest file from a server with wget

Question

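Good afternoon,

I am trying to figure out how to use wget on a Linux system to download the latest file from a server. The files are 5-minute radar data, so the file names advance in 5-minute increments up to the most recent one, e.g. 1930.grib2, 1935.grib2, 1940.grib2.

Currently, my bash script uses the following code, which downloads every file from the top of the hour onward, but this is not an efficient way to get the latest file: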

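# YMD, HOMEDIR and PATH1 are assumed to be set earlier in the script.
HR=$(date +%H)
padtowidth=2
START=0
END=55
i=${START}

# Try every 5-minute file in the current hour (00, 05, ..., 55).
while [[ ${i} -le ${END} ]]
do
    # Zero-pad the minute to two digits.
    tau=$(printf "%0*d" "${padtowidth}" "${i}")

    URL1="http://thredds.ucar.edu/thredds/fileServer/grib/nexrad/composite/unidata/files/${YMD}/Level_3_Composite_N0R_${YMD}_${HR}${tau}.grib2"

    # -N (timestamping) skips files that are already up to date locally.
    wget -P "${HOMEDIR}${PATH1}${YMD}/${HR}Z/" -N "${URL1}"

    ((i = i + 5))
done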

Answer 1

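If there is an index of all the files, you could download that first and parse it to find the most recent file.

If that is not possible, you could count backward from the current time (using date +%M in addition to date +%H) and stop as soon as wget manages to fetch a file (i.e. when wget exits with 0).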
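The count-backward idea could be sketched roughly as follows. This is only an illustration under a few assumptions: the file names carry UTC timestamps, GNU date is available (for `-d @epoch`), and looking back 12 slots (one hour) is enough. The helper names `slot_for` and `fetch_latest` are made up for this example; the URL pattern is the one from the question.

```shell
#!/bin/sh
# Sketch: walk backward from the current time in 5-minute steps and stop
# at the first file wget can fetch. Assumes UTC file names and GNU date.

BASE_URL="http://thredds.ucar.edu/thredds/fileServer/grib/nexrad/composite/unidata/files"

# slot_for EPOCH STEPS
# Print the 5-minute slot at or before EPOCH, moved back by STEPS slots,
# formatted as YYYYMMDD_HHMM (UTC).
slot_for() {
    slot=$(( $1 - $1 % 300 - $2 * 300 ))
    date -u -d "@${slot}" '+%Y%m%d_%H%M'
}

# fetch_latest DEST_DIR [MAX_STEPS]
# Try the newest slot first; print the stamp and return 0 on the first
# successful download, return 1 if nothing could be fetched.
fetch_latest() {
    dest=$1
    max_steps=${2:-12}
    now=$(date +%s)
    step=0
    while [ "$step" -lt "$max_steps" ]; do
        stamp=$(slot_for "$now" "$step")
        ymd=${stamp%_*}
        url="${BASE_URL}/${ymd}/Level_3_Composite_N0R_${stamp}.grib2"
        if wget -q -P "$dest" -N "$url"; then
            echo "$stamp"
            return 0
        fi
        step=$((step + 1))
    done
    return 1
}
```

Calling `fetch_latest /some/download/dir` would then try the current slot, then the one 5 minutes earlier, and so on. Working from epoch seconds rather than separate hour and minute values means the rollover across the hour or across midnight needs no special handling.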

Hope it helps!

Example of parsing the index:

filename=$(wget -q -O - http://thredds.ucar.edu/thredds/catalog/grib/nexrad/composite/unidata/NEXRAD_Unidata_Reflectivity-20140501/files/catalog.html | grep '<a href=' | head -n 1 | sed -e 's/.*\(Level3_Composite_N0R_[0-9]*_[0-9]*\.grib2\).*/\1/')

This fetches the page and runs the first line containing <a href= through a quick sed to extract the filename.
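Taking the first matching line only works if the catalog lists the newest file first. A variant that does not depend on the listing order might look like the sketch below. It assumes the names follow the Level3_Composite_N0R_YYYYMMDD_HHMM.grib2 pattern used in the sed expression (an underscore after "Level" is tolerated), so that a plain sort puts them in chronological order. `latest_from_catalog` is a hypothetical helper name, and `grep -o` is assumed to be available, as it is with GNU grep.

```shell
#!/bin/sh
# Read catalog HTML on stdin and print the newest matching file name.
# Assumes zero-padded YYYYMMDD_HHMM stamps, so lexical order equals
# chronological order. Prints nothing if no name matches.
latest_from_catalog() {
    grep -oE 'Level_?3_Composite_N0R_[0-9]{8}_[0-9]{4}\.grib2' \
        | sort -u \
        | tail -n 1
}

# Usage (CATALOG_URL is a placeholder for the catalog page to query):
#   filename=$(wget -q -O - "$CATALOG_URL" | latest_from_catalog)
```

The result can then be appended to the fileServer base URL and passed to wget as in the question.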

Answer 2

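I made a C++ console program to do this automatically. I will post the entire code below. Just use wget to grab the catalog file, then run the program in the same directory; it will automatically create a BAT file that you can launch whenever you like to download the latest file. I wrote it specifically for the Unidata THREDDS server, so I know this is a good answer. EDIT and IMPORTANT NOTE: this is for the latest GOES-16 data, so you will have to play with the substring values for different products.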

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main()
{
    // First, open the catalog.html that was downloaded using wget, and put the entire file into a string.
    ifstream inFile("catalog.html");
    if (!inFile) {
        cerr << "Could not open catalog.html" << endl;
        return 1;
    }
    stringstream strStream;
    strStream << inFile.rdbuf();   // read the file
    string str = strStream.str();  // str holds the content of the file

    cout << str << endl;  // The string contains the entire catalog ... you can do anything with the string

    // Now create the entire URL we need automatically, starting from the base URL, which is known (step 1: string "first").
    string first = "http://thredds-test.unidata.ucar.edu/thredds/fileServer/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/";

    // The string "second" is the actual filename. To my knowledge the location of the filename in the HTML file never changes,
    // but this must be watched in case it DOES change in the future. The C++ substr function extracts it.
    const string::size_type offset = 252784;
    const string::size_type length = 76;
    if (str.size() < offset + length) {
        // substr would throw std::out_of_range if the offset were past the end of the string.
        cerr << "catalog.html is shorter than expected" << endl;
        return 1;
    }
    string second = str.substr(offset, length);

    // Then create a batch file containing "wget (base url + filename)", which can be launched to download the latest GRIB2 file.
    ofstream myfile2("downloadGOESLatest.bat");
    myfile2 << "wget " << first << second;
    myfile2.close();

    return 0;
}
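A fixed offset stops pointing at the filename as soon as the text in front of it changes length. A possible alternative, sketched here in shell rather than C++, is to look the name up by its suffix in the saved catalog.html. Everything in this sketch is an assumption for illustration: the helper name `wget_command_from_catalog` is invented, file names are assumed to contain only letters, digits, '_', '.' and '-', and the first match in the page is assumed to be the newest file.

```shell
#!/bin/sh
# wget_command_from_catalog CATALOG_FILE BASE_URL SUFFIX_REGEX
# Print a wget command for the first name in a saved catalog page that
# ends with the given suffix (an extended regex, e.g. '\.grib2').
# Returns 1 if no such name is found.
wget_command_from_catalog() {
    catalog_file=$1
    base_url=$2
    suffix_regex=$3
    name=$(grep -oE "[A-Za-z0-9_.-]+${suffix_regex}" "$catalog_file" | head -n 1)
    [ -n "$name" ] || return 1
    printf 'wget %s%s\n' "$base_url" "$name"
}

# Usage, writing the command to a script instead of a BAT file:
#   wget_command_from_catalog catalog.html "$BASE_URL" '\.grib2' > downloadGOESLatest.sh
```

Because the name is found by pattern, the same helper should carry over to other products by changing only the base URL and the suffix.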
