I am trying to download data from this data page. I have tried a number of scripts I found by searching. On the data page I have to select the countries I want, one at a time. The script that comes closest to what I want is:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my $url = 'https://www.ogimet.com/ultimos_synops2.php?lang=en&estado=Zamb&fmt=txt&Send=Send';
my $file = 'Zamb.txt';
getstore($url, $file);
However, this script gives me the whole page, not just the data. I would appreciate help downloading only the data, if that is possible. I would also be happy to do it in PHP if that is an easier alternative.
The link returns text wrapped in HTML. The simplest approach is to use HTML::TreeBuilder and HTML::FormatText to extract a text-only version.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;
my $url = 'https://www.ogimet.com/ultimos_synops2.php?lang=en&estado=Zamb&fmt=txt&Send=Send';
# Download and parse the page, then render it as plain text.
# A huge right margin prevents the formatter from wrapping the report lines.
my $tree = HTML::TreeBuilder->new_from_url($url);
my $text = HTML::FormatText->new(leftmargin => 0, rightmargin => 100000000000)->format($tree);
my $file = 'Zamb.txt';
open(my $fh, '>', $file) or die "Cannot open $file: $!";
print $fh $text;
close($fh);
HTML::TreeBuilder->new_from_url($url) downloads and parses the HTML.

HTML::FormatText->new(leftmargin=>0, rightmargin=>100000000000) initializes the HTML-to-text formatter; the right margin is set to a very large value to prevent line wrapping.

This is the content of Zamb.txt afterwards:
$ cat Zamb.txt
##########################################################
# Query made at 02/29/2020 18:15:54 UTC
##########################################################
##########################################################
# latest SYNOP reports from Zambia before 02/29/2020 18:15:54 UTC
##########################################################
202002291200 AAXX 29124 67855 42775 51401 10310 20168 3//// 48/// 85201
333 5//// 85850 83080=
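Since the page only accepts one country per request via the `estado` parameter, the same download can simply be repeated in a loop. A minimal shell sketch (the extra country values besides `Zamb` are assumptions; check the page's form for the exact values it accepts):

```shell
#!/bin/sh
# Build one URL per country; "Kenya" and "Botsw" are assumed values,
# only "Zamb" is confirmed from the question above.
base='https://www.ogimet.com/ultimos_synops2.php?lang=en&fmt=txt&Send=Send'
for state in Zamb Kenya Botsw; do
    url="${base}&estado=${state}"
    echo "$url"
    # wget -q -O "${state}.html" "$url"   # uncomment to actually download
done
```

Each pass through the loop would produce one file per country once the wget line is enabled.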
My PHP-fu isn't up to date, but I think you can use the following:
<?php
$url = 'https://www.ogimet.com/ultimos_synops2.php?lang=en&estado=Zamb&fmt=txt&Send=Send';
$content = strip_tags(file_get_contents($url));
echo substr($content, strpos($content, '###############'));
Note: I seem to recall there is a configuration option (allow_url_fopen) that can disable fetching URLs via file_get_contents, so YMMV.
However, on the same page there is a note:
NOTE: If you want to get simply files with synop reports in CSV format without HTML tags consider to use the binary getsynop
This would get you the same data in an easy-to-use format:
$ wget "https://www.ogimet.com/cgi-bin/getsynop?begin=$(date +%Y%m%d0000)&state=Zambia" -o /dev/null -O - | tail -1
67855,2020,02,29,12,00,AAXX 29124 67855 42775 51401 10310 20168 3//// 48/// 85201 333 5//// 85850 83080=
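The CSV fields appear to be station, year, month, day, hour, minute, and the raw SYNOP report; that layout is inferred from the single sample line above, so treat it as an assumption. A sketch splitting the line with `cut`:

```shell
#!/bin/sh
# Split one getsynop CSV line into its fields.
# The sample line is copied from the output shown above.
line='67855,2020,02,29,12,00,AAXX 29124 67855 42775 51401 10310 20168 3//// 48/// 85201 333 5//// 85850 83080='
station=$(echo "$line" | cut -d, -f1)              # WMO station index
date=$(echo "$line" | cut -d, -f2-4 | tr ',' '-')  # year-month-day
report=$(echo "$line" | cut -d, -f7-)              # raw SYNOP report
echo "station=$station date=$date"
echo "report=$report"
```

The `-f7-` keeps everything from the seventh field onward, which would stay correct even if a report ever contained a comma.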