
Download a .csv file from the web in SAS

I would like to know how to download a .csv file using SAS.

Searching the web, I found that this is possible with the following statement:

filename NAME url "http://.../NAME_OF_THE_FILE.csv";

In particular, I want to understand how this statement works and in which cases I cannot use it.

For instance, suppose one has to download a .csv file published on a web page, for example on the football-data.co.uk website, which provides football match data.

In that case, using the following statement to download the file:

filename csv url "http://www.football-data.co.uk/mmz4281/1617/E0.csv";

and the following one to import data in SAS:

proc import file = csv
            out  = junk_00
            dbms = csv replace;
            delimiter = ",";
run;

everything works fine. This file contains the Premier League data for the 2016/2017 season, available from the first link.

However, for the Championship data for the 2016/2017 season, using the same script as follows:

filename csv url "http://www.football-data.co.uk/mmz4281/1617/E1.csv";
proc import file = csv
            out  = junk_00
            dbms = csv replace;
            delimiter = ",";
run;

you get the following error:

Import unsuccessful.  See SAS Log for details.

Looking at the LOG window, you can see the following note among the log lines:

Invalid data for Date

even though the file is formatted correctly.

I don't understand why the script sometimes works and sometimes does not. The same thing happened with other files, although none of them is corrupted and all are formatted correctly and in the same way.

What's wrong? Can someone help me to understand why this happens?

Thanks all in advance!

PROC IMPORT has to guess the data types. For some reason it decides the date field is formatted as MMDDYY, when it is actually DDMMYY. It may also be that the format is used inconsistently within the file; I didn't check every row, but the source of the error was visible immediately.
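To check the date layout yourself, one option is to echo the first few raw lines of the remote file to the log. This is a minimal sketch that reuses the URL from the question; the `obs=5` limit and the `lrecl` value are my own choices, not something the question specifies.

```sas
/* Point a fileref at the remote file (URL taken from the question). */
filename csv url "http://www.football-data.co.uk/mmz4281/1617/E1.csv";

/* Copy the first 5 raw lines to the SAS log without parsing them. */
data _null_;
    infile csv obs=5 lrecl=32767;  /* lrecl chosen generously for wide rows */
    input;                         /* read the line into the _infile_ buffer */
    put _infile_;                  /* write the raw line to the log */
run;
```

If the Date column shows values whose first part is greater than 12, the file cannot be MMDDYY.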

One solution is to skip PROC IMPORT and read the file with a data step, where you specify the types and informats yourself. This works well if all the files are structured the same way; if each file has a different layout, it is not feasible.
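A data step along these lines might look as follows. It is only a sketch: the column names and their order (Div, Date, HomeTeam, AwayTeam) are an assumption about the file layout and should be checked against the header row, and only the first four columns are read.

```sas
filename csv url "http://www.football-data.co.uk/mmz4281/1617/E1.csv";

data junk_00;
    /* dsd handles quoted and empty fields; firstobs=2 skips the header row */
    infile csv dsd dlm=',' firstobs=2 truncover lrecl=32767;

    /* ASSUMED layout: adjust names, lengths and order to the real header */
    length Div $ 4 HomeTeam AwayTeam $ 30;

    /* Read the date explicitly as day/month/year instead of letting SAS guess */
    informat Date ddmmyy10.;
    format   Date ddmmyy10.;

    /* Remaining columns on each line are simply not read */
    input Div $ Date HomeTeam $ AwayTeam $;
run;
```

Because the informat is stated explicitly, the result no longer depends on which rows PROC IMPORT happens to sample.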

Another possible workaround is to download the file first, set GUESSINGROWS to a large number, and then import the local copy. PROC IMPORT then scans many more rows before guessing the types, so its guesses can be better. This does not appear to work when reading directly through filename URL, but I don't know why.
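The download-then-import workaround could be sketched like this. The fileref names `src` and `dst` are hypothetical, the temporary file is just one convenient place for the local copy, and 32767 is an arbitrary "large" value for GUESSINGROWS; whether the guess comes out right still depends on the data.

```sas
/* Remote source (URL from the question) and a temporary local copy */
filename src url "http://www.football-data.co.uk/mmz4281/1617/E1.csv";
filename dst temp;

/* Copy the remote file line by line into the local file */
data _null_;
    infile src lrecl=32767;
    file   dst lrecl=32767;
    input;
    put _infile_;
run;

/* Import the local copy, scanning many rows before guessing the types */
proc import datafile = dst
            out      = junk_00
            dbms     = csv replace;
    delimiter    = ",";
    getnames     = yes;
    guessingrows = 32767;
run;
```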

I don't think this is a full answer, but it should shed some light on what's happening for you.
