Downloading a .csv file from the web in SAS

I would like to know how to download a .csv file using SAS.

Browsing the web, I found that this can be done with the following statement:

filename NAME url "http://.../NAME_OF_THE_FILE.csv";

In particular, I want to understand how this statement works and in which cases it cannot be used.

For instance, suppose one has to download a .csv file that is published on a web page, such as the football-data.co.uk site, which makes football match data available.

In that case, using the following statement to reference the file:

filename csv url "http://www.football-data.co.uk/mmz4281/1617/E0.csv";

and the following step to import the data into SAS:

proc import file = csv
            out  = junk_00
            dbms = csv replace;
            delimiter = ",";
run;

everything works fine. This file corresponds to the 2016/2017 season and contains the Premier League data, which can be found via the first link.

However, for the Championship data for the 2016/2017 season, running the same script:

filename csv url "http://www.football-data.co.uk/mmz4281/1617/E1.csv";
proc import file = csv
            out  = junk_00
            dbms = csv replace;
            delimiter = ",";
run;

produces the following error:

Import unsuccessful.  See SAS Log for details.

Looking at the Log window, you can see the following note/warning among the log lines:

Invalid data for Date, even though the file is formatted correctly.

I don't understand why the script sometimes works and sometimes does not. The same thing happened with other files, even though the files are not corrupted and are all formatted correctly and in the same way.

What's wrong? Can someone help me understand why this happens?

Thanks in advance!

PROC IMPORT has to guess at data types. For some reason it thinks the Date field is formatted as MMDDYY, but it is actually DDMMYY. Or maybe the format is used inconsistently within the file; I did not check every row, but the source of the error was immediately visible.
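One way to check how the dates are actually written is to print a few raw records to the log before importing anything. This is only a minimal sketch: the fileref name csv and the URL are taken from the question, while the number of records (5) and the LRECL value are arbitrary choices.

```sas
filename csv url "http://www.football-data.co.uk/mmz4281/1617/E1.csv";

data _null_;
    /* obs=5 stops after five records; lrecl is set high so long lines are not cut */
    infile csv obs=5 lrecl=32767;
    input;          /* read the record into the input buffer without parsing it */
    put _infile_;   /* write the raw record to the log */
run;
```

If the first data rows show dates whose day is 12 or lower, both DDMMYY and MMDDYY are plausible readings, which would be consistent with PROC IMPORT guessing the wrong one.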

The solution is to use a data step instead of PROC IMPORT. This works if all the files share the same structure, because the data step hard-codes the variable names and informats; if each file is different, it is not a feasible solution.
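A data step along these lines might look as follows. It is a sketch under assumptions: the column names, their order, and the character lengths are guesses that should be checked against the header row of the actual file, and only the first seven fields are read. The key point is that the Date informat is stated explicitly rather than guessed.

```sas
filename csv url "http://www.football-data.co.uk/mmz4281/1617/E1.csv";

data junk_00;
    /* dsd: comma-delimited, handles quoted values; firstobs=2 skips the header row */
    infile csv dsd truncover firstobs=2 lrecl=32767;
    /* Assumed column names and lengths; verify them against the file header */
    length Div $4 HomeTeam AwayTeam $30 FTR $1;
    /* Read and display Date as day/month/year instead of letting SAS guess */
    informat Date ddmmyy10.;
    format   Date ddmmyy10.;
    input Div Date HomeTeam AwayTeam FTHG FTAG FTR;
run;
```

Fields after the last variable on the INPUT statement are simply ignored, so the list can be extended to cover as many columns as are needed.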

Another possible workaround is to download the data first, set GUESSINGROWS to a large number, and then read the local file. PROC IMPORT will then scan more of the values before guessing the types, so the guess can be better. This does not appear to work when reading directly through filename URL, but I don't know why.
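The download-then-import workaround could be sketched as below. The fileref names src and dst, the target location in the WORK directory, and the GUESSINGROWS value of 1000 are all illustrative assumptions, and whether this cures the Date problem for a given file has not been verified here.

```sas
filename src url "http://www.football-data.co.uk/mmz4281/1617/E1.csv";
/* Hypothetical local copy, stored in the session's WORK directory */
filename dst "%sysfunc(pathname(work))/E1.csv";

/* Copy the remote file to disk, one record at a time */
data _null_;
    infile src lrecl=32767;
    file   dst lrecl=32767;
    input;
    put _infile_;
run;

/* Import the local copy, scanning more rows before guessing column types */
proc import datafile = dst
            out      = junk_00
            dbms     = csv replace;
    getnames     = yes;
    guessingrows = 1000;
run;
```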

I don't think this is a full answer, but it should shed some light on what is happening.
