简体   繁体   中英

Efficiently convert a SAS dataset into a CSV

Can anyone tell me what is the fastest way to programmatically convert a SAS dataset into a CSV file. I know I can use a data step and output to a file etc. But is that the only way?

Thanks, Adnan.

something along these lines?

proc export data=sashelp.class
    outfile='c:\temp\sashelp class.csv'
    dbms=csv
    replace;
run;

5 different ways to create .CSV file of a SAS dataset.

refer http://studysas.blogspot.com/2009/02/how-to-create-comma-separated-file-csv.html

The answers here have addressed many ways to create a csv, but not necessarily the efficiency of these methods. I create a sample dataset ( temp ) with 100,000 rows and five columns to test and compare the methods for export the sas dataset to a csv named temp.csv .


The first method: The Export Procedure . Run time: 0.43 seconds.

PROC EXPORT data=temp 
     outfile="temp.csv" dbms=csv replace;

Quick run time and simple, in-the-box flexibility when choosing other options. With that said, it is not the strongest for customization


The second method: ODS with Print Procedure . Run time: 14.09 seconds.

ODS csv file="temp.csv";
PROC PRINT data=temp noobs;
RUN;
ods csv close;

This method is the worst option out of the three for most use cases, although there a few special use cases. It is nice for temporary output of previously written procedures, especially if you want the output to stay in the lst file (if it is not too big). It also may be useful when you want to convert another procedure (for instance, a complicated tabulate ) to file without further manipulation. If you do not need the print out in the lst file, close your listing ( ods listing close ) or this will take much, much longer.


The third method: File Statement . Run time: 0.06 seconds.

DATA _null_;
    FILE "temp.csv ";
    SET temp;
    put (_all_) (',');
RUN;

While the performance of this method is not bad, it is not intuitive and looks confusing. As mentioned above however, you would have more control over the output, and it has the quickest run time of them all.

Modified data _NULL_ approach

Here's a modification to the data _NULL_ approach that writes out header rows to the file in the first datastep and then continues to write the same file with the data rows in the second datastep.

%macro outputCSV(dataset,file);
data _NULL_;
   file "&file." dlm=',' dsd;
   length header $ 2000;
   dsid=open("&dataset.","i");
   num=attrn(dsid,"nvars");
   do i=1 to num;
      header = trim(left(coalescec(varlabel(dsid,i),varname(dsid,i))));
      put header @;
   end;
   rc=close(dsid);
run;

data _NULL_;
   set &dataset.;
   file "&file." mod dlm=',' dsd;
   put (_all_) (+0);
run;
%mend;

%outputCSV(sashelp.class,~/temp4.csv)

It can be made to work with keep and drop data set options (surprisingly), although unfortunately when a dataset is 'opened' using open() , the nvars attribute respects the new number of variables after keep or drop , but the varlabel and varname functions still look for variables using their varnum .

In the following example, just Name (varnum=1) and Height (varnum=4) are extracted from SASHELP.CLASS , because only two variables are kept, nvars is 2, but if we iterate over a loop with num as the upper bound, we miss out Height , because its varnum is 4:

 62         data _NULL_;
 63           dsid = open("sashelp.class (keep=name height)","i");
 64           num = attrn(dsid,"nvars");
 65           do i=1 to 5;
 66             vname = varname(dsid,i);
 67             put i= vname= num=;
 68           end;
 69         run;

 i=1 vname=Name num=2
 NOTE: Argument 2 to function VARNAME(1,2) at line 66 column 13 is invalid.
 i=2 vname=  num=2
 NOTE: Argument 2 to function VARNAME(1,3) at line 66 column 13 is invalid.
 i=3 vname=  num=2
 i=4 vname=Height num=2
 NOTE: Argument 2 to function VARNAME(1,5) at line 66 column 13 is invalid.
 i=5 vname=  num=2
 dsid=1 num=2 i=6 vname=  _ERROR_=1 _N_=1

There are two options:

  • Extract the true nvars value by stripping off the dataset options for the first pass
  • Use a really high number instead of num , although this just leads to lots of 'invalid' notes in the log

This is the first approach, requiring two calls to the open function:

%macro outputCSV(dataset,file);
data _NULL_;
   file "&file." dlm=',' dsd;
   length header $ 2000;
   dsid=open("%SCAN(&dataset.,1,()","i");
   num=attrn(dsid,"nvars");
   rc=close(dsid);
   dsid=open("&dataset.","i");
   do i=1 to num;
      header = trim(left(coalescec(varlabel(dsid,i),varname(dsid,i))));
      if _error_ = 0 then put header @;
      _error_ = 0;
   end;
   rc=close(dsid);
run;

data _NULL_;
   set &dataset.;
   file "&file." mod dlm=',' dsd;
   put (_all_) (+0);
run;
%mend;

%outputCSV(sashelp.class (keep=name height),~/temp4.csv)

After writing all of that out, it probably makes sense to go with PROC EXPORT in most cases, or explicitly list the variables if there aren't many. This just shows what can be done.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM