How can I “define” SAS data sets using a macro variable and write to them using an array?

My source data contains 200,000+ observations, and one of the many variables in the data set is "county." My goal is to write a macro that takes this one data set as input and splits it into 58 temporary data sets, one for each of the California counties.

The first question is whether it is possible to specify the 58 counties on the DATA statement using something like a global reference array defined beforehand.

The second question is, assuming the output data sets have been properly specified on the DATA statement, whether it is possible to use a DO loop to choose the right data set to write to.

I can get the comparison to work properly, but cannot seem to use an array reference to specify an output data set. This is most likely because I need more experience with the macro environment!

Please see below for the skeleton I have written so far. The c_long array contains the names of each of the counties, and the c_short array contains a three-letter abbreviation for each of the counties. Thanks in advance!

data splitraw;
    length county_name $15;
    infile "&path/random.csv" dsd firstobs=2;
    input county_name $ number;
run;

%macro _58countysplit(dxtosplit,countycol);
data <need to specify 58 data sets here named something like &dxtosplit_ALA, &dxtosplit_ALP, etc..>;
    set &dxtosplit;
    do i=1 to 58;
        if c_long{i}=&countycol then output &dxtosplit._&c_short{i};
    end;
run;
%mend _58countysplit;

%_58countysplit(splitraw,county_name);

The code you provided will need to run through the large dataset 58 times, each time writing a small one. I have done it a bit differently. First I create a sample dataset with a variable "county" that contains ten different values:

data large;
  attrib county length=$12;
  do i=1 to 10000;
    county=put(mod(i,10)+1,ROMAN.);
    output;
  end;
run;

Next, I find all the unique values and construct the names of all the different tables I would like to create:

proc sql noprint;
  select distinct compbl("large_"!!county) into :counties separated by " "
  from large;
quit;

Now I have a macro variable "counties" that contains the names of all the datasets I want to create.
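To confirm what the SELECT ... INTO step produced, the macro variable can be written to the log. This is only a small sketch and assumes the PROC SQL step above has already run in the same session; the use of COUNTW through %SYSFUNC is my addition, not part of the original answer.

```sas
/* Show the list of data set names held in &counties */
%put counties=&counties;

/* Count the blank-separated names; the sample data has ten distinct values */
%put n_names=%sysfunc(countw(&counties, %str( )));
```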

Here I am writing the IF-statements to a file:

filename x temp;
data _null_;
  attrib county length=$12 ds length=$18;
  file x;
  i=1;
  do while(scan("&counties",i," ") ne "");
    ds=scan("&counties",i," ");
    county=scan(ds,-1,"_");
    put "if county=""" county +(-1) """ then output " ds ";";
    i+1;
  end;
run;
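Before including the generated file, it can help to look at what was written to it. A minimal sketch, assuming the fileref x from the step above is still assigned in the current session:

```sas
/* Read the temporary file line by line and echo each line to the log */
data _null_;
  infile x;
  input;
  put _infile_;
run;
```

Each line in the log should be one IF ... THEN OUTPUT statement of the form written by the PUT statement above.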

Now I have what I need to create the small datasets:

data &counties;
  set large;
  %inc x;
run;
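One way to check the result is to list the new tables and their row counts from DICTIONARY.TABLES. This is a sketch that assumes the tables were created in the WORK library with the large_ prefix used above; the ESCAPE character is chosen only so that the underscore is matched literally.

```sas
/* List the split tables and the number of rows in each */
proc sql;
  select memname, nobs
  from dictionary.tables
  where libname = 'WORK'
    and memname like 'LARGE^_%' escape '^';
quit;
```

For the sample data, the 10,000 rows are spread evenly over ten values, so each table should hold 1,000 rows.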

I agree with user667489: there is almost always a better way than splitting one large data set into many small data sets. However, if you want to proceed along these lines, there is a view in sashelp called vcolumn that lists all your libraries, their tables, and each column in each table, which should help you. Also, if you want

if c_long{i}=&countycol then output &dxtosplit._&c_short{i};

to resolve you might mean:

if c_long{i}=&countycol then output &&dxtosplit._&c_short{i};
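As for the sashelp.vcolumn view mentioned above, a minimal sketch of a query against it might look like this. It assumes the splitraw data set from the question exists in the WORK library; library and member names are stored in upper case in this view.

```sas
/* List the columns of WORK.SPLITRAW with their type and length */
proc sql;
  select name, type, length
  from sashelp.vcolumn
  where libname = 'WORK'
    and memname = 'SPLITRAW';
quit;
```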

It's likely, depending upon what you're actually trying to do, that BY processing is all you need. Nevertheless, here is a simple solution:

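    %macro split_by(data=, splitvar=);
        %local dslist iflist;

        proc sql noprint;
            /* One output data set name per distinct value, e.g. county_A */
            select distinct cats("&splitvar._", &splitvar)
            into :dslist separated by ' '
            from &data;

            /* One IF ... THEN OUTPUT statement per distinct value, chained with ELSE */
            select distinct
            catt("if &splitvar='", &splitvar, "' then output &splitvar._", &splitvar, ";", '0A'x)
            into :iflist separated by "else "
            from &data;
        quit;

        /* Single pass over the input, routing each row to its data set */
        data &dslist;
            set &data;
            &iflist
        run;
    %mend split_by;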

Here is some test data to illustrate:

options mprint;

data test;
    length county $1 val $2;  /* $2 so that the value 10 is not truncated */
    infile cards;
    input county val;
    datalines;
A 2
B 3
A 5
C 8
C 9
D 10
;
run;

%split_by(data=test, splitvar=county)

And you can view the log to see how the macro generates the DATA step you want:

 MPRINT(SPLIT_BY):   proc sql noprint;
 MPRINT(SPLIT_BY):   select distinct cats("county_", county) into :dslist separated by ' ' from test;
 MPRINT(SPLIT_BY):   select distinct catt("if county='", county, "' then output county_", county, ";", '0A'x) into :iflist separated 
 by "else " from test;
 MPRINT(SPLIT_BY):   quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.01 seconds
       cpu time            0.01 seconds


 MPRINT(SPLIT_BY):   data county_A county_B county_C county_D;
 MPRINT(SPLIT_BY):   set test;
 MPRINT(SPLIT_BY):   if county='A' then output county_A;
 MPRINT(SPLIT_BY):   else if county='B' then output county_B;
 MPRINT(SPLIT_BY):   else if county='C' then output county_C;
 MPRINT(SPLIT_BY):   else if county='D' then output county_D;
 MPRINT(SPLIT_BY):   run;

 NOTE: There were 6 observations read from the data set WORK.TEST.
 NOTE: The data set WORK.COUNTY_A has 2 observations and 2 variables.
 NOTE: The data set WORK.COUNTY_B has 1 observations and 2 variables.
 NOTE: The data set WORK.COUNTY_C has 2 observations and 2 variables.
 NOTE: The data set WORK.COUNTY_D has 1 observations and 2 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.03 seconds
       cpu time            0.05 seconds
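For comparison, the BY-processing route mentioned at the start of this answer keeps everything in one data set. The following is only a sketch using the test data above; test_sorted is a name I made up for the sorted copy, and PROC PRINT stands in for whatever per-county step is actually needed.

```sas
/* Sort once so that rows for the same county are adjacent */
proc sort data=test out=test_sorted;
  by county;
run;

/* Any BY-aware procedure now runs separately for each county */
proc print data=test_sorted;
  by county;
run;
```

This avoids creating one data set per county, which tends to matter more as the number of distinct values grows.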
