Crosstable displaying frequency combination of N variables in SAS

What I've got:

  • a table of 20 rows in SAS (originally 100k)
  • various binary attributes (columns)

What I'm looking to get:

  • A crosstable displaying the frequency of the attribute combinations

like this:

              Attribute1  Attribute2  Attribute3  Attribute4
Attribute1             5           0           1           2
Attribute2             0           3           0           3
Attribute3             2           0           5           4
Attribute4             1           2           0          10

*The counts shown are made up and are probably not 100% consistent with each other.

The code I currently have:

    /*create dummy data*/

    data monthly_sales (drop=i);
        do i=1 to 20;
            Attribute1=rand("Normal")>0.5;
            Attribute2=rand("Normal")>0.5;
            Attribute3=rand("Normal")>0.5;
            Attribute4=rand("Normal")>0.5;
            output;
        end;
    run;

I guess this can be done smarter, but this seems to work. First, I created a table to hold all the frequencies:

data crosstable;
  Attribute1=.;Attribute2=.;Attribute3=.;Attribute4=.;output;output;output;output;
run;

Then I loop through all the combinations, inserting the count into the crosstable:

%macro lup();
%do i=1 %to 4;
  %do j=&i %to 4;
    proc sql noprint;
      select count(*) into :Antall&i&j
      from monthly_sales (where=(Attribute&i and Attribute&j));
    quit;
    data crosstable;
      set crosstable;
      if _n_=&j then Attribute&i=&&Antall&i&j;
      if _n_=&i then Attribute&j=&&Antall&i&j;
    run;
  %end;
%end;
%mend;
%lup;

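Note that the frequency count for (i,j) equals the count for (j,i), so each pair only needs to be computed once. That is why the inner loop starts at &i and each result is written to both cells.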
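As a side note on that symmetry: because the attributes are coded 0/1, the number of rows where attributes i and j are both 1 equals the sum of the product Attribute_i * Attribute_j. The whole table is therefore the uncorrected sums-of-squares-and-cross-products (SSCP) matrix. The following is a minimal sketch of that different technique, not the macro approach above. It assumes there are no missing values and that the OUTP= data set carries the SSCP rows in the layout described in the PROC CORR documentation (a _TYPE_ of 'SSCP' plus an Intercept row and column). The data set names sscp_out and crosstable_sscp are made up for illustration.

```sas
/* Assumption: attributes are strictly 0/1 with no missing values */
proc corr data = monthly_sales sscp noprint outp = sscp_out;
    var attribute1-attribute4;
run;

/* Keep only the SSCP rows for the attributes themselves;
   the Intercept row/column holds plain column sums and N */
data crosstable_sscp;
    set sscp_out;
    where _type_ = 'SSCP' and upcase(_name_) ne 'INTERCEPT';
    drop _type_ intercept;
run;
```

If this works as assumed, _NAME_ labels the rows and the diagonal holds the number of rows where each attribute is 1, which should match what the macro loop produces.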

I'd recommend using the built-in SAS tools for this sort of thing, and probably displaying your data slightly differently as well, unless you really want a diagonal table. For example:

    data monthly_sales (drop=i);
        do i=1 to 20;
            Attribute1=rand("Normal")>0.5;
            Attribute2=rand("Normal")>0.5;
            Attribute3=rand("Normal")>0.5;
            Attribute4=rand("Normal")>0.5;
            count = 1;
            output;
        end;
    run;

proc freq data = monthly_sales noprint;
    table  attribute1 * attribute2 * attribute3 * attribute4 / out = frequency_table;
run;

proc summary nway data = monthly_sales;
    class attribute1 attribute2 attribute3 attribute4;
    var count;
    output out = summary_table(drop = _TYPE_ _FREQ_) sum(COUNT)= ;
run;

Either of these gives you a table with one row for each combination of attributes that occurs in your data, which is slightly different from what you requested but conveys the same information. You can force proc summary to include rows for combinations of class variables that don't exist in your data by using the completetypes option in the proc summary statement.
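A minimal sketch of the completetypes variant follows. The output data set name all_combos is made up for illustration. The sketch keeps _FREQ_ rather than a sum of count, because _FREQ_ is 0 for combinations that never occur, whereas a sum over zero rows would be missing.

```sas
/* completetypes adds a row for every combination of the class
   values seen in the data, even combinations with no observations */
proc summary nway completetypes data = monthly_sales;
    class attribute1 attribute2 attribute3 attribute4;
    output out = all_combos(drop = _TYPE_);
run;
```

Note that this only crosses values that actually appear for each class variable. If an attribute happens to be 0 on every row, no rows with that attribute equal to 1 are generated.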

It's definitely worth taking the time to get familiar with proc summary if you're doing statistical analysis in SAS - you can include additional output statistics and process multiple variables with minimal additional code and processing overhead.
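As a rough illustration of that point, the sketch below requests two statistics for two analysis variables in a single output statement. Which variables go in class and which in var is chosen purely for illustration, as is the data set name stats_table. The autoname option lets SAS build the output variable names.

```sas
/* One pass: sum and mean of attribute3 and attribute4
   within each attribute1 x attribute2 group */
proc summary nway data = monthly_sales;
    class attribute1 attribute2;
    var attribute3 attribute4;
    output out = stats_table(drop = _TYPE_) sum = mean = / autoname;
run;
```

With autoname the output columns should be named along the lines of attribute3_Sum and attribute4_Mean, and _FREQ_ holds the number of rows in each group.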

Update: it's possible to produce the desired table without resorting to macro logic, although the process is rather involved:

proc summary data = monthly_sales completetypes;
    ways 1 2; /*Calculate only 1 and 2-way summaries*/
    class attribute1 attribute2 attribute3 attribute4;
    var count;
    output out = summary_table(drop = _TYPE_ _FREQ_) sum(COUNT)= ;
run;

/*Eliminate unnecessary output rows*/
data summary_table;
    set summary_table;
    array a{*} attribute:;
    sum = sum(of a[*]);
    missing = 0;
    do i = 1 to dim(a);
        missing + missing(a[i]);
        a[i] = a[i] * count;
    end;
    /*We want rows where two attributes are both 1 (sum = 2),
        or one attribute is 1 and the others are all missing*/
    if sum = 2 or (sum = 1 and missing = dim(a) - 1);
    drop i missing sum;
    edge = _n_;
run;

/*Transpose into long format - 1 row per combination of vars*/
proc transpose data = summary_table out = tr_table(where = (not(missing(col1))));
    by edge;
    var attribute:;
run;

/*Use cartesian join to produce table containing desired frequencies (still not in the right shape)*/
option linesize = 150;
proc sql noprint _method _tree;
    create table diagonal as
        select  a._name_ as aname, 
                        b._name_ as bname,
                        a.col1 as count
        from tr_table a, tr_table b
            where a.edge = b.edge
            group by a.edge
            having (count(a.edge) = 4 and aname ne bname) or count(a.edge) = 1
            order by aname, bname
            ;
quit;

/*Transpose the table into the right shape*/
proc transpose data = diagonal out = want(drop = _name_);
    by aname;
    id bname;
    var count;
run;

/*Re-order variables and set missing values to zero*/
data want;
    informat aname attribute1-attribute4;
    set want;
    array a{*} attribute:;
    do i = 1 to dim(a);
        a[i] = sum(a[i],0);
    end;
    drop i;
run;

Yeah, user667489 was right; I just added some extra code to get the cross-frequency table into the layout I wanted. First, I created a table with 10 million rows and 10 variables:

data monthly_sales (drop=i);
    do i=1 to 10000000;
        Attribute1=rand("Normal")>0.5;
        Attribute2=rand("Normal")>0.5;
        Attribute3=rand("Normal")>0.5;
        Attribute4=rand("Normal")>0.5;
        Attribute5=rand("Normal")>0.5;
        Attribute6=rand("Normal")>0.5;
        Attribute7=rand("Normal")>0.5;
        Attribute8=rand("Normal")>0.5;
        Attribute9=rand("Normal")>0.5;
        Attribute10=rand("Normal")>0.5;
        output;
    end;
run;

Create an empty 10x10 crosstable:

data crosstable;
  Attribute1=.;Attribute2=.;Attribute3=.;Attribute4=.;Attribute5=.;Attribute6=.;Attribute7=.;Attribute8=.;Attribute9=.;Attribute10=.;
  output;output;output;output;output;output;output;output;output;output;
run;

Create a frequency table using proc freq:

proc freq data = monthly_sales noprint;
    table  attribute1 * attribute2 * attribute3 * attribute4 * attribute5 * attribute6 * attribute7 * attribute8 * attribute9 * attribute10
            / out = frequency_table;
run;

Loop through all pairs of attributes. For each pair, sum the "count" variable over the frequency_table rows where both attributes are 1, then insert the total into the crosstable:

%macro lup();
%do i=1 %to 10;
  %do j=&i %to 10;
    proc sql noprint;
      select sum(count) into :Antall&i&j
      from frequency_table (where=(Attribute&i and Attribute&j));
    quit;
    data crosstable;
      set crosstable;
      if _n_=&j then Attribute&i=&&Antall&i&j;
      if _n_=&i then Attribute&j=&&Antall&i&j;
    run;
  %end;
%end;
%mend;
%lup;
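One quick sanity check on the result, offered as a sketch rather than part of the original answer: cell (i,i) counts the rows where Attribute i is 1, so the diagonal of crosstable should equal the plain column sums of the 0/1 attributes in monthly_sales.

```sas
/* Column sums of the 0/1 attributes; compare with the
   diagonal of crosstable (row i, column Attribute i) */
proc means data = monthly_sales sum maxdec = 0;
    var attribute1-attribute10;
run;
```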
