
SAS: Generate PROC FREQ combinations automatically?

I have a patient dataset that looks like the table below. I would like to see which diseases occur together, and ultimately make a heatmap. I used PROC FREQ to make this list table, but it gives me every combination (thousands), which is too laborious to go through.

Moya    Hypothyroid Hyperthyroid    Celiac
   1       1           0             0
   1       1           0             0       
   0       0           1             1
   0       0           0             0
   1       1           0             0
   1       0           1             0
   1       1           0             0
   1       1           0             0
   0       0           1             1
   0       0           1             1


proc freq data=new;
tables HOHT*HOGD*CroD*Psor*Viti*CelD*UlcC*AddD*SluE*Rhea*PerA/list;
run;

Ultimately I would like a set of cross tabs like the ones below, so I can see how many patients have each combination. Obviously I could copy and paste each pair of variables manually like this, but is there any way to do this quickly or automate it?

proc freq data=new;
tables HOHT*HOGD/list;
run;

proc freq data=new;
tables HOHT*CroD/list;
run;


proc freq data=new;
tables HOHT*Psor/list;
run;

Thanks!

The TABLES statement controls which tables PROC FREQ generates. To generate a two-way contingency table for every pair of columns in a data set, one can write a SAS macro that loops through a list of variables and generates the TABLES statements for all of those pairs.

For example, using the data from the original post:

data xtabs;
input Moya    Hypothyroid Hyperthyroid    Celiac;
datalines;
   1       1           0             0
   1       1           0             0       
   0       0           1             1
   0       0           0             0
   1       1           0             0
   1       0           1             0
   1       1           0             0
   1       1           0             0
   0       0           1             1
   0       0           1             1
;
run;
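%macro gentabs(varlist=);
   %local i j word_count;   /* keep loop counters out of the global scope */
   %let word_count = %sysfunc(countw(&varlist));
   /* one TABLES statement per variable, crossed with every later variable */
   %do i = 1 %to (&word_count - 1);
      tables %scan(&varlist,&i,%str( )) * (
      %do j = %eval(&i + 1) %to &word_count;
        %scan(&varlist,&j,%str( ))
      %end; )
      ; /* end tables statement */
   %end;
%mend gentabs;
options mprint;   /* write the generated TABLES statements to the log */
proc freq data = xtabs;
  %gentabs(varlist=Moya Hypothyroid Hyperthyroid Celiac)
run;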

The code generated by the SAS macro is:

 73         proc freq data = xtabs;
 74           %gentabs(varlist=Moya Hypothyroid Hyperthyroid Celiac)
 MPRINT(GENTABS):   tables Moya * ( Hypothyroid Hyperthyroid Celiac ) ;
 MPRINT(GENTABS):   tables Hypothyroid * ( Hyperthyroid Celiac ) ;
 MPRINT(GENTABS):   tables Hyperthyroid * ( Celiac ) ;
 75         run;

...and the first few tables of the resulting output look like this:

[Image: the first few two-way tables from the PROC FREQ output]

To add options to each TABLES statement (for example, / list), insert them before the semicolon on the line commented /* end tables statement */.
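One way to do that without editing the macro each time is to pass the options in as a parameter. The sketch below is a hypothetical variant: the OPTS= parameter is not part of the original answer, and the code assumes the XTABS data set created above. When OPTS= is non-blank, the macro writes a slash followed by the options just before the closing semicolon, so OPTS=list should give the asker's /list layout for every pair.

```sas
/* Hypothetical variant of %gentabs: OPTS= is an added parameter.  */
/* Non-blank OPTS= text is written after a slash in each TABLES    */
/* statement, e.g. opts=list  ->  tables A * ( B C ) / list        */
%macro gentabs(varlist=, opts=);
   %local i j word_count;
   %let word_count = %sysfunc(countw(&varlist));
   %do i = 1 %to %eval(&word_count - 1);
      tables %scan(&varlist,&i,%str( )) * (
      %do j = %eval(&i + 1) %to &word_count;
        %scan(&varlist,&j,%str( ))
      %end; )
      %if %length(&opts) %then / &opts;   /* options go before the semicolon */
      ; /* end tables statement */
   %end;
%mend gentabs;

options mprint;
proc freq data=xtabs;
   %gentabs(varlist=Moya Hypothyroid Hyperthyroid Celiac, opts=list)
run;
```

With OPTS= left blank, the variant generates the same statements as the original macro.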

PROC MEANS is a common tool for computing statistics for combinations of grouping (CLASS) variables within the data. In your case, you only want the count of each combination.

Suppose you had 10,000 patients with 10 binary factors:

data patient_factors;
  do patient_id = 1 to 10000;
    array factor(10);
    do _n_ = 1 to dim(factor);
      factor(_n_) = ranuni(123) < _n_/(dim(factor)+3);
    end;
    output;
  end;
  format factor: 4.;
run;

As you mentioned, PROC FREQ can compute the counts of each 10-way combination:

proc freq noprint data=patient_factors;
  table 
    factor1
    * factor2 
      * factor3
        * factor4
          * factor5
            * factor6
              * factor7
                * factor8
                  * factor9
                    * factor10
  / out = pf_10deep
  ;
run;

PROC FREQ has no syntax for writing a single output data set that contains each pairwise combination involving factor1: the OUT= option saves only the last table request in the TABLES statement.

PROC MEANS does have syntax for such output, through its TYPES statement:

proc means noprint data=patient_factors;
  class factor1-factor10;
  output out=counts_paired_with_factor1 n=n;
  types factor1 * ( factor2 - factor10 );
run;
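To get every pair rather than only the pairs involving factor1, PROC MEANS also has a WAYS statement: WAYS 2 requests all two-way combinations of the CLASS variables (45 for 10 factors). The sketch below goes one step further toward the heatmap in the question. It keeps only the cells where both factors equal 1 and plots them with PROC SGPLOT's HEATMAPPARM statement, assuming SAS 9.4 with ODS Graphics enabled. The data set names ALL_PAIRS and COOCCUR and the variables ROW, COL, TMP and COUNT are illustrative, not part of the original answer.

```sas
/* Every 2-way combination of the CLASS variables (45 pairs for 10 factors). */
proc means noprint data=patient_factors;
  class factor1-factor10;
  ways 2;
  output out=all_pairs n=n;
run;

/* Keep the cells where both factors of the pair are 1, label them   */
/* with the variable names, and mirror them so the matrix is square. */
data cooccur;
  set all_pairs;
  array f{*} factor1-factor10;
  length row col tmp $32;
  both = 1;
  do k = 1 to dim(f);
    if not missing(f{k}) then do;          /* one of the two variables in this pair */
      if f{k} ne 1 then both = 0;
      if missing(row) then row = vname(f{k});
      else col = vname(f{k});
    end;
  end;
  if both;                                 /* drop the 0/0, 0/1 and 1/0 cells */
  count = _freq_;                          /* patients with both factors      */
  output;                                  /* upper triangle                  */
  tmp = row; row = col; col = tmp;
  output;                                  /* mirrored lower triangle         */
  keep row col count;
run;

/* Heat map of pairwise co-occurrence counts (diagonal left empty). */
proc sgplot data=cooccur;
  heatmapparm x=col y=row colorresponse=count;
  xaxis display=(nolabel);
  yaxis display=(nolabel);
run;
```

Writing each pair twice gives a symmetric picture; dropping the second OUTPUT shows only one triangle. The diagonal (patients with a single factor) is not produced by WAYS 2; a separate WAYS 1 run would supply it if needed.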
