
Grouping SAS Date on Month

I have a date variable that displays as month and year (e.g. Oct-13), but the underlying values are still full SAS date numbers. As a result, when I count on this field I get a separate row for each distinct date number, and the counts are not grouped by month as I want them to be.

The data I have looks like this:

data beforehave;
   input ID  $ Activity $ Original_Start_Date;
   datalines;
   12345 Activity1 Oct-13
   12345 Activity1 Oct-13
   12345 Activity1 Nov-16
   12345 Activity2 Nov-16
   12345 Activity2 Nov-16
   23145 Activity1 Sep-15
   23145 Activity2 Sep-15
   23145 Activity2 Sep-15
;
RUN;

However, when I count the combinations that include 'Original_Start_Date', I get this:

data beforehave;
   input ID  $ Activity $ Original_Start_Date Count_of_Original_Start_Date;
   datalines;
   12345 Activity1 Oct-13 1
   12345 Activity1 Oct-13 1
   12345 Activity1 Nov-16 1
   12345 Activity2 Nov-16 1
   12345 Activity2 Nov-16 1
   23145 Activity1 Sep-15 1
   23145 Activity2 Sep-15 1
   23145 Activity2 Sep-15 1
;
RUN;

What I want is this:

data beforehave;
   input ID  $ Activity $ Original_Start_Date Count_of_Original_Start_Date;
   datalines;
   12345 Activity1 Oct-13 2
   12345 Activity1 Nov-16 1
   12345 Activity2 Nov-16 2
   23145 Activity1 Sep-15 1
   23145 Activity2 Sep-15 2
;
RUN;

I had thought about converting this to a character variable; however, it would be really useful to keep it as a date.

All I really want is to be able to group a SAS date number by month.

As alluded to in my comment, here are two ways to achieve your goal. The easiest is proc summary, as it automatically groups by the formatted values. The second option is a data step with the groupformat option in the by statement; this requires a proc sort beforehand.

data have;
   input ID  $ Activity $10. Original_Start_Date :date7.;
   format Original_Start_Date monyy5.;
   datalines;
   12345 Activity1 01Oct13
   12345 Activity1 02Oct13
   12345 Activity1 03Nov16
   12345 Activity2 04Nov16
   12345 Activity2 05Nov16
   23145 Activity1 06Sep15
   23145 Activity2 07Sep15
   23145 Activity2 08Sep15
;
RUN;

/* method 1 */
proc summary data=have nway;
class id activity original_start_date;
output out=want1 (drop=_type_ rename=(_freq_=Count_of_Original_Start_Date));
run;

/* method 2 */
proc sort data=have;
by id activity original_start_date;
run;

data want2;
set have;
by id activity original_start_date groupformat;
if first.original_start_date then Count_of_Original_Start_Date=0;
Count_of_Original_Start_Date+1;
if last.original_start_date then output;
run;

I prefer using proc sql for this:

data have;
  input ID  $ Activity $10. Original_Start_Date :date7.;
  format Original_Start_Date monyy5.;
  datalines;
  12345 Activity1 01Oct13
  12345 Activity1 02Oct13
  12345 Activity1 03Nov16
  12345 Activity2 04Nov16
  12345 Activity2 05Nov16
  23145 Activity1 06Sep15
  23145 Activity2 07Sep15
  23145 Activity2 08Sep15
;
Run;

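proc sql;
    create table want as
    /* proc sql groups on the stored date value, not the formatted one, */
    /* so truncate each date to the first day of its month before grouping */
    select ID,
           Activity,
           intnx('month', Original_Start_Date, 0, 'b') as Original_Start_Date format=monyy5.,
           count(*) as Count_of_Original_Start_Date
    from have
    group by ID, Activity, calculated Original_Start_Date;
quit;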
