I have a date variable that is displayed in a month-year format, but the underlying values are still daily SAS date numbers. Consequently, when I count on this field I get a separate row for each distinct SAS date number, and the rows are not grouped by month as I want them to be.
The data I have looks like this:
data beforehave;
input ID $ Activity $ Original_Start_Date;
datalines;
12345 Activity1 Oct-13
12345 Activity1 Oct-13
12345 Activity1 Nov-16
12345 Activity2 Nov-16
12345 Activity2 Nov-16
23145 Activity1 Sep-15
23145 Activity2 Sep-15
23145 Activity2 Sep-15
;
RUN;
However, when I count the combinations by Original_Start_Date, I get this:
data beforehave;
input ID $ Activity $ Original_Start_Date Count_of_Original_Start_Date;
datalines;
12345 Activity1 Oct-13 1
12345 Activity1 Oct-13 1
12345 Activity1 Nov-16 1
12345 Activity2 Nov-16 1
12345 Activity2 Nov-16 1
23145 Activity1 Sep-15 1
23145 Activity2 Sep-15 1
23145 Activity2 Sep-15 1
;
RUN;
What I want is this:
data beforehave;
input ID $ Activity $ Original_Start_Date Count_of_Original_Start_Date;
datalines;
12345 Activity1 Oct-13 2
12345 Activity1 Nov-16 1
12345 Activity2 Nov-16 2
23145 Activity1 Sep-15 1
23145 Activity2 Sep-15 2
;
RUN;
I had thought about converting this to a character variable; however, it would be really useful to keep it as a date.
All I really want is to be able to group a SAS date number based upon the month.
As alluded to in my comment, here are 2 ways to achieve your goal. The easiest is proc summary, as this automatically groups by the formatted values. The 2nd option is a data step with the groupformat option in the by statement; this requires a proc sort beforehand.
data have;
input ID $ Activity $10. Original_Start_Date :date7.;
format Original_Start_Date monyy5.;
datalines;
12345 Activity1 01Oct13
12345 Activity1 02Oct13
12345 Activity1 03Nov16
12345 Activity2 04Nov16
12345 Activity2 05Nov16
23145 Activity1 06Sep15
23145 Activity2 07Sep15
23145 Activity2 08Sep15
;
RUN;
/* method 1 */
proc summary data=have nway;
class id activity original_start_date;
output out=want1 (drop=_type_ rename=(_freq_=Count_of_Original_Start_Date));
run;
/* method 2 */
proc sort data=have;
by id activity original_start_date;
run;
data want2;
set have;
by id activity original_start_date groupformat;
if first.original_start_date then Count_of_Original_Start_Date=0;
Count_of_Original_Start_Date+1;
if last.original_start_date then output;
run;
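To see why grouping on the formatted value matters, it may help to look at what is actually stored. The following is only an illustrative sketch (the variable names d1, d2, m1 and m2 are made up here): two dates in the same month print identically under monyy5. but hold different numbers, while intnx with the 'b' alignment maps both to the first day of the month.

```sas
/* illustration only: d1, d2, m1, m2 are hypothetical names */
data _null_;
d1 = '01Oct2013'd;              /* two different days in the same month */
d2 = '02Oct2013'd;
m1 = intnx('month', d1, 'b');   /* first day of d1's month */
m2 = intnx('month', d2, 'b');   /* first day of d2's month */
put d1= d2=;                    /* stored values differ by one day */
put d1= monyy5. d2= monyy5.;    /* formatted values look the same */
put m1= m2=;                    /* truncated values are equal */
run;
```

Truncating with intnx keeps the variable numeric, so it can still carry a date format such as monyy5.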
I prefer using proc sql for this:
data have;
input ID $ Activity $10. Original_Start_Date :date7.;
format Original_Start_Date monyy5.;
datalines;
12345 Activity1 01Oct13
12345 Activity1 02Oct13
12345 Activity1 03Nov16
12345 Activity2 04Nov16
12345 Activity2 05Nov16
23145 Activity1 06Sep15
23145 Activity2 07Sep15
23145 Activity2 08Sep15
;
Run;
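/* proc sql groups on the stored value, not the formatted one, */
/* so truncate each date to the first day of its month before grouping */
proc sql;
create table want as
select ID,Activity,
intnx('month',Original_Start_Date,'b') as Original_Start_Date format=monyy5.,
count(*) as Count_of_Original_Start_Date
from have
group by 1,2,3;
quit;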