I am trying to summarize my data set using the proc sql, but I have repeated values in the output, a simple version of my code is:
PROC SQL;
CREATE TABLE perm.rx_4 AS
SELECT patid,ndc,fill_mon,
COUNT(dea) AS n_dea,
sum(DEDUCT) AS tot_DEDUCT
FROM perm.rx
GROUP BY patid,ndc,fill_mon;
QUIT;
Some sample output is:
Obs Patid Ndc FILL_mon n_dea DEDUCT
3815 33003605204 00054465029 2000-05 2 0
3816 33003605204 00054465029 2000-05 2 0
12257 33004361450 00406035701 2000-06 2 0
16564 33004744098 00603128458 2000-05 2 0
16565 33004744098 00603128458 2000-05 2 0
16566 33004744098 00603128458 2000-06 2 0
16567 33004744098 00603128458 2000-06 2 0
46380 33008165116 00406035705 2000-06 2 0
85179 33013674758 00406035801 2000-05 2 0
89248 33014228307 00054465029 2000-05 2 0
107514 33016949900 00406035805 2000-06 2 0
135047 33056226897 63481062370 2000-05 2 0
213691 33065594501 00472141916 2000-05 2 0
215192 33065657835 63481062370 2000-06 2 0
242848 33066899581 60432024516 2000-06 2 0
As you can see there are repeated out put, for example obs 3815,3816. I have saw some people had similar problem, but the answers didn't work for me.
The content of the dataset is this:
The SAS System 5
17:01 Thursday, December 3, 2015
The CONTENTS Procedure
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 210
First Data Page 1
Max Obs per Page 1360
Obs in First Data Page 1310
Number of Data Set Repairs 0
Filename /home/zahram/optum/rx_4.sas7bdat
Release Created 9.0401M2
Host Created Linux
Inode Number 424673574
Access Permission rw-r-----
Owner Name zahram
File Size (bytes) 13828096
The SAS System 6
17:01 Thursday, December 3, 2015
The CONTENTS Procedure
Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat Label
3 FILL_mon Num 8 YYMMD. Fill month
2 Ndc Char 11 $11. $20. Ndc
1 Patid Num 8 19. Patid
4 n_dea Num 8
5 tot_DEDUCT Num 8
Sort Information
Sortedby Patid Ndc FILL_mon
Validated YES
Character Set ASCII
The SAS System 7
17:01 Thursday, December 3, 2015
The CONTENTS Procedure
Sort Information
Sort Option NODUPKEY
NOTE: PROCEDURE CONTENTS used (Total process time): real time 0.08 seconds cpu time 0.01 seconds
I'll guess that you have a format on a variable, most likely the date. Proc SQL does not aggregate over formatted values but will use the underlying values but still shows them as formatted, so they appear as duplicates. Your proc contents confirms this. You can get around this by converting this the variable to a character variable.
PROC SQL;
CREATE TABLE perm.rx_4 AS
SELECT patid,ndc, put(fill_mon, yymmd.) as fill_month,
COUNT(dea) AS n_dea,
sum(DEDUCT) AS tot_DEDUCT
FROM perm.rx
GROUP BY patid,ndc, calculated fill_month;
QUIT;
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.