Using group by in Proc SQL for SAS
I am trying to summarize my data set using PROC SQL, but I get repeated values in the output. A simplified version of my code is:
PROC SQL;
CREATE TABLE perm.rx_4 AS
SELECT patid,ndc,fill_mon,
COUNT(dea) AS n_dea,
sum(DEDUCT) AS tot_DEDUCT
FROM perm.rx
GROUP BY patid,ndc,fill_mon;
QUIT;
Some sample output is:
Obs Patid Ndc FILL_mon n_dea DEDUCT
3815 33003605204 00054465029 2000-05 2 0
3816 33003605204 00054465029 2000-05 2 0
12257 33004361450 00406035701 2000-06 2 0
16564 33004744098 00603128458 2000-05 2 0
16565 33004744098 00603128458 2000-05 2 0
16566 33004744098 00603128458 2000-06 2 0
16567 33004744098 00603128458 2000-06 2 0
46380 33008165116 00406035705 2000-06 2 0
85179 33013674758 00406035801 2000-05 2 0
89248 33014228307 00054465029 2000-05 2 0
107514 33016949900 00406035805 2000-06 2 0
135047 33056226897 63481062370 2000-05 2 0
213691 33065594501 00472141916 2000-05 2 0
215192 33065657835 63481062370 2000-06 2 0
242848 33066899581 60432024516 2000-06 2 0
As you can see, there is repeated output, for example obs 3815 and 3816. I have seen that some people had a similar problem, but the answers didn't work for me.
The contents of the data set are:
The SAS System 5
17:01 Thursday, December 3, 2015
The CONTENTS Procedure
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 210
First Data Page 1
Max Obs per Page 1360
Obs in First Data Page 1310
Number of Data Set Repairs 0
Filename /home/zahram/optum/rx_4.sas7bdat
Release Created 9.0401M2
Host Created Linux
Inode Number 424673574
Access Permission rw-r-----
Owner Name zahram
File Size (bytes) 13828096
Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat Label
3 FILL_mon Num 8 YYMMD. Fill month
2 Ndc Char 11 $11. $20. Ndc
1 Patid Num 8 19. Patid
4 n_dea Num 8
5 tot_DEDUCT Num 8
Sort Information
Sortedby Patid Ndc FILL_mon
Validated YES
Character Set ASCII
Sort Option NODUPKEY
NOTE: PROCEDURE CONTENTS used (Total process time): real time 0.08 seconds cpu time 0.01 seconds
I'd guess that you have a format on a variable, most likely the date. PROC SQL does not group on formatted values; it groups on the underlying values but still displays them formatted, so rows whose underlying values differ (for example, different days in the same month) appear as duplicates. Your PROC CONTENTS output confirms this: FILL_mon is numeric with the YYMMD. format. You can get around this by converting the variable to a character variable:
PROC SQL;
CREATE TABLE perm.rx_4 AS
SELECT patid,ndc, put(fill_mon, yymmd.) as fill_month,
COUNT(dea) AS n_dea,
sum(DEDUCT) AS tot_DEDUCT
FROM perm.rx
GROUP BY patid,ndc, calculated fill_month;
QUIT;
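To see the effect in isolation, here is a minimal sketch on made-up data. The table work.rx_demo, the column fill_dt, and the alias fill_month are hypothetical names chosen for illustration and are not part of the original data. It also shows one possible alternative to PUT: using INTNX to collapse each date to the first day of its month, which keeps the grouping variable numeric so it still sorts and compares as a date.

```sas
/* Hypothetical toy data: two fills in May on different days, one in June */
data work.rx_demo;
    input patid fill_dt :yymmdd10. deduct;
    format fill_dt yymmd.;   /* displays only year and month, hiding the day */
    datalines;
1 2000-05-03 0
1 2000-05-20 0
1 2000-06-11 0
;
run;

proc sql;
    /* Grouping on the raw date uses the underlying daily value,
       so the two May fills stay in separate groups even though
       both are displayed as 2000-05 */
    select patid, fill_dt, count(*) as n_rows
    from work.rx_demo
    group by patid, fill_dt;

    /* Grouping on the first day of the month collapses each month
       into a single group while keeping a numeric date value */
    select patid,
           intnx('month', fill_dt, 0, 'b') as fill_month format=yymmd.,
           count(*) as n_rows,
           sum(deduct) as tot_deduct
    from work.rx_demo
    group by patid, calculated fill_month;
quit;
```

With this toy data, the first query should list 2000-05 twice, and the second should return one row per month. Whether the character version or the INTNX version is preferable depends on whether later steps need the month as a date value.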