Using group by in Proc SQL for SAS

I am trying to summarize my data set using PROC SQL, but I get repeated values in the output. A simplified version of my code is:

PROC SQL;
CREATE TABLE perm.rx_4 AS
SELECT  patid,ndc,fill_mon,
COUNT(dea) AS n_dea, 
sum(DEDUCT) AS tot_DEDUCT
FROM perm.rx 
GROUP BY patid,ndc,fill_mon;
QUIT;

Some sample output:

   Obs          Patid    Ndc            FILL_mon    n_dea    DEDUCT

  3815    33003605204    00054465029    2000-05         2         0
  3816    33003605204    00054465029    2000-05         2         0
 12257    33004361450    00406035701    2000-06         2         0
 16564    33004744098    00603128458    2000-05         2         0
 16565    33004744098    00603128458    2000-05         2         0
 16566    33004744098    00603128458    2000-06         2         0
 16567    33004744098    00603128458    2000-06         2         0
 46380    33008165116    00406035705    2000-06         2         0
 85179    33013674758    00406035801    2000-05         2         0
 89248    33014228307    00054465029    2000-05         2         0
107514    33016949900    00406035805    2000-06         2         0
135047    33056226897    63481062370    2000-05         2         0
213691    33065594501    00472141916    2000-05         2         0
215192    33065657835    63481062370    2000-06         2         0
242848    33066899581    60432024516    2000-06         2         0

As you can see, some output rows are repeated, for example obs 3815 and 3816. I have seen that other people had similar problems, but the answers didn't work for me.

The contents of the dataset are:

                            The SAS System                               5
                                          17:01 Thursday, December 3, 2015

                        The CONTENTS Procedure

                  Engine/Host Dependent Information

     Data Set Page Size          65536                           
     Number of Data Set Pages    210                             
     First Data Page             1                               
     Max Obs per Page            1360                            
     Obs in First Data Page      1310                            
     Number of Data Set Repairs  0                               
     Filename                    /home/zahram/optum/rx_4.sas7bdat
     Release Created             9.0401M2                        
     Host Created                Linux                           
     Inode Number                424673574                       
     Access Permission           rw-r-----                       
     Owner Name                  zahram                          
     File Size (bytes)           13828096                        


              Alphabetic List of Variables and Attributes

  #    Variable      Type    Len    Format    Informat    Label

  3    FILL_mon      Num       8    YYMMD.                Fill month
  2    Ndc           Char     11    $11.      $20.        Ndc       
  1    Patid         Num       8    19.                   Patid     
  4    n_dea         Num       8                                    
  5    tot_DEDUCT    Num       8                                    


                          Sort Information

                  Sortedby       Patid Ndc FILL_mon
                  Validated      YES               
                  Character Set  ASCII             
                  Sort Option    NODUPKEY          

NOTE: PROCEDURE CONTENTS used (Total process time):
      real time           0.08 seconds
      cpu time            0.01 seconds

My guess is that one of your variables carries a format, most likely the date. PROC SQL does not group on formatted values. It groups on the underlying values but still displays them with the format applied, so two different dates in the same month look like duplicate rows. Your PROC CONTENTS output confirms this: FILL_mon is a numeric variable with the YYMMD. format. You can get around this by converting the variable to a character variable:

PROC SQL;
CREATE TABLE perm.rx_4 AS
SELECT  patid,ndc, put(fill_mon, yymmd.) as fill_month, 
COUNT(dea) AS n_dea, 
sum(DEDUCT) AS tot_DEDUCT
FROM perm.rx 
GROUP BY patid,ndc, calculated fill_month;
QUIT;
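
If you would rather keep the month as a numeric SAS date instead of a character string, one alternative is to snap every date to the first day of its month with INTNX before grouping. The following is only a sketch under assumptions: the table work.rx_demo, its rows, and the output name work.rx_demo_monthly are made up for illustration and are not taken from the question's data.

```sas
/* Hypothetical toy data: three fills for one patient and drug,  */
/* two of them on different days in May 2000.                    */
data work.rx_demo;
    input patid ndc :$11. fill_mon :date9. dea :$9. deduct;
    format fill_mon yymmd.;
    datalines;
1 00054465029 03MAY2000 AB1234567 0
1 00054465029 17MAY2000 AB1234567 5
1 00054465029 02JUN2000 CD7654321 0
;
run;

proc sql;
    create table work.rx_demo_monthly as
    select patid,
           ndc,
           /* Shift each date to the beginning ('b') of its own month, */
           /* so all fills in one month share one underlying value.    */
           intnx('month', fill_mon, 0, 'b') as fill_month format=yymmd.,
           count(dea)  as n_dea,
           sum(deduct) as tot_deduct
    from work.rx_demo
    group by patid, ndc, calculated fill_month;
quit;
```

With this toy data the two May fills should collapse into a single row, because they now share the same underlying date value. The result stays numeric, so it still sorts chronologically and can be used in date arithmetic, which the character version produced by PUT cannot.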
