
SAS: How to use the RETAIN statement to create a summed variable in the DATA step, equivalent to the SUM statement output in PROC PRINT

In SAS, I'm trying to create a variable that holds the running sum of another variable. In this case, I am trying to create two variables: Total_All_Ages, which is the sum of the 2013 US population POPESTIMATE2013, and Total_18Plus, which is the sum of the 2013 US population aged 18 and over, POPEST18PLUS2013.

I want the output of these variables to appear as though I had used the sum statement in proc print (where the sum appears in a new row at the bottom of the variable's column). However, I do not want to use the print procedure. Instead, I want to create my output using only the data step.
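For reference, the PROC PRINT behavior being imitated looks roughly like the sketch below. It is an illustration only: it uses the built-in SASHELP.CLASS data set (columns Name and Weight) as a stand-in for the population data, which is not available here.

```sas
/* Reference only: the SUM statement in PROC PRINT adds a total row */
/* at the bottom of each listed numeric column.                     */
proc print data=sashelp.class noobs;
  var Name Weight;
  sum Weight;   /* total printed in a new row under the Weight column */
run;
```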

I am required to do this with the retain (and input) statements.

My code is as follows:

data _NULL_;
retain Total_All_Ages Total_18Plus;
infile RAWfoldr DLM=',' firstobs=3 obs=53;
informat STATE $2. NAME $20.;
input SUMLEV REGION $ DIVISION STATE $ NAME $ POPESTIMATE2013 POPEST18PLUS2013 PCNT_POPEST18PLUS;
    Total_All_Ages = sum(Total_All_Ages, POPESTIMATE2013);
    Total_18Plus = sum(Total_18Plus, POPEST18PLUS2013);
keep STATE NAME POPESTIMATE2013 POPEST18PLUS2013 Total_All_Ages Total_18Plus;
format POPESTIMATE2013 comma11. POPEST18PLUS2013 comma11.;
file print notitles;
if _n_=1 then put '=== U.S. Resident Population Estimates for All Ages and ===
                  Ages 18 or Older by State (in Alphabetical Order), 2013';
if _n_=1 then put ' ';
if _n_=1 then put @5 'FIPS Code' @16 'State Name' @40 'All Ages' @55 'Ages 18 or Older';
if _n_=1 then put ' ';
put @5 STATE @16 NAME @40 POPESTIMATE2013 @55 POPEST18PLUS2013;
run;

You can see that, right after my input statement, I create the two variables that I mentioned. I also list them in my retain statement. However, I'm not sure how to make them appear in my output in the way I described.

I want them to appear as a Total line at the bottom of the output, like this:

                                                        POPESTIMATE2013  POPEST18PLUS2013
                                                        112312234        1234123412341234
                                                        23413412341234   213412341234



                       ============                      ============     ============
                       Total                             23423423429      242234545345 

Is there a way to put these variables on a new line at the very bottom of the output (similar to how I put the column headings at the top using the if _n_=1 code)?

Let me know if I need to explain myself better. I appreciate any help with this. Thank you.

If I understand your question, you're almost there.

First, add end=eof to your infile statement. This creates a temporary variable named eof that is 0 while SAS reads the data and is set to 1 only when SAS reads the last line of data. The end= option works in a set statement as well.
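Because end= also works with set, the same footer pattern applies when the data are already in a SAS data set. The sketch below is illustrative only: it uses the built-in SASHELP.CLASS data set (Name and Weight) as a stand-in for the population data, and the macro variable name grand_total is a hypothetical choice for passing the total to later steps.

```sas
/* Running total with RETAIN + SUM(), printed as a footer row */
/* when END= flags the last observation.                      */
data _null_;
  set sashelp.class end=eof;                 /* eof = 1 only on the last observation */
  retain Total_Weight;                       /* keep the total across iterations (starts missing) */
  Total_Weight = sum(Total_Weight, Weight);  /* SUM() ignores missing arguments */
  file print notitles;
  put @1 Name @12 Weight 8.1;                /* one detail row per observation */
  if eof then do;
    put @1 10*'=' @12 8*'=';                 /* separator line */
    put @1 'Total' @12 Total_Weight 8.1;     /* footer row with the grand total */
    call symputx('grand_total', Total_Weight);  /* optional: reuse the total later */
  end;
run;
```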

Next, add this do block, which executes when SAS is processing the last line of the file:

  if eof then do;
    put @5 9*'=' @40 11*'=' @55 11*'=';
    put @5 'Total' @40 Total_All_Ages comma11. @55 Total_18Plus comma11.;
  end;

Here, you use put statements to print the separator line (repeated = signs) and the totals. The complete code is below:

data _NULL_;
  retain Total_All_Ages Total_18Plus;   /* running totals persist across DATA step iterations */
  infile RAWfoldr DLM=',' firstobs=3 obs=53 end=eof;   /* eof is set to 1 on the last record */
  informat STATE $2. NAME $20.;
  input SUMLEV REGION $ DIVISION STATE $ NAME $ POPESTIMATE2013 POPEST18PLUS2013 PCNT_POPEST18PLUS;
  /* SUM() ignores missing values, so the first addition works even though the totals start as missing */
  Total_All_Ages = sum(Total_All_Ages, POPESTIMATE2013);
  Total_18Plus = sum(Total_18Plus, POPEST18PLUS2013);
  keep STATE NAME POPESTIMATE2013 POPEST18PLUS2013 Total_All_Ages Total_18Plus;
  format POPESTIMATE2013 comma11. POPEST18PLUS2013 comma11.;
  file print notitles;
  /* Title and column headings: written once, before the first data row */
  if _n_=1 then do;
    put '=== U.S. Resident Population Estimates for All Ages and ===' /
        'Ages 18 or Older by State (in Alphabetical Order), 2013';
    put ' ';
    put @5 'FIPS Code' @16 'State Name' @40 'All Ages' @55 'Ages 18 or Older';
    put ' ';
  end;
  /* One detail row per state */
  put @5 STATE @16 NAME @40 POPESTIMATE2013 comma11. @55 POPEST18PLUS2013 comma11.;
  /* Total row: written once, after the last data row */
  if eof then do;
    put @5 9*'=' @40 11*'=' @55 11*'=';
    put @5 'Total' @40 Total_All_Ages comma11. @55 Total_18Plus comma11.;
  end;
run;
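If the retain requirement is not strict, the SAS sum statement (variable + expression;) gives the same running total with less code: it implicitly retains the variable, sets it to 0 before the first iteration, and ignores missing values in the expression. The sketch below again uses SASHELP.CLASS as a stand-in, and the output data set name work.totals is a hypothetical choice; it computes the total both ways so the results can be compared.

```sas
/* Compare RETAIN + SUM() with the sum statement; keep one row holding both totals */
data work.totals(keep=Total_Retain Total_SumStmt);
  set sashelp.class end=eof;
  retain Total_Retain;                       /* explicit RETAIN, starts as missing */
  Total_Retain = sum(Total_Retain, Weight);  /* SUM() ignores missing arguments */
  Total_SumStmt + Weight;                    /* sum statement: implied RETAIN, starts at 0 */
  if eof then output;                        /* write only the final totals */
run;
```

One difference to keep in mind: if every value is missing, the RETAIN + SUM() total stays missing, while the sum statement total stays 0.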

One final note on your code: you can right-align your numbers by adding the -r alignment modifier after the format in your put statement, for example:

  put @5 STATE @16 NAME @40 POPESTIMATE2013 comma11.-r @55 POPEST18PLUS2013 comma11.-r;

A format given in the put statement takes precedence over any format statement for that output.
