
SAS Looping: Summing observations vertically based on a condition

I have a dataset that looks like:

Zip Codes   Total Cars
11111       3
11111       4
23232       1
44331       0
44331       10
18860       6
18860       6
18860       6
18860       8

There are 3 million+ rows just like this, with varying zips. I need to sum total cars for each zip code, such that the resulting table looks like:

Zip Codes   Total Cars
11111       7
23232       1
44331       10
18860       26

Manually inputting zips into the code is not an option considering the size of the dataset. Thoughts?

Both answers so far are OK, but here is a more detailed explanation of both possible methods:

PROC SQL METHOD

PROC SQL;
  CREATE TABLE output_table AS
  SELECT ZipCodes,
  SUM(Total_Cars) as Total_Cars
  FROM input_table
  GROUP BY ZipCodes;
QUIT;

The GROUP BY clause can also be written GROUP BY 1, omitting ZipCodes, as this refers to the 1st column in the SELECT clause.
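
As a sketch of the positional form just described, reusing the same input_table and column names as the query above:

```sas
PROC SQL;
  CREATE TABLE output_table AS
  SELECT ZipCodes,
         SUM(Total_Cars) AS Total_Cars
  FROM input_table
  GROUP BY 1;   /* 1 = first column in the SELECT list, i.e. ZipCodes */
QUIT;
```

The positional form saves typing, but it silently changes meaning if the SELECT list is reordered, so naming the column is usually easier to maintain.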

PROC SUMMARY METHOD

PROC SUMMARY DATA=input_table NWAY;
             CLASS ZipCodes;
             VAR Total_Cars;
             OUTPUT OUT=output_table (DROP=_TYPE_ _FREQ_) SUM()=;
RUN;

The method is similar to another answer to this question, but I've added:

  • NWAY - keeps only the highest level of summarisation. It matters less here because you have only one CLASS variable, so there is only one level of summarisation by class. However, without NWAY you get an extra row showing the total of Total_Cars across the whole dataset, which is not something you asked for in your question.

  • DROP=_TYPE_ _FREQ_ - This removes the automatic variables:

    • _TYPE_ - shows the level of summarisation (see the NWAY note above); here it would just be a column containing the value 1.
    • _FREQ_ - gives the number of rows for each ZipCodes value, which, although useful, isn't something you asked for in your question.
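
To see what NWAY and DROP= remove, here is a minimal sketch that runs the same summary without either option. The output dataset name all_levels is a made-up name for illustration, and input_table is assumed to have the ZipCodes and Total_Cars columns used above.

```sas
/* Same summary, but without NWAY and without DROP= */
PROC SUMMARY DATA=input_table;
  CLASS ZipCodes;
  VAR Total_Cars;
  OUTPUT OUT=all_levels SUM=;   /* all_levels is a hypothetical dataset name */
RUN;

/* _TYPE_ = 0 is the grand-total row; _TYPE_ = 1 rows are one per ZipCodes value. */
/* _FREQ_ holds the number of input rows behind each output row.                  */
PROC PRINT DATA=all_levels;
RUN;
```

With a single CLASS variable, adding NWAY keeps only the _TYPE_ = 1 rows.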

DATA STEP METHOD

PROC SORT DATA=input_table (RENAME=(Total_Cars = tc)) OUT=_temp;
  BY ZipCodes;
RUN;

DATA output_table (DROP=TC);
  SET _temp;
  BY ZipCodes;
  IF first.ZipCodes THEN Total_Cars = 0;
  Total_Cars+tc;
  IF last.ZipCodes THEN OUTPUT;
RUN;

This is included just for completeness. It's less efficient because it requires pre-sorting.
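
If the rows for each zip code are already stored next to each other (as in the sample in the question), the sort can be skipped with the NOTSORTED option on the BY statement. This is a sketch under that assumption only: if the same zip code appears in two separate runs of rows, it produces one output row per run rather than one per zip code.

```sas
/* Assumes all rows for a given ZipCodes value are contiguous in input_table */
DATA output_table (DROP=tc);
  SET input_table (RENAME=(Total_Cars = tc));
  BY ZipCodes NOTSORTED;                  /* grouped, but not necessarily in sorted order */
  IF first.ZipCodes THEN Total_Cars = 0;  /* reset the running total at each new group */
  Total_Cars + tc;                        /* sum statement: Total_Cars is retained across rows */
  IF last.ZipCodes THEN OUTPUT;           /* write one row per group */
RUN;
```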

To supplement @mjsqu's answer, for (more) completeness:

data testin;
    input Zip Cars;
    datalines;
11111 3
11111 4
23232 1
44331 0
44331 10
18860 6
18860 6
18860 6
18860 8
;

PROC TABULATE METHOD

proc tabulate data=testin out=testout

    /*drop extra created vars and rename as needed*/
    (drop=_type_ _page_ _table_ rename=(Zip='Zip Codes'n Cars_Sum='Total Cars'n));

    /*grouping variable, also used to sort output in ascending order*/
    class Zip;

    /* variable to be analyzed*/
    var Cars;

    /*sum cars by zip code*/
    table Zip, Cars*(sum);
run;

If using Enterprise Guide, this produces a dataset and a results table. To suppress the results and only output a dataset, include this line before "proc tabulate":

ods select none; /*suppress ods output*/

and this after "run":

ods select all; /*restore ods output*/
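
Put together, the two ODS statements wrap the procedure as sketched below. This version reuses the testin dataset from above and leaves out the rename, so the output columns keep the names PROC TABULATE generates.

```sas
ods select none;   /* suppress ods output */

proc tabulate data=testin out=testout (drop=_type_ _page_ _table_);
    class Zip;
    var Cars;
    table Zip, Cars*(sum);
run;

ods select all;    /* restore ods output */
```

The OUT= dataset is still written while ODS output is suppressed, because it does not go through ODS.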

  1. The variable you want to group the sum by is "ZipCodes", so it goes in the "class" statement.
  2. You want to sum Total_cars, so it goes in the "var" statement.
  3. Input_table and Output_table are self-explanatory.

/* Code */

   proc summary data=Input_table;
               class ZipCodes;
               var Total_cars;
        output out=Output_table
        sum()=;
        run;

You can use proc sql. This is a very simple step:

proc sql;
  create table new as
  select Zipcodes, sum(Total_Cars) as total_cars
  from table_have
  group by Zipcodes;
quit;
