SAS loop: summing observations vertically based on a condition

Question

I have a dataset that looks like:

Zip Codes   Total Cars
11111       3
11111       4
23232       1
44331       0
44331       10
18860       6
18860       6
18860       6
18860       8

There are 3 million+ rows just like this, with varying zip codes. I need to sum the total cars for each zip code, so that the resulting table looks like:

Zip Codes   Total Cars

11111       7
23232       1
44331       10
18860       26
...

Manually typing the zip codes into the code is not an option given the size of the dataset. Thoughts?

Answer 1

Both answers so far are OK, but here is a more detailed explanation of both possible methods:

PROC SQL METHOD

PROC SQL;
  CREATE TABLE output_table AS
  SELECT ZipCodes,
  SUM(Total_Cars) as Total_Cars
  FROM input_table
  GROUP BY ZipCodes;
QUIT;

The GROUP BY clause can also be written as GROUP BY 1, omitting ZipCodes, because 1 refers to the first column in the SELECT clause.
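A minimal sketch of the positional form, assuming the column names used above (ZipCodes, Total_Cars) and building a hypothetical input_table from the question's sample rows so the step can be run on its own:

```sas
/* Hypothetical sample data taken from the question */
data input_table;
  input ZipCodes Total_Cars;
  datalines;
11111 3
11111 4
23232 1
44331 0
44331 10
18860 6
18860 6
18860 6
18860 8
;
run;

proc sql;
  create table output_table as
  select ZipCodes,
         sum(Total_Cars) as Total_Cars
  from input_table
  group by 1;   /* 1 = first column in the SELECT list, i.e. ZipCodes */
quit;
```

For this sample, output_table should hold four rows: 11111 with 7, 23232 with 1, 44331 with 10 and 18860 with 26.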

PROC SUMMARY METHOD

PROC SUMMARY DATA=input_table NWAY;
  CLASS ZipCodes;
  VAR Total_Cars;
  OUTPUT OUT=output_table (DROP=_TYPE_ _FREQ_) SUM()=;
RUN;

The method is similar to another answer to this question, but I've added:

NWAY - outputs only the highest level of summarisation. Here it matters less because you have only one CLASS variable, so the only other level is the overall total. However, without NWAY you get an extra row showing the total of Total_Cars across the whole dataset, which you did not ask for in your question.
DROP=_TYPE_ _FREQ_ - removes the automatic variables:
- _TYPE_ - shows the level of summarisation (see the comment above); with NWAY it would just be a column containing the value 1.
- _FREQ_ - gives a frequency count for each ZipCodes value, which, although useful, isn't something you asked for in your question.
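To see what NWAY and DROP= remove, one can run the same step without them and print the result. This is a sketch assuming a hypothetical input_table built from the question's sample; the output should contain one extra row with _TYPE_=0 and a missing ZipCodes holding the grand total (44 cars over 9 rows for this sample), followed by one _TYPE_=1 row per zip code with its _FREQ_ count.

```sas
/* Hypothetical sample data taken from the question */
data input_table;
  input ZipCodes Total_Cars;
  datalines;
11111 3
11111 4
23232 1
44331 0
44331 10
18860 6
18860 6
18860 6
18860 8
;
run;

/* Same summary as above, but without NWAY and without DROP= */
proc summary data=input_table;
  class ZipCodes;
  var Total_Cars;
  output out=summary_all sum=;   /* SUM= keeps the name Total_Cars */
run;

proc print data=summary_all;
run;
```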

DATA STEP METHOD

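PROC SORT DATA=input_table (RENAME=(Total_Cars = tc)) OUT=_temp;
  BY ZipCodes;
RUN;

DATA output_table (DROP=tc);
  SET _temp;
  BY ZipCodes;                             /* creates FIRST.ZipCodes and LAST.ZipCodes */
  IF first.ZipCodes THEN Total_Cars = 0;   /* reset the running total for each zip */
  Total_Cars+tc;                           /* sum statement: value is retained across rows */
  IF last.ZipCodes THEN OUTPUT;            /* write one row per zip code */
RUN;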

This is included just for completeness; it is less efficient because it requires sorting the data first.
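If, and only if, all rows for each zip code are already adjacent (as in the question's sample, which is grouped but not sorted), the sort can be skipped by adding NOTSORTED to the BY statement. This is a sketch under that assumption, using a hypothetical input_table; if the same zip code appears in separate runs, each run becomes its own output row.

```sas
/* Hypothetical sample data: grouped by zip, but not in sorted order */
data input_table;
  input ZipCodes Total_Cars;
  datalines;
11111 3
11111 4
23232 1
44331 0
44331 10
18860 6
18860 6
18860 6
18860 8
;
run;

data output_table (drop=tc);
  set input_table (rename=(Total_Cars=tc));
  by ZipCodes notsorted;                   /* groups = consecutive runs of equal values */
  if first.ZipCodes then Total_Cars = 0;   /* reset at the start of each run */
  Total_Cars + tc;                         /* sum statement: retained across rows */
  if last.ZipCodes then output;            /* one row per run, in input order */
run;
```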

Answer 2

To supplement @mjsqu's answer, for (more) completeness:

data testin;
    input Zip Cars;
    datalines;
11111 3
11111 4
23232 1
44331 0
44331 10
18860 6
18860 6
18860 6
18860 8
;
run;

PROC TABULATE METHOD

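/*name literals such as 'Zip Codes'n need VALIDVARNAME=ANY (the Enterprise Guide default)*/
options validvarname=any;

proc tabulate data=testin out=testout

    /*drop extra created vars and rename as needed*/
    (drop=_type_ _page_ _table_ rename=(Zip='Zip Codes'n Cars_Sum='Total Cars'n));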

    /*grouping variable, also used to sort output in ascending order*/
    class Zip;

    /* variable to be analyzed*/
    var Cars;

    /*sum cars by zip code*/
    table Zip, Cars*(sum);
run;

If you are using Enterprise Guide, this produces both a dataset and a results table. To suppress the results and output only the dataset, add this line before "proc tabulate":

ods select none; /*suppress ods output*/

and this one after "run":

ods select all; /*restore ods output*/
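Putting the two ODS statements around the TABULATE step gives a complete, dataset-only run. In this sketch the output columns are renamed to standard names (ZipCodes and Total_Cars, hypothetical choices) so that it does not depend on VALIDVARNAME=ANY; Cars_Sum is the name PROC TABULATE gives the SUM of Cars in the OUT= dataset.

```sas
/* Sample data from the answer above */
data testin;
  input Zip Cars;
  datalines;
11111 3
11111 4
23232 1
44331 0
44331 10
18860 6
18860 6
18860 6
18860 8
;
run;

ods select none;   /* suppress the printed results table */

proc tabulate data=testin
              out=testout (drop=_type_ _page_ _table_
                           rename=(Zip=ZipCodes Cars_Sum=Total_Cars));
  class Zip;                 /* one row per zip code */
  var Cars;                  /* analysis variable */
  table Zip, Cars*(sum);     /* sum of cars by zip code */
run;

ods select all;    /* restore ODS output for later steps */
```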

Answer 3

The variable you want to group the sums by is "ZipCodes", so it goes in the CLASS statement.
You want to sum Total_cars, so it goes in the VAR statement.
Input_table and Output_table are self-explanatory.

/* Code */

proc summary data=Input_table;
  class ZipCodes;
  var Total_cars;
  output out=Output_table sum()=;
run;

Answer 4

You can use PROC SQL; this is a very simple step:

proc sql;
  create table new as
  select Zipcodes, sum(Total_Cars) as total_cars   /* a variable name cannot contain a space */
  from table_have
  group by Zipcodes;
quit;
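If the column really is named with a space, as the question's header "Total Cars" suggests, PROC SQL needs a name literal instead. This is a sketch assuming VALIDVARNAME=ANY is in effect and building a hypothetical table_have inline from the question's sample:

```sas
options validvarname=any;   /* allow spaces in variable names */

proc sql;
  /* Hypothetical table whose column names contain spaces */
  create table table_have ('Zip Codes'n num, 'Total Cars'n num);
  insert into table_have
    values (11111, 3) values (11111, 4) values (23232, 1)
    values (44331, 0) values (44331, 10)
    values (18860, 6) values (18860, 6) values (18860, 6) values (18860, 8);

  create table new as
  select 'Zip Codes'n,
         sum('Total Cars'n) as total_cars
  from table_have
  group by 'Zip Codes'n;
quit;
```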

Answers (4):

Answer 1: score 3, accepted, 2014-12-03 11:18:12
Answer 2: score 1, 2015-03-18 17:16:57
Answer 3: score 0, 2014-12-03 06:01:19
Answer 4: score 0, 2014-12-03 08:13:12
