SAS Looping: Summing observations vertically based on a condition
I have a dataset that looks like:
Zip Codes    Total Cars
There are 3 million+ rows just like this, with varying zips. I need to sum total cars for each zip code, such that the resulting table looks like:
Zip Codes    Total Cars
Manually inputting zips into the code is not an option considering the size of the dataset. Thoughts?
Both answers so far are OK, but here is a more detailed explanation of the possible methods:
PROC SQL METHOD
PROC SQL;
CREATE TABLE output_table AS
SELECT ZipCodes,
SUM(Total_Cars) as Total_Cars
FROM input_table
GROUP BY ZipCodes;
QUIT;
The GROUP BY clause can also be written GROUP BY 1, omitting ZipCodes, as this refers to the 1st column in the SELECT clause.
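As a minimal sketch, the positional form would look like this, assuming the same input_table, ZipCodes and Total_Cars names used in the query above:

```sas
/* Same aggregation as the query above, grouping by column position. */
PROC SQL;
CREATE TABLE output_table AS
SELECT ZipCodes,
SUM(Total_Cars) as Total_Cars
FROM input_table
GROUP BY 1; /* 1 = first column of the SELECT list, i.e. ZipCodes */
QUIT;
```

The positional form saves typing, but it silently changes meaning if the SELECT list is reordered, so naming the column is usually the safer choice in code that will be maintained.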
PROC SUMMARY METHOD
PROC SUMMARY DATA=input_table NWAY;
CLASS ZipCodes;
VAR Total_Cars;
OUTPUT OUT=output_table (DROP=_TYPE_ _FREQ_) SUM()=;
RUN;
The method is similar to another answer to this question, but I've added:
NWAY - gives only the maximum level of summarisation. Here it's not as important because you have only one CLASS variable, meaning there is only one level of summarisation. However, without NWAY you get an extra row showing the total value of Total_Cars across the whole dataset, which is not something you asked for in your question.
DROP=_TYPE_ _FREQ_ - This removes the automatic variables:
_TYPE_ - shows the level of summarisation (see comment above); here it would just be a column containing the value 1.
_FREQ_ - gives a frequency count of the ZipCodes (the number of input rows for each one), which although useful, isn't something you asked for in your question.
DATA STEP METHOD
PROC SORT DATA=input_table (RENAME=(Total_Cars = tc)) OUT=_temp;
BY ZipCodes;
RUN;
DATA output_table (DROP=TC);
SET _temp;
BY ZipCodes;
IF first.ZipCodes THEN Total_Cars = 0;
Total_Cars+tc;
IF last.ZipCodes THEN OUTPUT;
RUN;
This is just included for completeness; it's not as efficient because it requires pre-sorting.
To supplement @mjsqu's answer, for (more) completeness:
data testin;
input Zip Cars;
datalines;
11111 3
11111 4
23232 1
44331 0
44331 10
18860 6
18860 6
18860 6
18860 8
;
PROC TABULATE METHOD
proc tabulate data=testin out=testout
/*drop extra created vars and rename as needed*/
(drop=_type_ _page_ _table_ rename=(Zip='Zip Codes'n Cars_Sum='Total Cars'n));
/*grouping variable, also used to sort output in ascending order*/
class Zip;
/* variable to be analyzed*/
var Cars;
/*sum cars by zip code*/
table Zip, Cars*(sum);
run;
If using Enterprise Guide, this produces a dataset and a results table. To suppress the results and only output a dataset, include this line before "proc tabulate":
ods select none; /*suppress ods output*/
and this after "run":
ods select all; /*restore ods output*/
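Put together, the two ODS statements wrap the step as sketched below. This reuses testin and testout from the example above; the rename= option is left out only to keep the sketch short, so the summed column keeps its default name Cars_Sum.

```sas
ods select none; /*suppress ods output*/
proc tabulate data=testin out=testout (drop=_type_ _page_ _table_);
class Zip;
var Cars;
table Zip, Cars*(sum);
run;
ods select all; /*restore ods output*/
```

Restoring with "ods select all" matters: if it is skipped, later steps in the same session would also produce no displayed results.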
/* Code */
proc summary data=Input_table;
class ZipCodes;
var Total_cars;
output out=Output_table
sum()=;
run;
You can use proc sql. This is a very simple step:
proc sql;
create table new as
select Zipcodes, sum(Total_Cars) as total_cars from table_have group by Zipcodes
;
quit;