How can I create a new variable that holds the sum of a specific variable (by ID) across multiple observations in SAS?
For example, I want to create a new dataset (Data2) from Data1.
A new variable, cost, in Data2 is calculated as the sum of material over the multiple observations for each ID in Data1.
(Data1)
ID material
1 4
1 4
1 4
2 2
2 4
2 4
3 2
3 6
3 6
4 5
4 5
4 5
4 5
5 2
5 4
5 4
5 8
(Data2)
ID cost
1 12 #4+4+4
2 10 #2+4+4
3 14 #2+6+6
4 20 #5+5+5+5
5 18 #2+4+4+8
I have used SAS EG only for simple analysis, and I recently started using PROC SQL. As a beginner in SAS coding (PROC SQL), I found it very hard to work out the answer on my own.
Thank you very much in advance.
Base SAS has several procedures that will present aggregated values over a group: MEANS, SUMMARY, and reporting procedures such as REPORT and TABULATE. The procedures can also save output data sets containing the computed aggregates.
data have;
input ID material_cost;
datalines;
1 4
1 4
1 4
2 2
2 4
2 4
3 2
3 6
3 6
4 5
4 5
4 5
4 5
5 2
5 4
5 4
5 8
run;
title "Proc MEANS";
proc means data=have sum noNobs maxdec=0;
class id;
var material_cost;
run;
title "Proc SUMMARY";
proc summary data=have print sum noNobs maxdec=0;
class id;
var material_cost;
run;
title "Proc REPORT";
proc report data=have;
columns id material_cost;
define id / group;
run;
title "Proc TABULATE";
proc tabulate data=have;
class id;
var material_cost;
table id, material_cost*sum / NoCellMerge;
run;
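To illustrate the point about saving the aggregates to an output data set, here is a minimal sketch using PROC SUMMARY with an OUTPUT statement. It assumes the `have` data set created above; the output name `data2` and the variable name `cost` are taken from the question, not from the original answer.

```sas
/* Sketch: write the per-ID sums to a data set instead of only printing them. */
/* NWAY keeps only the rows for each ID level (no overall total row).         */
proc summary data=have nway;
  class id;
  var material_cost;
  /* drop the automatic _TYPE_ and _FREQ_ columns; name the sum "cost" */
  output out=data2 (drop=_type_ _freq_) sum=cost;
run;

proc print data=data2 noobs;
run;
```

With the sample data, `data2` should contain one row per ID with the costs listed in the question (12, 10, 14, 20, 18).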
If you want to use PROC SQL, this is a straightforward use of GROUP BY:
proc sql;
select id, sum(material) as sum from mydataset group by id;
quit;
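The query above only prints the result. If the goal is a new data set, as in the question, a CREATE TABLE clause can wrap the same query. This is a sketch that assumes the input is named `data1` with variables `id` and `material`, and that the output should be `data2` with the sum stored as `cost`, following the names used in the question.

```sas
/* Sketch: store the grouped sums in a new data set rather than printing them. */
proc sql;
  create table data2 as
  select id,
         sum(material) as cost   /* one total per ID */
  from data1
  group by id;
quit;
```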
You could also compute this manually in a DATA step if you don't want to use PROC SQL:
proc sort data=mydataset;
by id;
run;
data sums;
set mydataset;
by id;
if first.id then sum = 0;
sum + material;
if last.id then output;
keep id sum;
run;
proc print data=sums;
run;