Sum consecutive observations in a dataset (SAS)
I have a dataset that looks like:
Hour Flag
1 1
2 1
3 .
4 1
5 1
6 .
7 1
8 1
9 1
10 .
11 1
12 1
13 1
14 1
I want to have an output dataset like:
Total_Hours Count
2 2
3 1
4 1
As you can see, I want to count the number of hours in each period of consecutive 1s, and then count how many periods have each length. A missing value ends the consecutive sequence.
How should I go about doing this? Thanks!
You'll need to do this in two steps. The first step is making sure the data is sorted properly and determining the number of hours in each consecutive period:
PROC SORT DATA = <your dataset>;
    BY hour;
RUN;

DATA work.consecutive_hours;
    SET <your dataset> END = lastrec;
    RETAIN total_hours 0;
    /* Extend the current run while flag is 1; otherwise close it out */
    IF flag = 1 THEN total_hours = total_hours + 1;
    ELSE DO;
        IF total_hours > 0 THEN OUTPUT;
        total_hours = 0;
    END;
    /* Need to output last record */
    IF lastrec AND total_hours > 0 THEN OUTPUT;
    KEEP total_hours;
RUN;
Now a simple SQL statement counts how many periods have each length:
PROC SQL;
    CREATE TABLE work.hour_summary AS
    SELECT
        total_hours
        ,COUNT(*) AS count
    FROM
        work.consecutive_hours
    GROUP BY
        total_hours
    ;
QUIT;
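You will have to do two things: compute the length of each run, then count how often each run length occurs.
For the case of using the implicit loop, keep the run length in a retained variable. A missing value or the end of the data triggers an output, and a non-missing value increments the run length, which is reset after each output. PROC FREQ can then count the run lengths.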
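The implicit-loop route with PROC FREQ could be sketched roughly as follows. This is only an illustration, not the answerer's original code: it assumes the input is the have dataset used in this thread, already in Hour order, and the names work.runs and work.want are placeholders.

```sas
/* Sketch only: work.runs and work.want are placeholder names. */
/* Step 1: one output row per run of non-missing flags.        */
data work.runs(keep=total_hours);
    set have end=last_obs;
    /* The sum statement retains total_hours and starts it at 0 */
    if not missing(flag) then total_hours + 1;
    /* A missing flag or the end of the data closes the run */
    if missing(flag) or last_obs then do;
        if total_hours > 0 then output;
        total_hours = 0;
    end;
run;

/* Step 2: frequency of each run length. */
proc freq data=work.runs noprint;
    tables total_hours / out=work.want(keep=total_hours count);
run;
```

The OUT= dataset from PROC FREQ also carries a PERCENT variable, which the KEEP= option drops here so that only the run length and its count remain.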
An alternative is to use an explicit loop and a hash for frequency counts.
Example:
data have;
    input Hour Flag;
    datalines;
1 1
2 1
3 .
4 1
5 1
6 .
7 1
8 1
9 1
10 .
11 1
12 1
13 1
14 1
;
run;
data _null_;
    /* Hash keyed on run length; count holds how many runs had that length */
    declare hash counts(ordered:'a');
    counts.defineKey('length');
    counts.defineData('length', 'count');
    counts.defineDone();

    do until (end);
        set have end=end;
        if not missing(flag) then
            length + 1;
        /* A missing flag or the end of the data closes the current run */
        if missing(flag) or end then do;
            if length > 0 then do;
                if counts.find() eq 0
                    then count+1;
                    else count=1;
                counts.replace();
                length = 0;
            end;
        end;
    end;

    counts.output(dataset:'want');
run;
An alternative, using a DOW loop with BY-group processing (NOTSORTED) and a hash:
data _null_;
    if _N_ = 1 then do;
        dcl hash h(ordered : "a");
        h.definekey("Total_Hours");
        h.definedata("Total_Hours", "Count");
        h.definedone();
    end;

    /* Each pass reads one run of identical Flag values; */
    /* Total_Hours ends up as the number of rows in it   */
    do Total_Hours = 1 by 1 until (last.Flag);
        set have end=lr;
        by Flag notsorted;
    end;

    Count = 1;
    /* Only runs of non-missing flags are counted */
    if Flag then do;
        if h.find() = 0 then Count+1;
        h.replace();
    end;

    if lr then h.output(dataset : "want");
run;
Several weeks ago, @Richard taught me how to use a DOW loop with a direct-addressing array. Today I'm passing it on to you.
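data want(keep=Total_Hours Count);
    /* bin[n] holds the number of runs of length n (runs up to 99 hours) */
    array bin[99] _temporary_;
    do until(eof1);
        set have end=eof1;
        if Flag then count + 1;
        if ^Flag or eof1 then do;
            /* Guard: count is missing when no run is open, */
            /* e.g. after two missing flags in a row        */
            if count then bin[count] + 1;
            count = .;
        end;
    end;
    do i = 1 to dim(bin);
        Total_Hours = i;
        Count = bin[i];
        if Count then output;
    end;
run;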
Thanks again to Richard, who also recommended this article to me.