Sum consecutive observations in a SAS dataset

I have a dataset that looks like:

  Hour    Flag
    1       1
    2       1
    3       .
    4       1
    5       1
    6       .
    7       1
    8       1 
    9       1
    10      . 
    11      1
    12      1
    13      1
    14      1

I want an output dataset like:

   Total_Hours   Count
        2          2
        3          1
        4          1

As you can see, I want to count the number of hours in each period of consecutive 1s, and then count how many periods there are of each length. A missing value ends the consecutive sequence.

How should I go about doing this? Thanks!

You'll need to do this in two steps. The first step is to make sure the data is sorted properly and to determine the number of hours in each consecutive period:

PROC SORT DATA = <your dataset>;
  BY hour;
RUN;

DATA work.consecutive_hours;
  SET <your dataset> END = lastrec;

  RETAIN
    total_hours 0
  ;

  IF flag = 1 THEN total_hours = total_hours + 1;
  ELSE
    DO;
      IF total_hours > 0 THEN output;
      total_hours = 0;
    END;
  /* Need to output last record */
  IF lastrec AND total_hours > 0 THEN output;

  KEEP 
    total_hours
  ;
RUN;

Now a simple SQL statement:

PROC SQL;
  CREATE TABLE work.hour_summary AS
  SELECT
    total_hours
   ,COUNT(*) AS count
  FROM
    work.consecutive_hours
  GROUP BY
    total_hours
  ;
QUIT;

You will have to do two things:

  • compute the run lengths
  • compute the frequency of the run lengths

When using the implicit DATA step loop:

  • Each run length can be tracked in a retained variable: increment it on a non-missing value, and on a missing value or at the end of the data, output it (if positive) and reset it to zero.
  • PROC FREQ then computes the frequency of each run length.
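
The implicit-loop approach described above could be sketched as follows. This is a minimal sketch, not a tested implementation: it assumes an input dataset named have with variables Hour and Flag, already sorted by Hour, and the names runs, run_length, last_row, and want are illustrative choices rather than anything required by SAS.

```sas
/* Step 1: write one row per run of consecutive non-missing flags.  */
/* Assumes dataset HAVE (Hour, Flag) is already sorted by Hour.     */
data runs(keep=run_length);
  set have end=last_row;

  /* Sum statement: run_length is retained and starts at 0 */
  if not missing(flag) then run_length + 1;

  /* A missing flag or the end of the data closes the current run */
  if missing(flag) or last_row then do;
    if run_length > 0 then output;
    run_length = 0;
  end;
run;

/* Step 2: count how often each run length occurs.                  */
/* The OUT= dataset holds the table variable plus COUNT and PERCENT */
proc freq data=runs noprint;
  tables run_length / out=want(keep=run_length count
                               rename=(run_length=Total_Hours));
run;
```

Keeping the run-length computation in its own DATA step leaves the intermediate dataset available for inspection before the frequencies are taken.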

An alternative is to use an explicit loop and a hash for frequency counts.

Example:

data have; input
Hour    Flag; datalines;
  1       1
  2       1
  3       .
  4       1
  5       1
  6       .
  7       1
  8       1
  9       1
  10      .
  11      1
  12      1
  13      1
  14      1
;

data _null_;
  declare hash counts(ordered:'a');
  counts.defineKey('length');
  counts.defineData('length', 'count');
  counts.defineDone();

  do until (end);
    set have end=end;

    if not missing(flag) then 
      length + 1;

    if missing(flag) or end then do;
      if length > 0 then do;
        if counts.find() eq 0 
          then count+1;
          else count=1;
        counts.replace();
        length = 0;
      end;
    end;
  end;

  counts.output(dataset:'want');
run;

An alternative:

data _null_;
   if _N_ = 1 then do;
      dcl hash h(ordered : "a");
      h.definekey("Total_Hours");
      h.definedata("Total_Hours", "Count");
      h.definedone();
   end;

   do Total_Hours = 1 by 1 until (last.Flag);
      set have end=lr;
      by Flag notsorted;
   end;

   Count = 1;

   if Flag then do;
      if h.find() = 0 then Count+1;
      h.replace();
   end;

   if lr then h.output(dataset : "want");
run;

Several weeks ago, @Richard taught me how to use a DOW loop with a direct-addressing array. Today I'm passing it on to you.

data want(keep=Total_Hours Count);

  array bin[99]_temporary_;
  do until(eof1);
    set have end=eof1;
    if Flag then count + 1;
    if ^Flag or eof1 then do;
      /* Guard: no open run (leading or repeated missing flags) */
      if count then bin[count] + 1;
      count = .;
    end;
  end;

  do i = 1 to dim(bin);
    Total_Hours = i;
    Count = bin[i];
    if Count then output;
  end;
run;

Thanks again to Richard, who also pointed me to this article.
