
SAS: How to count the number of observations over the 10 years prior to a certain month

I have a sample that includes two variables: ID and ym. ID identifies each trader, and ym is the year-month variable. I want to create a variable that shows the number of observations in the 10-year period prior to month t, as shown below.

ID  ym  Want
1   200101  0
1   200301  1
1   200401  2
1   200501  3
1   200601  4
1   200801  5
1   201201  5
1   201501  4
2   200001  0
2   200203  1
2   200401  2
2   200506  3

I attempted to use a BY statement with first.id to count them.

data want;
set have;
want+1;
by id;
if first.id then want=1;
run;

However, the years in ym are not continuous. When the time gap is greater than 10 years, this method does not work. I assume I need to count the observations in a rolling 10-year window, but I am not sure how to achieve that. Please give me some suggestions. Thanks.

Just do a self join in SQL. With your coding of YM it is easy to use an interval that is a multiple of a year (subtracting 1000 from a YYYYMM value moves back exactly 10 years), but harder to do other intervals.

proc sql;
create table want as 
  select a.id,a.ym,count(b.ym) as want 
  from have a 
   left join have b
   on a.id = b.id
   and (a.ym - 1000) <= b.ym < a.ym
  group by a.id,a.ym
  order by a.id,a.ym
;
quit;
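
For windows that are not a whole number of years, one option is to convert ym to a SAS date and count months with INTCK. This is only a sketch, not part of the answer above: it assumes ym is a numeric variable in YYYYMM form, and the table name want_months and the 120-month width are illustrative choices.

```sas
proc sql;
  create table want_months as
    select a.id, a.ym, count(b.ym) as want
    from have a
    left join have b
      on a.id = b.id
     and b.ym < a.ym
     /* months elapsed from the earlier record to the current one */
     and intck('month',
               input(put(b.ym, 6.), yymmn6.),
               input(put(a.ym, 6.), yymmn6.)) <= 120
    group by a.id, a.ym
    order by a.id, a.ym;
quit;
```

With 120 months this should count the same rows as the `a.ym - 1000` condition; changing 120 to, say, 18 gives an 18-month window, which the plain YYYYMM subtraction cannot express.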

This method retains the previous values for each ID and directly checks how many are within 120 months of the current value. It is not optimized, but it works. If you care about efficiency, you can size the array m() to the maximum number of values you have per ID.

The variable d is a quick shorthand I often use that converts a year/month into an integer number of months, so:

200012 -> (2000*12) + 12 = 24012
200101 -> (2001*12) + 1 = 24013
time from 200012 to 200101 = 24013 - 24012 = 1 month

data have;
   input id ym;
datalines;
1   200101  
1   200301  
1   200401  
1   200501  
1   200601  
1   200801  
1   201201  
1   201501  
2   200001  
2   200203  
2   200401  
2   200506  
;

proc sort data=have;
   by id ym;
run;

data want (keep=id ym want);
   set have;
   by id;
   
   retain seq m1-m100;
   
   array m(100) m1-m100;
   
   ** Convert date to comparable value **;
   d = 12 * floor(ym/100) + mod(ym,100);
   
   ** Initialize number of previous records **;
   want = 0;
   
   ** If first record, set retained values to missing and leave want=0 **;
   if first.id then call missing(seq,of m1-m100);
   ** Otherwise loop through previous months and count how many were within 120 months **;
   else do;
      do i = 1 to seq;
         if d <= (m(i) + 120) then want = want + 1;
      end;
   end;
   
   ** Increment variables for next iteration **;
   seq + 1;
   m(seq) = d;
 
run;

proc print data=want noobs;
run;
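
The remark about sizing the array to the largest number of values per ID could be handled along these lines. This is a sketch under assumptions: the macro variable name maxn is made up for illustration, `have` is already sorted by id and ym, and the `trimmed` option requires SAS 9.3 or later.

```sas
/* Largest number of records for any single ID, stored in macro variable maxn */
proc sql noprint;
  select max(n) into :maxn trimmed
    from (select count(*) as n from have group by id);
quit;

data want (keep=id ym want);
   set have;
   by id;

   /* One retained slot per possible earlier record of the same ID */
   retain m1-m&maxn;
   array m(&maxn) m1-m&maxn;

   /* Month index: 12 * year + month */
   d = 12 * floor(ym/100) + mod(ym,100);
   want = 0;

   /* Reset the history at the start of each ID, otherwise count earlier records within 120 months */
   if first.id then call missing(seq, of m1-m&maxn);
   else do i = 1 to seq;
      if d <= (m(i) + 120) then want = want + 1;
   end;

   /* Store the current month index for later records */
   seq + 1;
   m(seq) = d;
run;
```

This avoids both the wasted slots and the out-of-range subscript that a fixed size of 100 would cause if some ID had more than 100 records.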
