
SAS - how to 'sum up' based on consecutive occurrences

First-time poster here, so hopefully someone can kindly help with a problem I'm facing in SAS EG (I'm still learning SAS coding, so please be kind!).

In the snippet of the dataset below, what I'm trying to do is tally up the scores (pts) by Ref, based on the consecutive occurrences of a flag for that Ref.

For example, for Ref 505 and A_Flag there are two separate sets of consecutive occurrences of the flag, so the scoring is as follows:

  • 1st ID > 1st instance = 25 points
  • 2nd ID > 2nd instance but 1st consecutive instance = double to 50 points
  • 3rd ID > 0 instance = 0 points
  • 4th ID > 1st instance = 25 points
  • 5th ID > 2nd instance but 1st consecutive instance = double to 50 points
  • 6th ID > 0 instance = 0 points

Therefore, for this Ref, A_Pts will be 150 points.

Another example: for Ref 527 and B_Flag there are 4 consecutive occurrences of the flag, so the scoring per ID is:

  • 1st ID > 0 instance = 0 points
  • 2nd ID > 1st instance = 10 points
  • 3rd ID > 2nd instance but 1st consecutive instance = double to 20 points
  • 4th ID > 3rd instance but 2nd consecutive instance = double to 40 points
  • 5th ID > 4th instance but 3rd consecutive instance = double to 80 points

Therefore, for this Ref, B_Pts will be 150 points.
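
The rule behind both examples can be written compactly. As a sketch only, using symbols that are not in the original post: let p be the base points for a flag (25 for A, 10 for B), k the position of a row within a run of consecutive flagged rows, and n the length of that run.

```latex
\[
  \text{pts}(k) = p \cdot 2^{\,k-1},
  \qquad
  \sum_{k=1}^{n} p \cdot 2^{\,k-1} = p\,\bigl(2^{n} - 1\bigr)
\]
```

For Ref 505 and A_Flag there are two runs of length 2, giving 2 × 25 × (2² − 1) = 150. For Ref 527 and B_Flag there is one run of length 4, giving 10 × (2⁴ − 1) = 150. Both agree with the totals above.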

Note that the data is already in the order needed for what I'm trying to achieve.

I tried using the LAG function, but that only works for the first consecutive instance.

I also tried calculating a count (an enumeration variable based on cats(Ref,A_Flag)), but that then orders the data incorrectly and doesn't count up accordingly.

Hopefully this makes sense to someone out there!

The dataset in question:

+-----------+-----+--------+--------+--------+-------+-------+
|   date    | Ref | FormID | A_Flag | B_Flag | A_Pts | B_Pts |
+-----------+-----+--------+--------+--------+-------+-------+
| 01-Feb-17 | 505 |  74549 | A      |        |    25 |     0 |
| 01-Feb-17 | 505 |  74550 | A      |        |    25 |     0 |
| 10-Jan-17 | 505 |  82900 |        | B      |     0 |    10 |
| 13-Jan-17 | 505 |  82906 | A      |        |    25 |     0 |
| 09-Jan-17 | 505 |  82907 | A      |        |    25 |     0 |
| 11-Jan-17 | 505 |  82909 |        | B      |     0 |    10 |
| 03-Jan-17 | 527 |  62549 | A      |        |    25 |     0 |
| 04-Jan-17 | 527 |  62550 |        | B      |     0 |    10 |
| 04-Jan-17 | 527 |  76151 |        | B      |     0 |    10 |
| 04-Jan-17 | 527 |  76152 | A      | B      |    25 |    10 |
| 04-Jan-17 | 527 |  76153 | A      | B      |    25 |    10 |
+-----------+-----+--------+--------+--------+-------+-------+

Desired output (unless there is a better suggestion):

+-----------+-----+--------+--------+--------+-----------+-----------+
|   date    | Ref | FormID | A_Flag | B_Flag | A_Pts_Agg | B_Pts_Agg |
+-----------+-----+--------+--------+--------+-----------+-----------+
| 01-Feb-17 | 505 |  74549 | A      |        |        25 |         0 |
| 01-Feb-17 | 505 |  74550 | A      |        |        50 |         0 |
| 10-Jan-17 | 505 |  82900 |        | B      |         0 |        10 |
| 13-Jan-17 | 505 |  82906 | A      |        |        25 |         0 |
| 09-Jan-17 | 505 |  82907 | A      |        |        50 |         0 |
| 11-Jan-17 | 505 |  82909 |        | B      |         0 |        10 |
| 03-Jan-17 | 527 |  62549 | A      |        |        25 |         0 |
| 04-Jan-17 | 527 |  62550 |        | B      |         0 |        10 |
| 04-Jan-17 | 527 |  76151 |        | B      |         0 |        20 |
| 04-Jan-17 | 527 |  76152 | A      | B      |        25 |        40 |
| 04-Jan-17 | 527 |  76153 | A      | B      |        50 |        80 |
+-----------+-----+--------+--------+--------+-----------+-----------+

So when totalled up it will be:

+-----+-----------+-----------+
| Ref | A_Pts_Agg | B_Pts_Agg |
+-----+-----------+-----------+
| 505 |       150 |        20 |
| 527 |       100 |       150 |
+-----+-----------+-----------+

Try this:

data have;
infile cards dlm='|';
input date :date7. Ref :8. FormID :8. A_Flag :$1. B_Flag :$1. A_Pts :8.  B_Pts :8.;
format date date7.;
cards;
| 01-Feb-17 | 505 |  74549 | A      |        |    25 |     0 |
| 01-Feb-17 | 505 |  74550 | A      |        |    25 |     0 |
| 10-Jan-17 | 505 |  82900 |        | B      |     0 |    10 |
| 13-Jan-17 | 505 |  82906 | A      |        |    25 |     0 |
| 09-Jan-17 | 505 |  82907 | A      |        |    25 |     0 |
| 11-Jan-17 | 505 |  82909 |        | B      |     0 |    10 |
| 03-Jan-17 | 527 |  62549 | A      |        |    25 |     0 |
| 04-Jan-17 | 527 |  62550 |        | B      |     0 |    10 |
| 04-Jan-17 | 527 |  76151 |        | B      |     0 |    10 |
| 04-Jan-17 | 527 |  76152 | A      | B      |    25 |    10 |
| 04-Jan-17 | 527 |  76153 | A      | B      |    25 |    10 |
;
run;

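data want;
  set have;
  by Ref;  /* assumes HAVE is already grouped by Ref, as in the sample */
  retain A_Pts_Agg B_Pts_Agg;
  /* Call LAG unconditionally so its queue is updated on every row. */
  prev_A = lag(A_Flag);
  prev_B = lag(B_Flag);
  /* No flag: 0 points. New Ref or start of a run: base points.
     Otherwise the run continues, so double the previous score.
     Checking first.Ref stops a run carrying over from the prior Ref. */
  if missing(A_Flag) then A_Pts_Agg = 0;
  else if first.Ref or prev_A ne A_Flag then A_Pts_Agg = A_Pts;
  else A_Pts_Agg = A_Pts_Agg * 2;
  if missing(B_Flag) then B_Pts_Agg = 0;
  else if first.Ref or prev_B ne B_Flag then B_Pts_Agg = B_Pts;
  else B_Pts_Agg = B_Pts_Agg * 2;
  drop prev_A prev_B;
run;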
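
To get from the row-level scores to the per-Ref totals shown in the question, one option is a summary step. This is a minimal sketch, assuming the WANT dataset and the A_Pts_Agg and B_Pts_Agg variables created above; the output dataset name TOTALS is made up for illustration.

```sas
/* Sum the aggregated points per Ref.
   TOTALS is a hypothetical output dataset name. */
proc means data=want nway noprint;
  class Ref;
  var A_Pts_Agg B_Pts_Agg;
  /* sum= with no names keeps the original variable names */
  output out=totals(drop=_type_ _freq_) sum=;
run;

proc print data=totals noobs;
run;
```

With the sample data this should give one row per Ref, which can be checked against the totals table in the question (150 and 20 for Ref 505, 100 and 150 for Ref 527).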
