I'm working in SAS and I have a table that looks like this:
ID | Time | Main | lag_1 | lag_2
----------------------------------------------------------------------------
A | 01 | 0 | 0 | 1
A | 03 | 0 | 0 | 1
A | 04 | 0 | 0 | 0
A | 10 | 1 | 0 | 0
A | 11 | 1 | 0 | 0
A | 12 | 1 | 0 | 0
B | 02 | 1 | 1 | 1
B | 04 | 0 | 1 | 1
B | 07 | 0 | 0 | 1
B | 10 | 1 | 0 | 0
B | 11 | 1 | 0 | 0
B | 12 | 1 | 0 | 0
except with multiple IDs. The table is sorted by ID and Time. After calculating the total count of ones in the Main column (call it tot), I am trying to calculate two further counts, tot_1 and tot_2.
The expected result would be:
tot | tot_1 | tot_2
--------------------
7 | 3 | 6
since tot_1 should be 3 (0 from ID = A + 3 from ID = B), and tot_2 should be 6 (3 from ID = A + 3 from ID = B).
I am a complete beginner in these types of segmentations so any help is greatly appreciated.
Edit: I would expect that tot_2 >= tot_1 because lag_2 is built on events from Main which go longer back in time than lag_1 does.
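This is much easier to do in a data step. That way you can check for the start of a new id and reset the flags that record whether the lag_x variables were ever true.
data want ;
  set have end=eof;
  by id time ;
  /* running total of main over all rows */
  tot + main ;
  /* new id: forget the lag flags seen for the previous id */
  if first.id then call missing(any_lag_1,any_lag_2);
  /* count main only if a lag flag was set on an earlier row of this id */
  if any_lag_1 then tot_1 + main ;
  if any_lag_2 then tot_2 + main ;
  /* write a single row of totals after the last observation */
  if eof then output;
  /* update the flags after counting, so a row's own lag value is not used for itself */
  any_lag_1+lag_1;
  any_lag_2+lag_2;
  keep tot: ;
run;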
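To try the step above, the sample rows from the question can be loaded with a DATALINES block. This is only a sketch: the dataset name have is taken from the SET statement in the answer, and reading Time as a plain number (so 01 becomes 1) is an assumption about how the real column is stored.

```sas
/* Sample rows from the question. The name HAVE matches the SET statement above. */
/* Assumption: time is numeric, so 01 is read as 1.                              */
data have;
  input id $ time main lag_1 lag_2;
  datalines;
A 1 0 0 1
A 3 0 0 1
A 4 0 0 0
A 10 1 0 0
A 11 1 0 0
A 12 1 0 0
B 2 1 1 1
B 4 0 1 1
B 7 0 0 1
B 10 1 0 0
B 11 1 0 0
B 12 1 0 0
;
run;
```

The rows are already in id and time order, so the BY statement works without a PROC SORT. On these twelve rows the data step writes one observation with tot = 7, tot_1 = 3 and tot_2 = 6, which matches the expected table in the question.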
If I understand correctly, you want these sums per id. The key is comparing, within each id, the earliest time at which each flag is set, and then doing the sums: an id's total counts toward tot_1 only when its first lag_1 = 1 comes strictly before its first main = 1, and likewise for tot_2. This is all conditional aggregation:
select sum(tot) as tot,
       sum(case when time_lag_1 < time_main then tot else 0 end) as tot_1,
       sum(case when time_lag_2 < time_main then tot else 0 end) as tot_2
from (select id, sum(main) as tot,
             min(case when main = 1 then time end) as time_main,
             min(case when lag_1 = 1 then time end) as time_lag_1,
             min(case when lag_2 = 1 then time end) as time_lag_2
      from t
      group by id
     ) t;
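The query above aggregates once per id. For a row-by-row count like the one in the question's expected table, where a Main = 1 row counts only if the lag flag was set on an earlier row of the same id, one option is to join every row to the first time each lag flag appears in its id. The following is a sketch in PROC SQL, not a tested drop-in: it assumes the table is called have (as in the data step answer), that the time column is numeric, and that times are unique within an id. The alias f and the column names first_lag_1 and first_lag_2 are made up for illustration.

```sas
proc sql;
  select sum(t.main) as tot,
         /* SAS sorts missing below every number, so test for it explicitly */
         sum(case when f.first_lag_1 is not null and t.time > f.first_lag_1
                  then t.main else 0 end) as tot_1,
         sum(case when f.first_lag_2 is not null and t.time > f.first_lag_2
                  then t.main else 0 end) as tot_2
  from have as t
       inner join
       /* first time each lag flag is set, per id (missing if never set) */
       (select id,
               min(case when lag_1 = 1 then time else . end) as first_lag_1,
               min(case when lag_2 = 1 then time else . end) as first_lag_2
        from have
        group by id) as f
       on t.id = f.id;
quit;
```

On the sample rows this gives tot = 7, tot_1 = 3 and tot_2 = 6: ID A never has lag_1 = 1, and for ID B the Main = 1 at time 02 is excluded because it does not come after the first lag flag.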
Consider the computation of tot_1 and tot_2.
My first step is to look for rows where lag_1 > main, that is, lag_1 = 1 while main = 0. This captures the case you mentioned, where lag_1 = 1 occurs some time before main = 1. I label such rows 'grp_lag_1', and likewise 'grp_lag_2' for rows where lag_2 > main.
Once the rows are labelled, I "copy" the label down to all later rows of the same id using max() over(partition by id order by id,time1).
select *
,max(case when lag_1 > main then 'grp_lag_1' end) over(partition by id order by id,time1) as grp_1
,max(case when lag_2 > main then 'grp_lag_2' end) over(partition by id order by id,time1) as grp_2
from t
This gives the following result:
+----+-------+------+-------+-------+-----------+-----------+
| id | time1 | main | lag_1 | lag_2 | grp_1 | grp_2 |
+----+-------+------+-------+-------+-----------+-----------+
| A | 01 | 0 | 0 | 1 | | grp_lag_2 |
| A | 03 | 0 | 0 | 1 | | grp_lag_2 |
| A | 04 | 0 | 0 | 0 | | grp_lag_2 |
| A | 10 | 1 | 0 | 0 | | grp_lag_2 |
| A | 11 | 1 | 0 | 0 | | grp_lag_2 |
| A | 12 | 1 | 0 | 0 | | grp_lag_2 |
| B | 02 | 1 | 1 | 1 | | |
| B | 04 | 0 | 1 | 1 | grp_lag_1 | grp_lag_2 |
| B | 07 | 0 | 0 | 1 | grp_lag_1 | grp_lag_2 |
| B | 10 | 1 | 0 | 0 | grp_lag_1 | grp_lag_2 |
| B | 11 | 1 | 0 | 0 | grp_lag_1 | grp_lag_2 |
| B | 12 | 1 | 0 | 0 | grp_lag_1 | grp_lag_2 |
+----+-------+------+-------+-------+-----------+-----------+
After this, summing the main values where grp_1 = 'grp_lag_1' gives tot_1, and likewise summing them where grp_2 = 'grp_lag_2' gives tot_2:
select sum(main) as tot_cnt
,sum(case when grp_1='grp_lag_1' then main end) as tot_1
,sum(case when grp_2='grp_lag_2' then main end) as tot_2
from(
select *
,max(case when lag_1 > main then 'grp_lag_1' end) over(partition by id order by id,time1) as grp_1
,max(case when lag_2 > main then 'grp_lag_2' end) over(partition by id order by id,time1) as grp_2
from t
)x
+---------+-------+-------+
| tot_cnt | tot_1 | tot_2 |
+---------+-------+-------+
| 7 | 3 | 6 |
+---------+-------+-------+
Demo https://dbfiddle.uk/?rdbms=sqlserver_2012&fiddle=c17be111dbc3c516afa2bc3dcd3c9e1c