My data looks like this:
ROW ID DATE DO CO FLAG
1 6405 9/16/2010 1000 . 1
2 6405 9/16/2010 0 32 2
3 6405 9/17/2010 500 . 1
4 6405 9/17/2010 1000 . 1
5 6405 9/17/2010 1000 . 1
6 6405 9/18/2010 1000 . 1
7 6405 9/18/2010 0 37 2
8 6405 9/18/2010 1250 . 1
9 6405 9/19/2010 1000 . 1
10 6405 9/19/2010 1000 . 1
11 6405 9/19/2010 0 65 2
12 6405 9/20/2010 500 . 0
13 6405 9/21/2010 1250 . 0
14 2654 8/4/2010 1000 . 0
15 2654 8/5/2010 0 15 2
16 2654 8/5/2010 900 . 1
17 2654 8/5/2010 300 . 1
18 2654 8/6/2010 750 . 0
19 2654 8/7/2010 1000 . 1
20 2654 8/7/2010 0 45 2
21 4567 6/8/2010 670 . 1
22 4567 6/8/2010 700 . 1
23 4567 6/8/2010 0 34 2
24 4567 6/8/2010 1000 . 1
25 4567 6/8/2010 500 . 1
My data should look like:
ROW ID DATE DO CO FLAG TDD1
1 6405 9/16/2010 1000 . 1 1000
2 6405 9/16/2010 0 32 2 0
3 6405 9/17/2010 500 . 1 2500
4 6405 9/17/2010 1000 . 1 2500
5 6405 9/17/2010 1000 . 1 2500
6 6405 9/18/2010 1000 . 1 1000
7 6405 9/18/2010 0 37 2 0
8 6405 9/18/2010 1250 . 1 1250
9 6405 9/19/2010 1000 . 1 2000
10 6405 9/19/2010 1000 . 1 2000
11 6405 9/19/2010 0 65 2 0
12 6405 9/20/2010 500 . 0 500
13 6405 9/21/2010 1250 . 0 1250
14 2654 8/4/2010 1000 . 0 1000
15 2654 8/5/2010 0 15 2 0
16 2654 8/5/2010 900 . 1 1200
17 2654 8/5/2010 300 . 1 1200
18 2654 8/6/2010 750 . 0 750
19 2654 8/7/2010 1000 . 1 1000
20 2654 8/7/2010 0 45 2 0
21 4567 6/8/2010 670 . 1 1370
22 4567 6/8/2010 700 . 1 1370
23 4567 6/8/2010 0 34 2 0
24 4567 6/8/2010 1000 . 1 1500
25 4567 6/8/2010 500 . 1 1500
So I want to create a column TDD1 where, for each ID with consecutively repeating dates, the TDD1 value is the sum of the DO values over those consecutively repeating dates. For example, see rows 3, 4 and 5.
If the value of the FLAG column is 2 or 0, then the corresponding TDD1 value should be the DO value for that row. For example, see rows 2, 7, 11, 15, 20 and 23 (for FLAG=2) and rows 12, 13, 14 and 18 (for FLAG=0).
The FLAG column has consecutively repeating 1's for consecutively repeating dates within each ID, unless the CO column has a value, in which case the FLAG value becomes 2. For example, see rows 9 to 11. In rows 6 to 8 the dates repeat consecutively, but the FLAG column does not have consecutive 1's. In such situations, where the 1's do not occur consecutively or occur in isolation for a particular date and ID, the TDD1 value should be the same as the DO value for that row. Also see rows 19 and 20.
Another point: if a FLAG value of 2 occurs within a series of rows having the same date, the computation of the TDD1 column needs to be reset. For example, see rows 21 to 25. Notice that rows 21 and 22 have a TDD1 value of 1370 (670+700), and rows 24 and 25 have a TDD1 value of 1500 (1000+500).
It would be a great help if you could provide R code for this. Thank you.
I don't know about efficiency, but here is an alternative using the dplyr package (and %>% from magrittr for some nice code legibility).
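library(magrittr)
library(dplyr)
data <- data %>%
  # counter that increases at every row whose FLAG is not 1
  mutate(flag_1_consecutive = cumsum(!FLAG %in% 1)) %>%
  group_by(ID, DATE, FLAG, flag_1_consecutive) %>%
  # total DO within each group, repeated on every row of that group
  mutate(TDD1 = sum(DO)) %>%
  ungroup()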
What I did was group the rows according to your rules, which are defined by ID, DATE and runs of consecutive FLAG values of 1. Then I just summed DO within each group.
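If dplyr is not available, the same grouping idea can probably be sketched in base R with cumsum() and ave(). This is only an illustration, not a tuned solution: the data frame name df and the helper column run_id are made up for this example, the ROW column is left out, and the "." entries in CO are read as NA. The check at the end compares the result with the TDD1 column listed in the question.

```r
# Rebuild the sample data from the question (ROW omitted, "." read as NA).
df <- read.table(header = TRUE, na.strings = ".", stringsAsFactors = FALSE, text = "
ID DATE DO CO FLAG
6405 9/16/2010 1000 . 1
6405 9/16/2010 0 32 2
6405 9/17/2010 500 . 1
6405 9/17/2010 1000 . 1
6405 9/17/2010 1000 . 1
6405 9/18/2010 1000 . 1
6405 9/18/2010 0 37 2
6405 9/18/2010 1250 . 1
6405 9/19/2010 1000 . 1
6405 9/19/2010 1000 . 1
6405 9/19/2010 0 65 2
6405 9/20/2010 500 . 0
6405 9/21/2010 1250 . 0
2654 8/4/2010 1000 . 0
2654 8/5/2010 0 15 2
2654 8/5/2010 900 . 1
2654 8/5/2010 300 . 1
2654 8/6/2010 750 . 0
2654 8/7/2010 1000 . 1
2654 8/7/2010 0 45 2
4567 6/8/2010 670 . 1
4567 6/8/2010 700 . 1
4567 6/8/2010 0 34 2
4567 6/8/2010 1000 . 1
4567 6/8/2010 500 . 1
")

# run_id (hypothetical helper name) goes up by one at every row whose FLAG
# is not 1, so an unbroken run of FLAG == 1 rows shares a single value and
# any FLAG of 0 or 2 separates the runs before and after it.
df$run_id <- cumsum(df$FLAG != 1)

# Sum DO within each (ID, DATE, FLAG, run_id) group; ave() repeats the
# group total on every row of the group.
df$TDD1 <- ave(df$DO, df$ID, df$DATE, df$FLAG, df$run_id, FUN = sum)

# Compare with the TDD1 column given in the question.
expected <- c(1000, 0, 2500, 2500, 2500, 1000, 0, 1250, 2000, 2000, 0, 500,
              1250, 1000, 0, 1200, 1200, 750, 1000, 0, 1370, 1370, 0, 1500, 1500)
stopifnot(all(df$TDD1 == expected))

print(df[, c("ID", "DATE", "DO", "FLAG", "TDD1")])
```

Note that the counter is computed over the whole table before grouping, so it relies on the rows being in the order shown in the question. Rows with FLAG 0 or 2 each land in a group of their own, which is why their TDD1 is simply their own DO.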