Create a column whose values are sums of values in another column based on certain conditions in R

Question

My data looks like this:

ROW  ID    DATE       DO    CO  FLAG
1    6405  9/16/2010  1000  .   1
2    6405  9/16/2010  0     32  2
3    6405  9/17/2010  500   .   1
4    6405  9/17/2010  1000  .   1
5    6405  9/17/2010  1000  .   1
6    6405  9/18/2010  1000  .   1
7    6405  9/18/2010  0     37  2
8    6405  9/18/2010  1250  .   1
9    6405  9/19/2010  1000  .   1
10   6405  9/19/2010  1000  .   1
11   6405  9/19/2010  0     65  2
12   6405  9/20/2010  500   .   0
13   6405  9/21/2010  1250  .   0
14   2654  8/4/2010   1000  .   0
15   2654  8/5/2010   0     15  2
16   2654  8/5/2010   900   .   1
17   2654  8/5/2010   300   .   1
18   2654  8/6/2010   750   .   0
19   2654  8/7/2010   1000  .   1
20   2654  8/7/2010   0     45  2
21   4567  6/8/2010   670   .   1
22   4567  6/8/2010   700   .   1
23   4567  6/8/2010   0     34  2
24   4567  6/8/2010   1000  .   1
25   4567  6/8/2010   500   .   1
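For a reproducible example, the sample data above can be loaded into R with read.table. This is a sketch that assumes whitespace-separated columns, a header row, and that "." marks a missing CO value; the name `data` simply matches the object used in the answer below.

```r
# Read the sample data; "." in the CO column becomes NA (assumption: "." means missing)
data <- read.table(header = TRUE, na.strings = ".", stringsAsFactors = FALSE, text = "
ROW  ID    DATE       DO    CO  FLAG
1    6405  9/16/2010  1000  .   1
2    6405  9/16/2010  0     32  2
3    6405  9/17/2010  500   .   1
4    6405  9/17/2010  1000  .   1
5    6405  9/17/2010  1000  .   1
6    6405  9/18/2010  1000  .   1
7    6405  9/18/2010  0     37  2
8    6405  9/18/2010  1250  .   1
9    6405  9/19/2010  1000  .   1
10   6405  9/19/2010  1000  .   1
11   6405  9/19/2010  0     65  2
12   6405  9/20/2010  500   .   0
13   6405  9/21/2010  1250  .   0
14   2654  8/4/2010   1000  .   0
15   2654  8/5/2010   0     15  2
16   2654  8/5/2010   900   .   1
17   2654  8/5/2010   300   .   1
18   2654  8/6/2010   750   .   0
19   2654  8/7/2010   1000  .   1
20   2654  8/7/2010   0     45  2
21   4567  6/8/2010   670   .   1
22   4567  6/8/2010   700   .   1
23   4567  6/8/2010   0     34  2
24   4567  6/8/2010   1000  .   1
25   4567  6/8/2010   500   .   1
")

# DATE is kept as character; ID, DO, CO and FLAG are numeric
str(data)
```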

My data should look like:

ROW  ID    DATE       DO    CO  FLAG  TDD1
1    6405  9/16/2010  1000  .   1     1000
2    6405  9/16/2010  0     32  2     0
3    6405  9/17/2010  500   .   1     2500
4    6405  9/17/2010  1000  .   1     2500
5    6405  9/17/2010  1000  .   1     2500
6    6405  9/18/2010  1000  .   1     1000
7    6405  9/18/2010  0     37  2     0
8    6405  9/18/2010  1250  .   1     1250
9    6405  9/19/2010  1000  .   1     2000
10   6405  9/19/2010  1000  .   1     2000
11   6405  9/19/2010  0     65  2     0
12   6405  9/20/2010  500   .   0     500
13   6405  9/21/2010  1250  .   0     1250
14   2654  8/4/2010   1000  .   0     1000
15   2654  8/5/2010   0     15  2     0
16   2654  8/5/2010   900   .   1     1200
17   2654  8/5/2010   300   .   1     1200
18   2654  8/6/2010   750   .   0     750
19   2654  8/7/2010   1000  .   1     1000
20   2654  8/7/2010   0     45  2     0
21   4567  6/8/2010   670   .   1     1370
22   4567  6/8/2010   700   .   1     1370
23   4567  6/8/2010   0     34  2     0
24   4567  6/8/2010   1000  .   1     1500
25   4567  6/8/2010   500   .   1     1500

I want to create a column TDD1 such that, for each ID with consecutively repeating dates, TDD1 holds the sum of the DO values over those consecutive rows. For example, see rows 3, 4 and 5 (500 + 1000 + 1000 = 2500).

If FLAG is 2 or 0, TDD1 should simply be the DO value for that row. For example, see rows 2, 7, 11, 15, 20 and 23 (FLAG = 2) and rows 12, 13, 14 and 18 (FLAG = 0).

Within each ID, the FLAG column has consecutive 1s for consecutively repeating dates, unless CO has a value, in which case FLAG is 2 (see rows 9 to 11). In rows 6 to 8 the date repeats, but the 1s are not consecutive because row 7 has FLAG = 2. In such cases, where a 1 does not occur consecutively or is isolated for a given date and ID, TDD1 should equal the DO value for that row. Rows 19 and 20 are another example.

Also, if a FLAG of 2 occurs within a series of rows with the same date, the TDD1 sum must restart after it. For example, in rows 21 to 25, rows 21 and 22 have TDD1 = 1370 (670 + 700), and rows 24 and 25 have TDD1 = 1500 (1000 + 500).
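One way to express this reset rule is a run counter that increases on every row whose FLAG is not 1. The sketch below illustrates it on rows 21 to 25 only, with the vectors typed in by hand; `run` and `tdd1` are illustrative names, and base R's `ave()` is used here as one possible grouping tool, not as the required approach.

```r
# Rows 21-25 of the sample (ID 4567, all dated 6/8/2010), typed in by hand
flag <- c(1, 1, 2, 1, 1)
do   <- c(670, 700, 0, 1000, 500)

# Run counter: increases on every row whose FLAG is not 1, so the FLAG = 2
# row closes the first run of 1s and the rows after it start a new run
run <- cumsum(flag != 1)
print(run)   # 0 0 1 1 1

# Sum DO within each (run, FLAG) pair; FLAG is part of the key so the
# FLAG = 2 row is not pooled with rows 24-25, which share its run number
tdd1 <- ave(do, run, flag, FUN = sum)
print(tdd1)  # 1370 1370 0 1500 1500
```

Note that the counter only makes sense if the rows are already in the order shown in the table.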

It would be a great help if you could provide R code for this. Thank you.

Answer 1

I can't speak to efficiency, but here is an alternative using the dplyr package (and the %>% pipe from magrittr for readability).

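library(magrittr)
library(dplyr)

data <- data %>%
  # counter that increases at every row whose FLAG is not 1 (0 or 2), so
  # consecutive 1s share one value and a 0 or 2 starts a new run
  mutate(flag_1_consecutive = cumsum(!FLAG %in% 1)) %>%
  group_by(ID, DATE, FLAG, flag_1_consecutive) %>%
  mutate(TDD1 = sum(DO)) %>%
  ungroup()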

I grouped the rows according to your rules: by ID, DATE, FLAG, and a counter that increases at every row whose FLAG is not 1, so each run of consecutive 1s forms its own group. Then I summed DO within each group.
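As a dependency-free check, the same grouping key can be reproduced in base R, with `ave()` swapped in for dplyr's `group_by()`/`mutate()`, and compared against the expected TDD1 column from the question. This is a sketch that assumes the rows are already in the order shown in the question, since the counter depends on row order; `expected` is a hypothetical name holding the TDD1 values copied from the desired output.

```r
# Sample data from the question (ROW and CO omitted; CO is not needed here)
data <- data.frame(
  ID   = rep(c(6405, 2654, 4567), times = c(13, 7, 5)),
  DATE = c("9/16/2010", "9/16/2010", "9/17/2010", "9/17/2010", "9/17/2010",
           "9/18/2010", "9/18/2010", "9/18/2010", "9/19/2010", "9/19/2010",
           "9/19/2010", "9/20/2010", "9/21/2010",
           "8/4/2010", "8/5/2010", "8/5/2010", "8/5/2010", "8/6/2010",
           "8/7/2010", "8/7/2010",
           rep("6/8/2010", 5)),
  DO   = c(1000, 0, 500, 1000, 1000, 1000, 0, 1250, 1000, 1000, 0, 500, 1250,
           1000, 0, 900, 300, 750, 1000, 0,
           670, 700, 0, 1000, 500),
  FLAG = c(1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 2, 0, 0,
           0, 2, 1, 1, 0, 1, 2,
           1, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Same key as the dplyr answer: a counter that bumps on every non-1 FLAG,
# combined with ID, DATE and FLAG; ave() sums DO within each combination
data$flag_1_consecutive <- cumsum(!data$FLAG %in% 1)
data$TDD1 <- ave(data$DO, data$ID, data$DATE, data$FLAG,
                 data$flag_1_consecutive, FUN = sum)

# TDD1 values copied from the desired output in the question
expected <- c(1000, 0, 2500, 2500, 2500, 1000, 0, 1250, 2000, 2000, 0, 500, 1250,
              1000, 0, 1200, 1200, 750, 1000, 0,
              1370, 1370, 0, 1500, 1500)
print(isTRUE(all.equal(data$TDD1, expected)))
```

The FLAG column has to stay in the key: a FLAG = 2 row shares its counter value with the 1s that follow it (for example rows 7 and 8), so without FLAG those rows would be summed together.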

solution1
0 2014-09-26 01:44:23