Multiplying rows across 2 columns by grouping on marker rows in R

Question

I have a data frame DF with columns A, B and C and around 300,000 rows.

I want to perform a mathematical operation on the data set with the following logic:

Select all rows up to the first occurrence of 2 in C (ignoring any -2 in between).

Compute the average of A*B over these rows and store it in column D (so all of these rows have the same value in D).

Select all rows from the first occurrence of 2 up to the second occurrence.

Compute the average of A*B for these rows and store it in column D.

... Do the same until ...

Select all rows from the second-to-last occurrence of 2 up to the last occurrence.

Compute the average of A*B for these rows and store it in column D.

The result should look like:
A B C D
1 2 0 6
2 5 0 6
4 5 2 34.16667
4 7 0 34.16667
7 8 0 34.16667
9 7 -2 34.16667
2 5 0 34.16667
4 7 0 34.16667
5 1 2 27.85714
4 7 0 27.85714
7 8 0 27.85714
9 7 -2 27.85714
2 5 0 27.85714
4 7 0 27.85714
5 1 2 NA

How can I implement this logic in R? Thanks in advance!

Answer 1

Using dplyr:

library(dplyr)
DF <- DF %>%
        mutate(ind = cumsum(C == 2)) %>%       # a new group starts at each C == 2
        group_by(ind) %>%
        mutate(D = mean(A*B),                  # group-wise mean of A*B
               D = replace(D, n() == 1, NA))   # single-row groups get NA
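
As a dependency-free cross-check, the same grouping idea can be sketched in base R. The sample DF below is reconstructed from the A, B and C columns of the expected result in the question (an assumption, since the original input was not posted), and `grp` and `prod_ab` are illustrative names only.

```r
# Sample data rebuilt from columns A, B, C of the question's result table
# (assumption: the original input data was not included in the post).
DF <- data.frame(
  A = c(1, 2, 4, 4, 7, 9, 2, 4, 5, 4, 7, 9, 2, 4, 5),
  B = c(2, 5, 5, 7, 8, 7, 5, 7, 1, 7, 8, 7, 5, 7, 1),
  C = c(0, 0, 2, 0, 0, -2, 0, 0, 2, 0, 0, -2, 0, 0, 2)
)

# A new group starts at every row where C == 2; rows with -2 never match.
grp <- cumsum(DF$C == 2)

# Group-wise mean of A * B, repeated for every row of its group.
prod_ab <- DF$A * DF$B
DF$D <- ave(prod_ab, grp, FUN = mean)

# Groups consisting of a single row (here the trailing 2) are set to NA.
DF$D[ave(prod_ab, grp, FUN = length) == 1] <- NA

print(DF)
```

On this sample the third group evaluates to 190/6 = 31.66667, which agrees with the output shown in Answer 2 rather than the 27.85714 listed in the question.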

Answer 2

Here is an option with data.table. Convert the data.frame to a data.table with setDT(DF). Grouped by the cumulative sum of the logical vector C == 2, we take the mean of A*B and multiply it by NA^(.N == 1). Here .N == 1 is TRUE when a group has exactly one row and FALSE otherwise, and NA^ turns TRUE into NA and FALSE into 1, so groups with a single row return NA and all other groups keep mean(A*B).

library(data.table)
setDT(DF)[,  D := NA^(.N==1)*mean(A*B) , .(grp = cumsum(C==2))]
DF
#    A B  C        D
# 1: 1 2  0  6.00000
# 2: 2 5  0  6.00000
# 3: 4 5  2 34.16667
# 4: 4 7  0 34.16667
# 5: 7 8  0 34.16667
# 6: 9 7 -2 34.16667
# 7: 2 5  0 34.16667
# 8: 4 7  0 34.16667
# 9: 5 1  2 31.66667
#10: 4 7  0 31.66667
#11: 7 8  0 31.66667
#12: 9 7 -2 31.66667
#13: 2 5  0 31.66667
#14: 4 7  0 31.66667
#15: 5 1  2       NA
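
The NA^(.N == 1) idiom relies on R evaluating NA^0 as 1 while NA^1 stays NA. Below is a small base R sketch of that behaviour, independent of data.table; `group_sizes` and `group_means` are made-up values for illustration and do not come from the question.

```r
# Hypothetical group sizes and group means, for illustration only.
group_sizes <- c(2, 6, 6, 1)
group_means <- c(6, 34.5, 31.5, 5)

# TRUE is coerced to 1 and FALSE to 0, so NA^TRUE is NA and NA^FALSE is 1.
mask <- NA^(group_sizes == 1)

# Multiplying by the mask keeps the mean for multi-row groups and
# turns it into NA for single-row groups.
masked_means <- mask * group_means
print(masked_means)
```

The same effect could be written with an explicit if/else on the group size; the NA^ form is simply more compact inside a data.table expression.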

Answer 1
1 2016-07-25 13:09:41

Answer 2
1 (accepted) 2016-07-25 15:16:49