
Multiply rows in 2 columns by grouping flagged rows in R

I have a data frame DF with around 300,000 rows, containing the following data:

DF
A B C
1 2 0
2 5 0
4 5 2
4 7 0
7 8 0
9 7 -2
2 5 0
4 7 0
5 1 2
4 7 0
7 8 0
9 7 -2
2 5 0
4 7 0
5 1 2

I want to perform a calculation on the data set with the following logic:

Select all rows up to the first occurrence of 2 in C (ignoring any -2 in between). The row containing the 2 itself starts the next group.

Compute the average of A*B over these rows and store it in column D (so all of these rows have the same value in D).

Select all rows from the first occurrence of 2 up to the second occurrence.

Compute the average of A*B for these rows and store it in column D.

... and so on, until:

Select all rows from the second-to-last occurrence of 2 up to the last occurrence.

Compute the average of A*B for these rows and store it in column D.

The result should look like this:

Result
A B C D
1 2 0 6
2 5 0 6
4 5 2 34.16667
4 7 0 34.16667
7 8 0 34.16667
9 7 -2 34.16667
2 5 0 34.16667
4 7 0 34.16667
5 1 2 31.66667
4 7 0 31.66667
7 8 0 31.66667
9 7 -2 31.66667
2 5 0 31.66667
4 7 0 31.66667
5 1 2 NA

How can I implement this logic in R? Thanks in advance!

Using dplyr:

library(dplyr)

DF <- DF %>%
        mutate(ind = cumsum(C == 2)) %>%          # a new group starts at every C == 2
        group_by(ind) %>%
        mutate(D = mean(A*B),                     # group mean of A*B
               D = replace(D, n() == 1, NA)) %>%  # single-row groups get NA
        ungroup()
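
For readers who want to see the grouping step in isolation, here is a rough base R sketch of the same idea. It is not part of the original answer: it assumes the example data from the question, rebuilt by hand, and swaps the dplyr verbs for `cumsum()` and `ave()`. The helper names `ind` and `size` are chosen here for illustration only.

```r
# Rebuild the example data from the question (typed in by hand)
DF <- data.frame(
  A = c(1, 2, 4, 4, 7, 9, 2, 4, 5, 4, 7, 9, 2, 4, 5),
  B = c(2, 5, 5, 7, 8, 7, 5, 7, 1, 7, 8, 7, 5, 7, 1),
  C = c(0, 0, 2, 0, 0, -2, 0, 0, 2, 0, 0, -2, 0, 0, 2)
)

# Every 2 in C bumps the counter, so each 2 starts a new group.
# -2 does not equal 2, so it is ignored automatically.
ind <- cumsum(DF$C == 2)
# ind is 0 0 1 1 1 1 1 1 2 2 2 2 2 2 3

# Group mean of A*B, repeated on every row of its group
D <- ave(DF$A * DF$B, ind, FUN = mean)

# Number of rows in each group, repeated on every row of its group
size <- ave(ind, ind, FUN = length)

# Groups consisting of a single row get NA, as in the expected result
D[size == 1] <- NA

DF$D <- D
```

With the example data this gives 6 for the first two rows, 205/6 (about 34.16667) for rows 3 to 8, 190/6 (about 31.66667) for rows 9 to 14, and NA for the last row.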

Here is an option with data.table. Convert the 'data.frame' to a 'data.table' with setDT(DF). Grouped by the cumulative sum of the logical vector C == 2, we compute the mean of A*B and multiply it by NA^(.N == 1). Here .N is the number of rows in the current group, so .N == 1 is TRUE only for single-row groups. Since NA^TRUE is NA and NA^FALSE is 1, every group with only one row gets NA and all other groups keep mean(A*B).

library(data.table)
setDT(DF)[,  D := NA^(.N==1)*mean(A*B) , .(grp = cumsum(C==2))]
DF
#    A B  C        D
# 1: 1 2  0  6.00000
# 2: 2 5  0  6.00000
# 3: 4 5  2 34.16667
# 4: 4 7  0 34.16667
# 5: 7 8  0 34.16667
# 6: 9 7 -2 34.16667
# 7: 2 5  0 34.16667
# 8: 4 7  0 34.16667
# 9: 5 1  2 31.66667
#10: 4 7  0 31.66667
#11: 7 8  0 31.66667
#12: 9 7 -2 31.66667
#13: 2 5  0 31.66667
#14: 4 7  0 31.66667
#15: 5 1  2       NA
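
The NA^(.N == 1) trick can look cryptic, so here is a small base R sketch of what it does on its own. It is an illustration added here, not part of the original answer; the names `single_row`, `mask` and `group_mean` are made up for the example, and the two values in `group_mean` are taken from the output above.

```r
# Pretend we have two groups: the first has one row, the second has several
single_row <- c(TRUE, FALSE)

# Logical values are coerced to 1 and 0, and in R NA^1 is NA while NA^0 is 1
mask <- NA^single_row
# mask is NA for the single-row group and 1 for the other group

# Multiplying by the mask blanks out single-row groups and leaves the rest unchanged
group_mean <- c(5, 31.66667)
masked <- mask * group_mean
# masked is NA for the first group and 31.66667 for the second
```

A more explicit conditional on the group size would express the same thing, at the cost of a little extra typing.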
