Multiply rows in 2 columns by grouping flagged rows in R
I have a data frame DF with around 300,000 rows. A sample of the data:
<DF
A B C
1 2 0
2 5 0
4 5 2
4 7 0
7 8 0
9 7 -2
2 5 0
4 7 0
5 1 2
4 7 0
7 8 0
9 7 -2
2 5 0
4 7 0
5 1 2
I want to perform a mathematical operation on the data set with the following logic:
Select all rows up to the first occurrence of 2 in C (ignoring -2 in the middle)
Compute the average of A*B over these rows and add it as column D (so all of these rows have the same value in column D)
Select all rows from the first occurrence of 2 up to the second occurrence
Compute the average of A*B for these rows and add it to column D
... Do the same until
Select all rows from the second-to-last occurrence of 2 up to the last occurrence
Compute the average of A*B for these rows and add it to column D
The result should look like:
<Result
A B C D
1 2 0 6
2 5 0 6
4 5 2 34.16667
4 7 0 34.16667
7 8 0 34.16667
9 7 -2 34.16667
2 5 0 34.16667
4 7 0 34.16667
5 1 2 27.85714
4 7 0 27.85714
7 8 0 27.85714
9 7 -2 27.85714
2 5 0 27.85714
4 7 0 27.85714
5 1 2 NA
How can I implement this logic in R? Thanks in advance!
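Using dplyr:
library(dplyr)

DF <- DF %>%
  mutate(ind = cumsum(C == 2)) %>%            # group index: increments at each row where C is 2
  group_by(ind) %>%
  mutate(D = mean(A * B),                     # group mean of A * B, repeated on every row
         D = replace(D, n() == 1, NA)) %>%    # groups with a single row get NA
  ungroup()                                   # drop the grouping once D is filled in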
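To make the grouping step concrete, here is a minimal base R sketch. It assumes the sample data from the question is rebuilt by hand as DF, and it uses ave() in place of dplyr, so it is an illustration of the same idea rather than the code from this answer. The key point is that cumsum(C == 2) goes up by one at every row where C is 2, so each flagged row starts a new group and -2 is ignored.

```r
# Rebuild the sample data from the question (typed in by hand for illustration)
DF <- data.frame(
  A = c(1, 2, 4, 4, 7, 9, 2, 4, 5, 4, 7, 9, 2, 4, 5),
  B = c(2, 5, 5, 7, 8, 7, 5, 7, 1, 7, 8, 7, 5, 7, 1),
  C = c(0, 0, 2, 0, 0, -2, 0, 0, 2, 0, 0, -2, 0, 0, 2)
)

# Group index: increases by 1 at every row where C == 2; rows with -2 do not count
grp <- cumsum(DF$C == 2)
grp
# 0 0 1 1 1 1 1 1 2 2 2 2 2 2 3

# Mean of A * B within each group, repeated on every row of that group
DF$D <- ave(DF$A * DF$B, grp, FUN = mean)

# Groups made of a single row (here the trailing flagged row) are set to NA
DF$D[ave(grp, grp, FUN = length) == 1] <- NA
DF
```

With this grouping, each flagged row belongs to the block it opens. The third block therefore averages to 190 / 6 = 31.66667, while the 27.85714 shown in the question's expected output equals 195 / 7, which appears to also count the flagged row that closes the block.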
Here is an option with data.table. Convert the 'data.frame' to a 'data.table' with setDT(DF). Then, grouped by the cumulative sum of the logical vector C == 2, take the mean of A * B and multiply it by NA^(.N == 1). Here .N == 1 is TRUE when a group has exactly one row and FALSE otherwise, and NA^ turns that into NA or 1 respectively, so every group with a single row returns NA and all other groups keep mean(A * B).
library(data.table)
setDT(DF)[, D := NA^(.N==1)*mean(A*B) , .(grp = cumsum(C==2))]
DF
# A B C D
# 1: 1 2 0 6.00000
# 2: 2 5 0 6.00000
# 3: 4 5 2 34.16667
# 4: 4 7 0 34.16667
# 5: 7 8 0 34.16667
# 6: 9 7 -2 34.16667
# 7: 2 5 0 34.16667
# 8: 4 7 0 34.16667
# 9: 5 1 2 31.66667
#10: 4 7 0 31.66667
#11: 7 8 0 31.66667
#12: 9 7 -2 31.66667
#13: 2 5 0 31.66667
#14: 4 7 0 31.66667
#15: 5 1 2 NA
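The NA^(.N == 1) factor relies on two R rules: a logical is coerced to 0 or 1 in arithmetic, and anything raised to the power 0 is 1, including NA. A small standalone sketch, using plain vectors x and y as hypothetical stand-ins for one group's A * B values:

```r
# TRUE is coerced to 1 and FALSE to 0 before exponentiation
NA^TRUE    # NA^1 gives NA
NA^FALSE   # NA^0 gives 1

# A group with more than one row keeps its mean
x <- c(2, 10)
NA^(length(x) == 1) * mean(x)   # 1 * 6 gives 6

# A group with exactly one row is turned into NA
y <- 5
NA^(length(y) == 1) * mean(y)   # NA * 5 gives NA
```

The same effect could be written with an explicit if/else on the group size; the multiplication form is just compact enough to fit in a single data.table expression.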