Performing operations on multiple subsets within a data.frame (R)
Given the following sample data set:
df <- data.frame(year = c("1990","1990","1990","1991","1991","1991","1992","1992","1992"), C2 = LETTERS[1:3], C3 = rnorm(9))
df
year C2 C3
1 1990 A -0.973627230
2 1990 B -0.755867016
3 1990 C 0.016505689
4 1991 A -0.004353502
5 1991 B 0.525895816
6 1991 C -0.882487930
7 1992 A -0.206509950
8 1992 B 0.192527583
9 1992 C 0.935712021
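Is there a way to perform the following operation for each unique value of year: add the values of C3 where C2 == B and where C2 == C, then divide that sum by the value of C3 where C2 == A? So, for each year, I get (B+C)/A.
Thanks for your help.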
You could use data.table. Just write the code exactly as you described it:
library(data.table)
setDT(df)[, sum(C3[C2 %in% c("B", "C")]) / C3[C2 == "A"], by = year]
# year V1
# 1: 1990 -0.08157762
# 2: 1991 4.44625385
# 3: 1992 13.03606921
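Because the sample data are generated with rnorm and no seed is set, the output above will not match the C3 values printed in the question. As a check, here is a minimal sketch that rebuilds the data frame from those printed values and applies the same expression. The names dt, res and ratio are chosen here for illustration only. For 1990 the result should be (-0.755867016 + 0.016505689) / -0.973627230, roughly 0.759.

```r
library(data.table)

# Rebuild the data frame from the values printed in the question
df <- data.frame(
  year = rep(c("1990", "1991", "1992"), each = 3),
  C2   = rep(LETTERS[1:3], times = 3),
  C3   = c(-0.973627230, -0.755867016,  0.016505689,
           -0.004353502,  0.525895816, -0.882487930,
           -0.206509950,  0.192527583,  0.935712021)
)

# as.data.table() copies df, so the original data.frame is left untouched
dt <- as.data.table(df)

# Per year: (B + C) / A, stored in a named column instead of the default V1
# Assumes each year has exactly one row with C2 == "A"
res <- dt[, .(ratio = sum(C3[C2 %in% c("B", "C")]) / C3[C2 == "A"]), by = year]
print(res)
```

Naming the result column inside .() avoids the default V1 and makes it easier to join the result back onto other tables later.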
Or, if dplyr is your package of choice, here is the same thing done with dplyr:
library(dplyr)
group_by(df, year) %>%
  summarise(out = sum(C3[C2 %in% c("B", "C")]) / C3[C2 == "A"])
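If neither package is available, the same per-year calculation can probably be done in base R as well. The following is only a sketch, not part of the original answer. It uses split and sapply, rebuilds the data from the values printed in the question, and assumes each year has exactly one row with C2 == "A". The names ratio and out are illustrative.

```r
# Rebuild the data frame from the values printed in the question
df <- data.frame(
  year = rep(c("1990", "1991", "1992"), each = 3),
  C2   = rep(LETTERS[1:3], times = 3),
  C3   = c(-0.973627230, -0.755867016,  0.016505689,
           -0.004353502,  0.525895816, -0.882487930,
           -0.206509950,  0.192527583,  0.935712021)
)

# Split into one data.frame per year, then compute (B + C) / A in each piece
ratio <- sapply(split(df, df$year), function(g) {
  sum(g$C3[g$C2 %in% c("B", "C")]) / g$C3[g$C2 == "A"]
})

# sapply returns a numeric vector named by year; turn it into a data.frame
out <- data.frame(year = names(ratio), out = unname(ratio))
print(out)
```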