Subset a dataframe based on identifying max and min values in a column (in R)
For a sample dataframe:
df1 <- structure(list(id = 1:21, region = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
4L), .Label = c("a", "b", "c", "d"), class = "factor"), weight = c(0.35,
0.65, 0.99, 1.5, 3.2, 2.1, 1.3, 3.2, 1.3, 2, 0.6, 0.6, 0.6, 0.45,
1, 1.2, 1.4, 2, 1.3, 1, 2), condition = c(0L, 1L, 0L, 1L, 0L,
0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L
)), .Names = c("id", "region", "weight", "condition"), class = "data.frame", row.names = c(NA,
-21L))
I wish to exclude the regions that have neither the highest nor the lowest weighted percentage of 1s in the condition variable, computed by region. For example, I would normally do:
library(data.table)
summary <- setDT(df1)[, .(.result = weighted.mean(condition == 1,
w = weight) * 100), by = region]
Which would give me:
summary
   region  .result
1:      a 61.60458
2:      b 39.69466
3:      c 50.56180
4:      d 61.03896
Therefore I would drop regions c and d from the dataframe df1, keeping only region a (the highest) and region b (the lowest).
Is it possible to do this in one step without having to manually look at a summary dataframe?
My understanding is that you want to keep only the regions with the highest and lowest values and exclude all the others. It is not quite a one-liner, but if you add the following to your code, you should get what you want:
incl <- summary[c(which.min(.result), which.max(.result)),region]
newdf <- df1[region %in% incl,]
newdf
    id region weight condition
 1:  1      a   0.35         0
 2:  2      a   0.65         1
 3:  3      a   0.99         0
 4:  4      a   1.50         1
 5:  5      b   3.20         0
 6:  6      b   2.10         0
 7:  7      b   1.30         0
 8:  8      b   3.20         1
 9:  9      b   1.30         0
10: 10      b   2.00         1
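If you would rather not keep a separate summary object around, the same idea can be written as a single nested expression. This is only a sketch under a few assumptions: the names `dt`, `pct` and `one_step` are mine and purely illustrative, the data is rebuilt compactly so the snippet runs on its own, and `range()` stands in for `which.min()`/`which.max()`, so every region tied for the minimum or maximum is kept rather than only the first one.

```r
library(data.table)

# Same values as df1 above, rebuilt compactly so the sketch is self-contained
dt <- data.table(
  id = 1:21,
  region = factor(rep(c("a", "b", "c", "d"), times = c(4, 6, 6, 5))),
  weight = c(0.35, 0.65, 0.99, 1.5, 3.2, 2.1, 1.3, 3.2, 1.3, 2, 0.6,
             0.6, 0.6, 0.45, 1, 1.2, 1.4, 2, 1.3, 1, 2),
  condition = c(0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
                1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L)
)

# Inner call: weighted percentage of 1s per region, then the regions whose
# percentage equals the overall minimum or maximum.
# Outer call: keep only the rows of dt that belong to those regions.
one_step <- dt[region %in% dt[, .(pct = weighted.mean(condition == 1, w = weight) * 100),
                              by = region][pct %in% range(pct), region]]
one_step
```

Because the inner summary is built with `.()` rather than `:=`, `dt` itself is not modified. Whether ties should be kept is a design choice: if you want exactly one region at each extreme, the `which.min()`/`which.max()` version above is the one to use.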