Subset a data frame based on the maximum and minimum values in an identifier column (in R)

Question

For the example data frame:

df1 <- structure(list(id = 1:21,
    region = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
        3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L), .Label = c("a", "b", "c", "d"),
        class = "factor"),
    weight = c(0.35, 0.65, 0.99, 1.5, 3.2, 2.1, 1.3, 3.2, 1.3, 2, 0.6, 0.6,
        0.6, 0.45, 1, 1.2, 1.4, 2, 1.3, 1, 2),
    condition = c(0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
        1L, 1L, 1L, 1L, 0L, 0L)),
    .Names = c("id", "region", "weight", "condition"),
    class = "data.frame", row.names = c(NA, -21L))

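I want to exclude every region whose value of the result variable, computed by region, is neither the maximum nor the minimum. For example, I would normally compute: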

library(data.table)
# weighted percentage of condition == 1 within each region
summary <- setDT(df1)[, .(.result = weighted.mean(condition == 1, w = weight) * 100),
                      by = region]

This gives me the following summary:

   region  .result
1:      a 61.60458
2:      b 39.69466
3:      c 50.56180
4:      d 61.03896
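
To make the `.result` column concrete, here is a minimal sketch that recomputes the value for region a by hand, using only the four rows of that region from the example data. The names `by_hand` and `via_wm` are illustrative and not part of the original post.

```r
# Rows of region "a" from the example data
weight    <- c(0.35, 0.65, 0.99, 1.5)
condition <- c(0L, 1L, 0L, 1L)

# Weighted share of condition == 1, written out explicitly:
# (sum of weights where condition is 1) / (sum of all weights)
by_hand <- sum(weight[condition == 1]) / sum(weight) * 100

# The same quantity via weighted.mean(), as used in the question
via_wm <- weighted.mean(condition == 1, w = weight) * 100

round(c(by_hand = by_hand, via_wm = via_wm), 5)
```

Both values should agree with the 61.60458 shown for region a in the summary.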

I would therefore subset out regions c and d from the data frame df1.

Is it possible to do this in one step, without having to inspect the summary data frame manually?

Answer 1

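As I understand it, you want to exclude every region whose value is neither the highest nor the lowest. I don't think it can be done as a single one-liner, but adding the following to your code should give you what you want: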

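# regions with the lowest and the highest weighted percentage
incl <- summary[c(which.min(.result), which.max(.result)), region]
# keep only the rows of df1 that belong to those regions
newdf <- setDT(df1)[region %in% incl]
newdf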

   id region weight condition
 1:  1      a   0.35         0
 2:  2      a   0.65         1
 3:  3      a   0.99         0
 4:  4      a   1.50         1
 5:  5      b   3.20         0
 6:  6      b   2.10         0
 7:  7      b   1.30         0
 8:  8      b   3.20         1
 9:  9      b   1.30         0
10: 10      b   2.00         1
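
If a single expression is preferred, the two steps can also be nested. The following is only a sketch under a few assumptions: the data.table package is installed, the example data is rebuilt compactly so the block runs on its own, and the names `dt` and `pct` are illustrative rather than taken from the original post.

```r
library(data.table)

# Example data from the question, rebuilt compactly
df1 <- data.frame(
  id = 1:21,
  region = factor(rep(c("a", "b", "c", "d"), times = c(4, 6, 6, 5))),
  weight = c(0.35, 0.65, 0.99, 1.5, 3.2, 2.1, 1.3, 3.2, 1.3, 2, 0.6, 0.6, 0.6,
             0.45, 1, 1.2, 1.4, 2, 1.3, 1, 2),
  condition = c(0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
                1L, 1L, 1L, 1L, 0L, 0L)
)

# as.data.table() makes a copy, so df1 itself stays a plain data.frame
dt <- as.data.table(df1)

# Inner call: weighted percentage per region, then pick the regions at the
# minimum and the maximum. Outer call: keep only the rows in those regions.
newdf <- dt[region %in% dt[, .(pct = weighted.mean(condition == 1, w = weight) * 100),
                           by = region][c(which.min(pct), which.max(pct)), region]]

newdf
```

Note that `which.min()` and `which.max()` return only the first position of the minimum and maximum, so if two regions were tied for an extreme value only the first of them would be kept.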

Solution 1
3 Accepted 2016-02-12 11:52:48
