
Subset using minimum number of values positioned around maximum value
Situation: I have a list of datasets collected by different loggers, like this one:
df <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "logger1", class = "factor"), OriginalTraitValue = c(0.37968,
0.455131, 0.606376, 0.910194, 1.19499, 1.55612, 1.91735, 2.35493,
2.60147, 2.42803, 1.66277, 1.12656, 0.628537), Temp = c(11.7334,
14.627, 19.3428, 24.5959, 29.6344, 34.7809, 39.606, 44.5389,
49.7914, 54.8254, 59.6391, 64.6695, 69.7002)), class = "data.frame", row.names = c(NA,
-13L))
Task: I only want to keep datasets that have at least two recorded Temp values both before and after max(OriginalTraitValue).
I hope this plot makes it clearer. Red = maximum value, green = values required to keep the dataset.
How can I do this in R, e.g. with dplyr?
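I have managed to identify the Temp value corresponding to max(OriginalTraitValue) with df$Temp[df$OriginalTraitValue == max(df$OriginalTraitValue)], but I am struggling to find the positional arguments needed to filter the dataset.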
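Because the filter has to work on positions rather than values, here is a minimal base-R sketch (base R is swapped in for dplyr only so it runs without extra packages) that uses which.max() to get the row position of the maximum and then counts the rows on either side. It assumes the single-logger df from the top of the question; i_max, n_before and n_after are illustrative names, not part of the question.

```r
# Single-logger example from the question (logger1, 13 rows)
df <- data.frame(
  ID = factor(rep("logger1", 13)),
  OriginalTraitValue = c(0.37968, 0.455131, 0.606376, 0.910194, 1.19499,
                         1.55612, 1.91735, 2.35493, 2.60147, 2.42803,
                         1.66277, 1.12656, 0.628537),
  Temp = c(11.7334, 14.627, 19.3428, 24.5959, 29.6344, 34.7809, 39.606,
           44.5389, 49.7914, 54.8254, 59.6391, 64.6695, 69.7002)
)

# which.max() gives the row position of the (first) maximum,
# which is what a positional filter needs
i_max <- which.max(df$OriginalTraitValue)

n_before <- i_max - 1          # rows before the maximum
n_after  <- nrow(df) - i_max   # rows after the maximum

df$Temp[i_max]                 # 49.7914, same as the == max() lookup
c(n_before, n_after)           # 8 4
n_before >= 2 && n_after >= 2  # TRUE, so this dataset is kept
```

With the position in hand, "at least two values on each side" becomes two simple comparisons instead of a lookup on Temp.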
The example above is a dataset I want to keep. The full dataset looks like this:
df <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L), .Label = c("logger1", "logger2", "logger3"
), class = "factor"), OriginalTraitValue = c(3.36e-11, 3.68e-11,
5.12e-11, 6.24e-11, 6.72e-11, 8.64e-11, 1.04e-10, 1.1e-10, 1.18e-10,
90.34189, 86.332214, 108.00114, 111.190155, 114.34427, 135.1673,
139.18198, 142.76979, 145.09233, 0.002, 0.06, 0.07, 0.15, 0.17,
0.17, 0.18, 0.18, 0.15, 0.07, 0.09), Temp = c(16, 18, 20, 22,
24, 26, 28, 30, 32, 16.726307, 17.376368, 20.193129, 25.06135,
25.060663, 29.875113, 29.924177, 30.422773, 34.417274, 10, 12.5,
15, 18, 20, 22.5, 25, 27.5, 30, 32.5, 35)), class = "data.frame", row.names = c(NA,
-29L))
> summary(df)
       ID     OriginalTraitValue      Temp
 logger1: 9   Min.   :  0.00     Min.   :10.00
 logger2: 9   1st Qu.:  0.00     1st Qu.:18.00
 logger3:11   Median :  0.15     Median :25.00
              Mean   : 37.02     Mean   :23.90
              3rd Qu.: 90.34     3rd Qu.:29.92
              Max.   :145.09     Max.   :35.00
In this dataset I would keep only ID logger3, because only logger3 has at least 2 values both before and after max(OriginalTraitValue).
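A minimal base-R sketch of that per-logger check, using split() and vapply() on the full dataset (rebuilt with data.frame() from the values above). has_k_around_max() is a hypothetical helper; ties are resolved by which.max(), which takes the first maximum, so logger3's tied 0.18 counts as position 7 of 11.

```r
# Full three-logger dataset from the question
df <- data.frame(
  ID = factor(rep(c("logger1", "logger2", "logger3"), times = c(9, 9, 11))),
  OriginalTraitValue = c(3.36e-11, 3.68e-11, 5.12e-11, 6.24e-11, 6.72e-11,
                         8.64e-11, 1.04e-10, 1.1e-10, 1.18e-10,
                         90.34189, 86.332214, 108.00114, 111.190155,
                         114.34427, 135.1673, 139.18198, 142.76979, 145.09233,
                         0.002, 0.06, 0.07, 0.15, 0.17, 0.17, 0.18, 0.18,
                         0.15, 0.07, 0.09),
  Temp = c(16, 18, 20, 22, 24, 26, 28, 30, 32,
           16.726307, 17.376368, 20.193129, 25.06135, 25.060663,
           29.875113, 29.924177, 30.422773, 34.417274,
           10, 12.5, 15, 18, 20, 22.5, 25, 27.5, 30, 32.5, 35)
)

# Hypothetical helper: TRUE if at least k values lie on each side of the
# first maximum of x
has_k_around_max <- function(x, k = 2) {
  i <- which.max(x)
  (i - 1) >= k && (length(x) - i) >= k
}

# One TRUE/FALSE per logger, named by the factor levels
keep <- vapply(split(df$OriginalTraitValue, df$ID), has_k_around_max,
               logical(1))
keep
#> logger1 logger2 logger3
#>   FALSE   FALSE    TRUE

# Keep every row of the qualifying loggers
df_kept <- df[df$ID %in% names(keep)[keep], ]
```

logger1 and logger2 both peak in their last row, so they have no values after the maximum and are dropped.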
Try:
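library(dplyr)
df %>%
  group_by(ID) %>%
  # the row of the maximum plus up to two rows on each side;
  # positions past the end of a group are ignored by slice()
  slice(which.max(OriginalTraitValue) + -2:2) %>%
  # keep only groups in which all five positions existed
  filter(n() == 5)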
Output:
# A tibble: 5 x 3
# Groups: ID [1]
ID OriginalTraitValue Temp
<fct> <dbl> <dbl>
1 logger1 1.92 39.6
2 logger1 2.35 44.5
3 logger1 2.60 49.8
4 logger1 2.43 54.8
5 logger1 1.66 59.6
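To see why filter(n() == 5) does the work, here is a base-R sketch of the window that slice() receives for one group. window_around_max() is a hypothetical helper: like slice(), it ignores positions past the end of the group, and it also drops positions below 1. The vectors are taken from the question's data.

```r
# Hypothetical helper mirroring slice(which.max(x) + -k:k): positions of
# the first maximum and up to k neighbours on each side, restricted to
# positions that actually exist in the group
window_around_max <- function(x, k = 2) {
  w <- which.max(x) + -k:k   # unary minus binds tighter than ':', so -k:k
  w[w >= 1 & w <= length(x)]
}

# logger1 from the 13-row example: maximum at position 9 of 13
x <- c(0.37968, 0.455131, 0.606376, 0.910194, 1.19499, 1.55612, 1.91735,
       2.35493, 2.60147, 2.42803, 1.66277, 1.12656, 0.628537)
# logger2 from the full dataset: maximum in the last row (9 of 9)
y <- c(90.34189, 86.332214, 108.00114, 111.190155, 114.34427, 135.1673,
       139.18198, 142.76979, 145.09233)

window_around_max(x)               # 7 8 9 10 11
x[window_around_max(x)]            # the five rows shown in the output above
length(window_around_max(x)) == 5  # TRUE: group kept by filter(n() == 5)
length(window_around_max(y)) == 5  # FALSE: only 3 positions exist
```

A truncated window is exactly the signature of a group with fewer than two rows on one side, so counting rows after slicing is enough.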
If you want to keep the whole group rather than only the 5 observations around the maximum, you can also do:
df %>%
group_by(ID) %>%
filter(any(cumsum(row_number() %in% c(which.max(OriginalTraitValue) + -2:2)) == 5))
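The any(cumsum(...) == 5) condition is TRUE exactly when all five window positions fall inside the group. The base-R sketch below reproduces it for one group and compares it with a direct count; cumsum_condition() and count_condition() are hypothetical helpers, and the group sizes and maximum positions come from the full dataset.

```r
# Hypothetical helper reproducing the filter() condition for one group
# with n rows whose maximum sits at position i_max
cumsum_condition <- function(n, i_max) {
  in_window <- seq_len(n) %in% (i_max + -2:2)  # row_number() %in% window
  any(cumsum(in_window) == 5)                  # the answer's test
}

# Equivalent, more direct form: all five window positions are real rows
count_condition <- function(n, i_max) {
  sum(seq_len(n) %in% (i_max + -2:2)) == 5
}

cumsum_condition(9, 9)    # logger1 / logger2: FALSE
cumsum_condition(11, 7)   # logger3: TRUE
count_condition(11, 7)    # TRUE as well
```

Because the window holds five distinct positions, the running sum can reach 5 only if every one of them matched a row, which is why the two forms agree.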
You can achieve this with dplyr's filter:
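df %>%
  group_by(ID) %>%
  # which.max() returns a single position even when the maximum is tied
  filter(abs(which.max(OriginalTraitValue) - row_number()) <= 2) %>%
  # drop groups with fewer than two rows on either side of the maximum
  filter(n() == 5)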
ID OriginalTraitValue Temp
<fct> <dbl> <dbl>
1 logger1 1.92 39.6
2 logger1 2.35 44.5
3 logger1 2.60 49.8
4 logger1 2.43 54.8
5 logger1 1.66 59.6
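One caveat, checked here in base R on logger3's values from the full dataset: its maximum 0.18 is tied, so which(OriginalTraitValue == max(OriginalTraitValue)) returns two positions. Subtracting that length-2 vector from an 11-row row_number() recycles it (R warns that the lengths are not multiples) and measures distances from the wrong rows, which is why which.max(), returning only the first maximum, is the safer choice in this filter. The name rows is illustrative and stands in for row_number().

```r
# logger3's OriginalTraitValue from the full dataset; 0.18 appears twice
v <- c(0.002, 0.06, 0.07, 0.15, 0.17, 0.17, 0.18, 0.18, 0.15, 0.07, 0.09)
rows <- seq_along(v)                  # plays the role of row_number()

which(v == max(v))                    # 7 8: one position per tied maximum
which.max(v)                          # 7: first maximum only

# Distance-based window around the first maximum
rows[abs(which.max(v) - rows) <= 2]   # 5 6 7 8 9

# With which(), c(7, 8) is recycled against 1:11 and the window is wrong
rows[suppressWarnings(abs(which(v == max(v)) - rows) <= 2)]  # 5 6 7 8 9 10
```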