![](/img/trans.png)
[英]Dplyr - choosing value in column based on lowest value in other column in R
[英]Collecting the lowest value of a column based on other columns in R
如何為另一列的每個級別收集一列的最小值,而新數據框按其他列分組?
這是一個示例數據集:
我想收集每個數字的最短時間,按A2和A3分組。
(在我的原始數據框中,每一列都有更多的級別)
df <- structure(list(ID=c('a','a','a','a','b','b','b','b','c','c','c','c','d','d','d','d','e','e','e','e','f','f','f','f','g','g','g','g','h','h','h','h'),
A2=c('d1','d1','d1','d1','d1','d1','d1','d1','d2','d2','d2','d2','d2','d2','d2','d2','d1','d1','d1','d1','d1','d1','d1','d1','d2','d2','d2','d2','d2','d2','d2','d2'),
A3=c('g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2'),
number=c('1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2'),
time=c(23,345,123,4,434,76,245,34,135,98,45,678,32,134,76,578,32,145,256,79,311,356,67,12,689,467,98,456,23,45,23,34)),
class = "data.frame", row.names = c(NA,-32L))
結果如下所示:
df.result<-structure(list(ID=c('a','a','b','b','c','c','d','d','e','e','f','f','g','g','h','h'),
A2=c('d1','d1','d1','d1','d2','d2','d2','d2','d1','d1','d1','d1','d2','d2','d2','d2'),
A3=c('g1','g1','g1','g1','g1','g1','g1','g1','g2','g2','g2','g2','g2','g2','g2','g2'),
number=c('1','2','1','2','1','2','1','2','1','2','1','2','1','2','1','2'),
time=c(23,4,76,34,98,45,32,76,32,79,311,12,467,98,23,23)),
class = "data.frame", row.names = c(NA,-16L))
謝謝
這似乎是一項aggregate
工作,首先更新number
:
transform(dat, number=with(rle(number), rep.int(seq_along(values), lengths))) |>
aggregate(time ~ number + A2 + A3, FUN=min)
# number A2 A3 time
# 1 1 d1 g1 234
# 2 2 d1 g1 12
# 3 3 d1 g1 232
# 4 4 d1 g1 44
# 5 5 d1 g1 21
# 6 6 d1 g1 34
# 7 13 d2 g1 345
# 8 14 d2 g1 34
# 9 15 d2 g1 56
# 10 16 d2 g1 98
# 11 7 d1 g2 23
# 12 8 d1 g2 12
# 13 9 d1 g2 689
# 14 10 d1 g2 4
# 15 11 d1 g2 43
# 16 12 d1 g2 21
# 17 17 d2 g2 245
# 18 18 d2 g2 134
# 19 19 d2 g2 567
# 20 20 d2 g2 1
嘗試這個:
library(data.table)
setDT(data)
data[, numberR := rleid(number)]
data[, min(time), by = .(A2, A3, numberR)]
這將與您預期的 output 完全匹配。
謝謝大家的回答:
此代碼有效(基於提供的代碼@Karsten W.)
df.result <- aggregate(df, time ~ ID + number + A2 + A3, FUN=min)
tidyverse 解決方案是:
df %>%
group_by(A2, A3, number) %>%
slice_min(time, n=1, with_ties = FALSE) %>%
ungroup()
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.