
Collecting the lowest value of a column based on other columns in R

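How can I collect the lowest value of one column for each level of another column, with the new data frame grouped by other columns?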

Here is an example dataset:

I want to collect the lowest time for each number, grouped by A2 and A3.

(In my original data frame, each column has many more levels.)

df <- structure(list(ID=c('a','a','a','a','b','b','b','b','c','c','c','c','d','d','d','d','e','e','e','e','f','f','f','f','g','g','g','g','h','h','h','h'),
                     A2=c('d1','d1','d1','d1','d1','d1','d1','d1','d2','d2','d2','d2','d2','d2','d2','d2','d1','d1','d1','d1','d1','d1','d1','d1','d2','d2','d2','d2','d2','d2','d2','d2'),
                     A3=c('g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2'),
                     number=c('1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2'),
                     time=c(23,345,123,4,434,76,245,34,135,98,45,678,32,134,76,578,32,145,256,79,311,356,67,12,689,467,98,456,23,45,23,34)), 
                class = "data.frame", row.names = c(NA,-32L))

The result should look like this:

df.result<-structure(list(ID=c('a','a','b','b','c','c','d','d','e','e','f','f','g','g','h','h'),
                          A2=c('d1','d1','d1','d1','d2','d2','d2','d2','d1','d1','d1','d1','d2','d2','d2','d2'),
                          A3=c('g1','g1','g1','g1','g1','g1','g1','g1','g2','g2','g2','g2','g2','g2','g2','g2'),
                          number=c('1','2','1','2','1','2','1','2','1','2','1','2','1','2','1','2'),
                          time=c(23,4,76,34,98,45,32,76,32,79,311,12,467,98,23,23)), 
                     class = "data.frame", row.names = c(NA,-16L))
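The question asks for grouping by A2 and A3, but df.result has one row per ID and number, so ID has to be part of the grouping as well. A minimal base R sketch of the difference, assuming the question's df (rebuilt here with rep(), same values as the structure() call above; by_a2_a3 and by_id are illustrative names):

```r
# The question's df, rebuilt compactly with rep() (same values as structure() above)
df <- data.frame(
  ID     = rep(letters[1:8], each = 4),
  A2     = rep(rep(c("d1", "d2"), each = 8), times = 2),
  A3     = rep(c("g1", "g2"), each = 16),
  number = rep(c("1", "1", "2", "2"), times = 8),
  time   = c(23, 345, 123, 4, 434, 76, 245, 34, 135, 98, 45, 678, 32, 134, 76, 578,
             32, 145, 256, 79, 311, 356, 67, 12, 689, 467, 98, 456, 23, 45, 23, 34),
  stringsAsFactors = FALSE
)

# Grouping by A2, A3 and number only: IDs that share A2 and A3 are pooled
by_a2_a3 <- aggregate(time ~ A2 + A3 + number, data = df, FUN = min)
nrow(by_a2_a3)  # 8

# Adding ID gives one row per ID and number, the shape of df.result
by_id <- aggregate(time ~ ID + A2 + A3 + number, data = df, FUN = min)
by_id <- by_id[order(by_id$ID, by_id$number), ]  # same row order as df.result
nrow(by_id)     # 16
by_id$time      # 23 4 76 34 98 45 32 76 32 79 311 12 467 98 23 23, as in df.result
```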

Thanks

This looks like a job for aggregate, after first renumbering number so that each consecutive run of values gets its own id:

transform(dat, number=with(rle(number), rep.int(seq_along(values), lengths))) |>
  aggregate(time ~ number + A2 + A3, FUN=min)
#    number A2 A3 time
# 1       1 d1 g1  234
# 2       2 d1 g1   12
# 3       3 d1 g1  232
# 4       4 d1 g1   44
# 5       5 d1 g1   21
# 6       6 d1 g1   34
# 7      13 d2 g1  345
# 8      14 d2 g1   34
# 9      15 d2 g1   56
# 10     16 d2 g1   98
# 11      7 d1 g2   23
# 12      8 d1 g2   12
# 13      9 d1 g2  689
# 14     10 d1 g2    4
# 15     11 d1 g2   43
# 16     12 d1 g2   21
# 17     17 d2 g2  245
# 18     18 d2 g2  134
# 19     19 d2 g2  567
# 20     20 d2 g2    1
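The rle() step gives every consecutive run of equal number values its own index: seq_along(values) numbers the runs, and rep.int(..., lengths) repeats each index once per row of its run. A sketch applying the same idea to the question's df rather than dat (df rebuilt with rep(), same values; run, df2 and res are illustrative names), using the formula interface with an explicit data = argument:

```r
# The question's df, rebuilt compactly with rep() (same values as structure() above)
df <- data.frame(
  ID     = rep(letters[1:8], each = 4),
  A2     = rep(rep(c("d1", "d2"), each = 8), times = 2),
  A3     = rep(c("g1", "g2"), each = 16),
  number = rep(c("1", "1", "2", "2"), times = 8),
  time   = c(23, 345, 123, 4, 434, 76, 245, 34, 135, 98, 45, 678, 32, 134, 76, 578,
             32, 145, 256, 79, 311, 356, 67, 12, 689, 467, 98, 456, 23, 45, 23, 34),
  stringsAsFactors = FALSE
)

# rle() collapses consecutive repeats of number; rep.int() then labels every row
# with the index of the run it belongs to: 1, 1, 2, 2, ..., 16, 16
run <- with(rle(df$number), rep.int(seq_along(values), lengths))

# Replace number by the run index and take the minimum time per run
df2 <- transform(df, number = run)
res <- aggregate(time ~ number + A2 + A3, data = df2, FUN = min)
res$time  # 23 4 76 34 98 45 32 76 32 79 311 12 467 98 23 23
```

This relies on each ID's rows being contiguous and on neighbouring IDs not ending and starting with the same number; otherwise two blocks would fall into one run and be aggregated together.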

Try this:

library(data.table)

setDT(df)                          # convert df to a data.table by reference
df[, numberR := rleid(number)]     # new id for each consecutive run of number
df[, .(time = min(time)), by = .(A2, A3, numberR)]

The minimum times match those in your expected output, row for row.
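rleid() starts a new id whenever number differs from the previous row. For readers without data.table, a base R stand-in can be sketched as below; run_id is a hypothetical helper, not part of any package, and this version assumes the column has no NA values.

```r
# Hypothetical helper: base R stand-in for data.table::rleid() on one vector.
# Row 1 opens run 1; every row whose value differs from the previous row opens
# a new run. Assumes x contains no NA values.
run_id <- function(x) {
  if (length(x) == 0L) return(integer(0))
  cumsum(c(TRUE, x[-1L] != x[-length(x)]))
}

number <- rep(c("1", "1", "2", "2"), times = 8)  # the question's number column
run_id(number)  # 1 1 2 2 3 3 ... 16 16
```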

Thanks everyone for the answers.

This code works (based on the code provided by @Karsten W.):

df.result <- aggregate(time ~ ID + number + A2 + A3, data = df, FUN = min)

A tidyverse solution:

library(dplyr)

df %>%
  group_by(ID, A2, A3, number) %>%   # ID is needed for one row per ID and number, as in df.result
  slice_min(time, n = 1, with_ties = FALSE) %>%
  ungroup()
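slice_min(with_ties = FALSE) keeps the entire row holding the smallest time in each group, not just the value. A rough base R analogue, assuming grouping by ID, A2, A3 and number so the result lines up with df.result (df rebuilt with rep(), same values as the question; ord, keep and res are illustrative names), is to sort and keep the first row per group:

```r
# The question's df, rebuilt compactly with rep() (same values as structure() above)
df <- data.frame(
  ID     = rep(letters[1:8], each = 4),
  A2     = rep(rep(c("d1", "d2"), each = 8), times = 2),
  A3     = rep(c("g1", "g2"), each = 16),
  number = rep(c("1", "1", "2", "2"), times = 8),
  time   = c(23, 345, 123, 4, 434, 76, 245, 34, 135, 98, 45, 678, 32, 134, 76, 578,
             32, 145, 256, 79, 311, 356, 67, 12, 689, 467, 98, 456, 23, 45, 23, 34),
  stringsAsFactors = FALSE
)

# Sort by the grouping columns, then by time, so each group's smallest time comes first
ord <- df[order(df$ID, df$A2, df$A3, df$number, df$time), ]

# Keep only the first row of every (ID, A2, A3, number) group; on a tie in time,
# the row that came first in df wins because order() leaves remaining ties in place
keep <- !duplicated(ord[, c("ID", "A2", "A3", "number")])
res <- ord[keep, ]
rownames(res) <- NULL
res
```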


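Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM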