Group efficiently with data.table in R
Can the following script be shortened?
library(data.table)
DT<-structure(list(title = c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c", "d", "d", "d", "d"), date = c("12-07-2020", "13-07-2020", "14-07-2020", "15-07-2020", "12-07-2020", "13-07-2020",
"14-07-2020", "15-07-2020", "12-07-2020", "13-07-2020", "14-07-2020", "15-07-2020",
"12-07-2020", "13-07-2020", "14-07-2020", "15-07-2020"),
bucket = c(1, 1, 1, 4, 9, 7, 10, 10, 8, 5, 5, 5, 8, 10, 9, 10),
score = c(86, 22, 24, 54, 66, 76, 43, 97, 9, 53, 45, 40, 21, 99, 91, 90)),
row.names = c(NA, -16L), class = c("data.table","data.frame"))
DT[DT[, .I[bucket == min(bucket)], by = title]$V1]
DT[, .SD[which(bucket == min(bucket))], by = title][
  , `:=`(avg_score = mean(score)), by = .(title)][
  , .SD[.N, c(1, 2, 4)], by = .(title)]
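The first attempt above relies on the `.I` idiom: inside `j`, `.I` holds the row numbers of the full table, so grouping by a column and subsetting `.I` yields the indices of the rows to keep. A minimal sketch on a made-up table (`toy`, `g` and `v` are hypothetical names used only for illustration):

```r
library(data.table)

# Small hypothetical table, not part of the question's data
toy <- data.table(g = c("x", "x", "y", "y"), v = c(2, 1, 5, 5))

# Per group, keep the row numbers where v equals the group minimum.
# The unnamed result column is called V1 by default.
idx <- toy[, .I[v == min(v)], by = g]$V1

# Use the row numbers to subset the original table, keeping all columns
res <- toy[idx]
res
```

Ties are kept: both rows of group `y` share the minimum, so both are returned.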
The original code is a dplyr script (source: RStudio Community):
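library(dplyr)

# DT is the table defined above; date is still a character column here
tt <- DT %>%
  group_by(title) %>%
  filter(bucket == min(bucket)) %>%    # keep rows in each title's lowest bucket
  mutate(avg_score = mean(score)) %>%  # mean score over those rows
  slice_max(date) %>%                  # latest date per title
  select(-score)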
  title date       bucket avg_score
  <chr> <chr>       <dbl>     <dbl>
1 a     14-07-2020      1        44
2 b     13-07-2020      7        76
3 c     15-07-2020      5        46
4 d     12-07-2020      8        21
Here is a solution that uses neither chaining nor .SD:
# Convert from character to Date to be able to select the max
DT[, date := as.Date(date, "%d-%m-%Y")]
DT[,
   {
     # Row positions (within the group) that fall in the lowest bucket
     mb <- which(bucket == min(bucket))
     .(
       date = max(date[mb]), bucket = bucket[mb][1L], avg_score = mean(score[mb])
     )
   },
   by = title]
#    title       date bucket avg_score
# 1:     a 2020-07-14      1        44
# 2:     b 2020-07-13      7        76
# 3:     c 2020-07-15      5        46
# 4:     d 2020-07-12      8        21
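The conversion to Date at the start of this answer matters because strings in dd-mm-yyyy form are compared character by character, so `max()` on the raw column is only chronological when all dates share the same month and year, as they happen to in the sample data. A small sketch with two made-up dates (`x` is a hypothetical vector, not taken from the question):

```r
# Two hypothetical dates in the same dd-mm-yyyy format as the question
x <- c("30-06-2020", "01-07-2020")

# Compared as strings, "30-..." sorts after "01-...", so the earlier date wins
max_chr <- max(x)

# Compared as Date objects, the later calendar date wins
max_date <- max(as.Date(x, "%d-%m-%Y"))

max_chr
max_date
```

Here `max_chr` is the June date while `max_date` is 1 July 2020, which is why the answer converts the column before calling `max(date[mb])`.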