Use multi-quantile groups from a large data frame in a grouped data frame in R
I have the following problem: I have a large data frame and need to extract the quantiles of one variable, but by group. For example:
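library(dplyr)

# Quartiles of mpg for each gear value (3, 4, 5)
list_q <- list()
for (i in 3:5) {
  tmp <- mtcars %>%
    filter(gear == i) %>%
    pull(mpg) %>%
    quantile(probs = seq(0, 1, 0.25), na.rm = TRUE)
  list_q[[i]] <- tmp  # stored at position i, so list_q[[1]] and list_q[[2]] stay NULL
}
list_q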
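As a sketch, the same per-gear quartiles can be computed without the loop in base R, assuming a list keyed by the gear value (rather than by list position) is acceptable. split() divides mpg by gear and lapply() applies quantile() to each piece; the resulting names "3", "4" and "5" are the gear values as strings.

```r
# Base R sketch: quartiles of mpg for each gear, no explicit loop.
# split() returns a list of mpg vectors named by gear ("3", "4", "5");
# lapply() then calls quantile() on each of them.
list_q <- lapply(split(mtcars$mpg, mtcars$gear), quantile,
                 probs = seq(0, 1, 0.25), na.rm = TRUE)

# Breakpoints are looked up by the gear value converted to a string
list_q[["4"]]
```

Unlike the loop, this list has no NULL slots for gears 1 and 2, so lookups use the character name (list_q[["4"]]) instead of a numeric position.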
This gives the following output:
[[3]]
0% 25% 50% 75% 100%
10.4 14.5 15.5 18.4 21.5
[[4]]
0% 25% 50% 75% 100%
17.800 21.000 22.800 28.075 33.900
[[5]]
0% 25% 50% 75% 100%
15.0 15.8 19.7 26.0 30.4
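Now I need to compute the mean of the variable by group and determine which quantile each mean falls into, but using the quantiles computed from the original (per-car) values: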
a <- mtcars %>%
  group_by(gear, carb) %>%
  summarize(mpg_mean = mean(mpg)) %>%
  ungroup()
gear carb mpg_mean
<dbl> <dbl> <dbl>
1 3 1 20.3
2 3 2 17.2
3 3 3 16.3
4 3 4 12.6
5 4 1 29.1
6 4 2 24.8
7 4 4 19.8
8 5 2 28.2
9 5 4 15.8
10 5 6 19.7
11 5 8 15
So I can do this:
g3 <- a %>%
  filter(gear == 3) %>%
  mutate(quantile = cut(mpg_mean, list_q[[3]], labels = FALSE, include.lowest = TRUE))
g4 <- a %>%
  filter(gear == 4) %>%
  mutate(quantile = cut(mpg_mean, list_q[[4]], labels = FALSE, include.lowest = TRUE))
g5 <- a %>%
  filter(gear == 5) %>%
  mutate(quantile = cut(mpg_mean, list_q[[5]], labels = FALSE, include.lowest = TRUE))
bind_rows(g3, g4, g5)
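A base R sketch of the same classification without writing one block per gear, assuming the table a is rebuilt with aggregate() (sorted by gear, then carb) and the breakpoints are kept in a list named by gear value. The names a and list_q mirror the question's objects but are recreated here so the block runs on its own. Each gear's means are cut once with that gear's own breakpoints, and unsplit() puts the integer codes back in row order.

```r
# Mean mpg for each gear/carb combination (base R stand-in for the
# group_by() + summarize() step), ordered by gear, then carb
a <- aggregate(mpg ~ gear + carb, data = mtcars, FUN = mean)
names(a)[names(a) == "mpg"] <- "mpg_mean"
a <- a[order(a$gear, a$carb), ]

# Per-gear quartile breakpoints from the original per-car values
list_q <- lapply(split(mtcars$mpg, mtcars$gear), quantile,
                 probs = seq(0, 1, 0.25), na.rm = TRUE)

# Cut the means of each gear with that gear's breakpoints (one cut() per gear),
# looking the breakpoints up by name, then reassemble the codes in row order
pieces <- split(a$mpg_mean, a$gear)
codes <- lapply(names(pieces), function(g)
  cut(pieces[[g]], list_q[[g]], labels = FALSE, include.lowest = TRUE))
a$quantile <- unsplit(codes, a$gear)
a
```

Adding a new gear value needs no extra code here, because both split() calls produce one list element per gear present in the data.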
which gives:
# A tibble: 11 x 4
gear carb mpg_mean quantile
<dbl> <dbl> <dbl> <int>
1 3 1 20.3 4
2 3 2 17.2 3
3 3 3 16.3 3
4 3 4 12.6 1
5 4 1 29.1 4
6 4 2 24.8 3
7 4 4 19.8 1
8 5 2 28.2 4
9 5 4 15.8 1
10 5 6 19.7 2
11 5 8 15 1
I would like to know whether there is a more efficient way to do this.
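We can first group_by gear and store the quantiles of mpg in a list column. Then we also group_by carb to get the mean of the mpg values, and use the quantiles stored earlier in the list to cut the mean of the mpg column.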
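library(dplyr)

mtcars %>%
  group_by(gear) %>%
  # store the quartiles of mpg for each gear as a list column
  mutate(gear_q = list(quantile(mpg))) %>%
  # add carb as a second grouping variable (.add replaces add, deprecated in dplyr 1.0.0)
  group_by(carb, .add = TRUE) %>%
  summarize(mpg_mean = mean(mpg),
            gear_q = list(first(gear_q)),
            .groups = "drop_last") %>%  # result stays grouped by gear
  # cut each mean with the breakpoints of its own gear
  mutate(quantile = cut(mpg_mean, first(gear_q),
                        labels = FALSE, include.lowest = TRUE)) %>%
  select(-gear_q)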
# gear carb mpg_mean quantile
# <dbl> <dbl> <dbl> <int>
# 1 3 1 20.3 4
# 2 3 2 17.2 3
# 3 3 3 16.3 3
# 4 3 4 12.6 1
# 5 4 1 29.1 4
# 6 4 2 24.8 3
# 7 4 4 19.8 1
# 8 5 2 28.2 4
# 9 5 4 15.8 1
#10 5 6 19.7 2
#11 5 8 15 1