Compute grouped averages across varying numbers of columns
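I have a (very large) data frame containing the words (the `w` columns) of utterances of different `size`, together with the corpus frequencies of those words (the `f` columns):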
df <- structure(list(size = c(2, 2, 3, 3, 4, 4, 3, 3),
w1 = c("come", "why", "er", "well", "i", "no", "that", "cos"),
w2 = c("on","that", "i", "not", "'m", "thanks", "'s", "she"),
w3 = c(NA, NA, "can", "today", "going", "a", "cool", "does"),
w4 = c(NA,NA, NA, NA, "home", "lot", NA, NA),
f1 = c(9699L, 6519L, 21345L, 35793L, 169024L, 39491L, 84682L, 11375L),
f2 = c(33821L, 84682L,169024L, 21362L, 14016L, 738L, 107729L, 33737L),
f3 = c(NA, NA, 15428L, 2419L, 10385L, 77328L, 132L, 7801L),
f4 = c(NA, NA, NA, NA, 2714L, 3996L, NA, NA)),
row.names = c(NA, -8L), class = "data.frame")
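I need to compute the average frequencies for each `size` group, where the number of relevant columns differs from group to group. I can do it one size at a time, like this, e.g. for `size == 2`: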
# calculate numbers of rows per size group:
RowsPerSize <- table(df$size)
# make size subset:
df_size2 <- df[df$size == 2,]
# calculate average `f`requencies per `size`:
AvFreqSize_2 <- apply(df_size2[,6:7], 2, function(x) sum(x, na.rm = T)/RowsPerSize[1])
# result:
AvFreqSize_2
f1 f2
8109.0 59251.5
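But this is already cumbersome for a single `size`, and even more so for several. I'm pretty sure there is a more economical way, probably in `dplyr`, where you can `group_by`. A modest start is this: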
df %>%
group_by(size) %>%
summarise(freq = n())
# A tibble: 3 x 2
size freq
* <dbl> <int>
1 2 2
2 3 4
3 4 2
I had to guess a lot, but I think you are looking for this:
library(tidyverse)
df %>%
group_by(size) %>%
summarise(across(matches("f\\d"), ~sum(.x, na.rm = T)/n()))
#> # A tibble: 3 x 5
#> size f1 f2 f3 f4
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 8109 59252. 0 0
#> 2 3 38299. 82963 6445 0
#> 3 4 104258. 7377 43856. 3355
#as @Onyambu suggested, it could make more sense to use `mean()`
df %>%
group_by(size) %>%
summarise(across(matches("f\\d"), ~mean(.x, na.rm = T)))
#> # A tibble: 3 x 5
#> size f1 f2 f3 f4
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 8109 59252. NaN NaN
#> 2 3 38299. 82963 6445 NaN
#> 3 4 104258. 7377 43856. 3355
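The two variants above differ only in how they treat missing values, which is why one table shows 0 and the other NaN. A minimal base R sketch of that difference, using the `size == 2` frequencies from the question; the vector `x` with a single NA is a hypothetical case that does not occur in the question's data:

```r
# Frequencies of the two size == 2 rows from the question
f1 <- c(9699L, 6519L)               # first word: present in both rows
f3 <- c(NA_integer_, NA_integer_)   # third word: never present for size 2
n  <- length(f1)                    # group size, what n() returns in dplyr

# With no missing values both approaches agree
avg_sum  <- sum(f1, na.rm = TRUE) / n   # 8109
avg_mean <- mean(f1, na.rm = TRUE)      # 8109

# With only missing values they diverge
zero_avg <- sum(f3, na.rm = TRUE) / n   # 0: the sum of no values is 0
nan_avg  <- mean(f3, na.rm = TRUE)      # NaN: the mean of no values is 0/0

# Hypothetical column with one missing value (not in the question's data):
# sum()/n still divides by all rows, mean() only by the non-missing ones
x <- c(10, NA, 20)
partial_sum  <- sum(x, na.rm = TRUE) / length(x)  # 10
partial_mean <- mean(x, na.rm = TRUE)             # 15
```

Which behaviour is preferable depends on whether a missing frequency should count as zero or be left out of the average; the question does not say, so this is a judgment call.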
Created on 2021-05-06 by the reprex package (v2.0.0)
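For comparison, the `mean()` variant can probably also be written without dplyr. The following is a sketch in base R using `aggregate()`, not part of the original answer; the data frame repeats only the `size` and `f` columns from the question, and the names `f_cols` and `avg_by_size` are made up for illustration:

```r
# size and frequency columns copied from the question (word columns omitted)
df <- data.frame(
  size = c(2, 2, 3, 3, 4, 4, 3, 3),
  f1 = c(9699L, 6519L, 21345L, 35793L, 169024L, 39491L, 84682L, 11375L),
  f2 = c(33821L, 84682L, 169024L, 21362L, 14016L, 738L, 107729L, 33737L),
  f3 = c(NA, NA, 15428L, 2419L, 10385L, 77328L, 132L, 7801L),
  f4 = c(NA, NA, NA, NA, 2714L, 3996L, NA, NA)
)

# Select the frequency columns by name pattern instead of by position
f_cols <- grep("^f[0-9]+$", names(df), value = TRUE)

# Mean of every frequency column within each size group;
# na.rm = TRUE is passed on to mean(), so all-NA groups yield NaN
avg_by_size <- aggregate(df[f_cols], by = list(size = df$size),
                         FUN = mean, na.rm = TRUE)
avg_by_size
```

Selecting the columns by pattern keeps the call unchanged if the real data has more `f` columns than the four shown here.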