Summary means using dplyr
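I'm trying to produce a table with the totals and means for the whole dataset, then the same figures broken down by subcategory (f_grp), all shown by site.
I can group with the group_by function, which works well for reporting total_count and Mean_per_litre, but I'd also like those values for each category, as given by f_grp.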
| site | total_count | Mean_per_litre |
|------|-------------|----------------|
| 1 | 66 | 3.33333333 |
| 2 | 77 | 4.27777778 |
| 3 | 65 | 3.38541667 |
| 4 | 154 | 8.85057471 |

etc.
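I've tried group_by on both site and f_grp, but that isn't quite right:

| site | f_grp | total_count | mean_per_litre |
|------|-------|-------------|----------------|
| 1 | 1c | 3 | 1.666667 |
| 1 | 1d | 15 | 4.166667 |
| 1 | 2a | 1 | 1.666667 |
| 1 | 2b | 47 | 11.190476 |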
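That isn't what I need: it isn't easy to read, and I've now lost the overall total columns from the first table. (Sorry about the tables, I couldn't get them to render here.)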
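One way to keep the site-wide totals alongside the per-category rows is a grouped `mutate()` before the final summary: the site-level figures are computed over every row of a site, copied onto each row, and then carried through the second `group_by()`. A minimal sketch, assuming a small hypothetical `dat` with the question's column names (the values are invented); `site_total_count` and `site_mean_per_litre` are illustrative names.

```r
library(dplyr)

# Hypothetical toy data with the question's column names (values invented)
dat <- data.frame(
  site = factor(c(1, 1, 1, 2, 2)),
  count = c(3, 15, 1, 4, 6),
  f_grp = factor(c("1c", "1d", "1c", "1d", NA)),
  count_l_site = c(1, 5, 2, 4, 6)
)

table6 <- dat %>%
  group_by(site) %>%
  # grouped mutate: site-wide figures use all rows of the site
  # and are repeated on every one of those rows
  mutate(site_total_count = sum(count),
         site_mean_per_litre = mean(count_l_site)) %>%
  # the site-level columns are constant within a site, so grouping
  # by them too just carries them through summarise()
  group_by(site, f_grp, site_total_count, site_mean_per_litre) %>%
  summarise(total_count = sum(count),
            mean_per_litre = mean(count_l_site),
            .groups = "drop")
table6
```

The result is still in long form (one row per site and f_grp), but the whole-site figures are no longer lost.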
library(dplyr)
dat$site <- as.factor(dat$site)
dat$count <- as.numeric(dat$count)
dat$f_grp <- as.factor(dat$f_grp)
# totals across all f_grp
tabl1 <- dat %>%
group_by(site) %>%
summarise (total_count = sum(count), Mean_per_litre = mean(count_l_site))
tabl1
# totals FG 1b
tabl2 <- dat %>%
group_by(site) %>%
filter(f_grp== '1b') %>%
summarise ('1b_total_count' = sum(count))
tabl2
### BUT: this can't give the correct mean, because after filter() only the '1b' rows remain, so any mean is taken over those rows alone. I need the mean over all rows at that site.
# table showing totals across whole dataset
tabl7 <- dat %>%
summarise (total_count = sum(count, na.rm = TRUE), Total_mean_per_litre = mean(count_l_site, na.rm = TRUE))
tabl7
# table with means for each site by fg
table6 <- dat %>%
group_by(site, f_grp) %>%
summarise (total_count = sum(count), mean_per_litre = mean(count_l_site, na.rm = TRUE))
table6
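Ideally I need a way to extract the f_grp categories, make them column headers, and then summarise those categories by site. But filtering the data and then joining several tables gives incorrect means: not the mean over the whole dataset, but over the subset for that category (i.e. only the rows where that f_grp value is present).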
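When only a few f_grp levels matter, the per-category totals can be computed inside the same `summarise()` call by subsetting each vector with a logical test instead of calling `filter()` first, so `Mean_per_litre` is still taken over every row at the site. A minimal sketch, assuming a small hypothetical `dat` with the question's column names (values invented). `%in%` is used rather than `==` so rows with a missing f_grp count as non-matches instead of producing `NA`.

```r
library(dplyr)

# Hypothetical toy data with the question's column names (values invented)
dat <- data.frame(
  site = factor(c(1, 1, 1, 2, 2)),
  count = c(3, 15, 1, 4, 6),
  f_grp = factor(c("1c", "1d", "1c", "1d", NA)),
  count_l_site = c(1, 5, 2, 4, 6)
)

tabl <- dat %>%
  group_by(site) %>%
  summarise(
    # whole-site figures use every row, including rows with NA f_grp
    total_count = sum(count, na.rm = TRUE),
    Mean_per_litre = mean(count_l_site, na.rm = TRUE),
    # per-category totals: subset count with a logical test, no filter()
    `1c_total_count` = sum(count[f_grp %in% "1c"], na.rm = TRUE),
    `1d_total_count` = sum(count[f_grp %in% "1d"], na.rm = TRUE)
  )
tabl
```

This hard-codes each category, so it suits a handful of f_grp levels; for many levels a pivot is less repetitive. A site with no rows for a category gets 0 here, because the sum of an empty vector is 0.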
Many thanks to anyone who has read this far :)
> dput(head(dat))
structure(list(X = 1:6, site = structure(c(1L, 10L, 11L, 12L,
13L, 14L), levels = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18"), class = "factor"),
count = c(0, 0, 0, 0, 0, 0), f_grp = structure(c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), levels = c("1b", "1c", "1d", "2a", "2b"), class = "factor"),
count_l_site = c(0, 0, 0, 0, 0, 0)), row.names = c(NA, 6L
), class = "data.frame")
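Update:
Following Jon's suggestion, and after trying it on the mtcars data (where it works as expected), I tried the same approach on my own data.
I can almost produce what I need, but the totals come out as a separate row when I need them as columns.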
tabl1 <- dat %>%
group_by(site) %>%
summarise (total_count = sum(count), Mean_per_litre = mean(count_l_site)) %>%
mutate(fg = "total")
tabl1
tabl2_fg <- dat %>%
group_by(site, f_grp = as.character(f_grp)) %>%
summarize(total_count = sum(count), Mean_per_litre = mean(count_l_site))
tabl2_fg
tabl4 <-
bind_rows(tabl1, tabl2_fg) %>%
arrange(site, f_grp) %>%
tidyr::pivot_wider(names_from = f_grp, values_from = c(Mean_per_litre, total_count), names_vary = "slowest")
tabl4
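The output was as follows:
Next steps: move the circled outputs to the start of the table and remove every other row of results, leaving a simple table with one row per site and these columns: total count; total mean; then a count column and a mean column for each fg, e.g. 1c count; 1c mean; 1d count; 1d mean.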
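In the update's code, `tabl1` labels its total rows in a new column called `fg`, while `tabl2_fg` stores the category in `f_grp`. `bind_rows()` matches columns by name, so the total rows end up with `f_grp = NA` plus an extra `fg` column, and `pivot_wider()` then treats `fg` as an identifier column, which would explain why each site's totals land on a separate row. A minimal sketch of the fix, labelling the totals in `f_grp` itself as Jon's example does with `cyl`, and assuming a small hypothetical `dat` with the question's column names (values invented). It needs a tidyr version that supports `names_vary`.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the question's column names (values invented)
dat <- data.frame(
  site = factor(c(1, 1, 1, 2, 2)),
  count = c(3, 15, 1, 4, 6),
  f_grp = factor(c("1c", "1d", "1c", "1d", NA)),
  count_l_site = c(1, 5, 2, 4, 6)
)

# Whole-site totals, labelled in f_grp (not a new column) so they pivot
tabl1 <- dat %>%
  group_by(site) %>%
  summarise(total_count = sum(count),
            Mean_per_litre = mean(count_l_site)) %>%
  mutate(f_grp = "total")

# Per-category summaries; rows with no f_grp are dropped here only,
# so they still count towards the whole-site totals above
tabl2_fg <- dat %>%
  filter(!is.na(f_grp)) %>%
  group_by(site, f_grp = as.character(f_grp)) %>%
  summarise(total_count = sum(count),
            Mean_per_litre = mean(count_l_site),
            .groups = "drop")

tabl4 <- bind_rows(tabl1, tabl2_fg) %>%
  # FALSE sorts before TRUE, so "total" comes first within each site
  arrange(site, f_grp != "total", f_grp) %>%
  pivot_wider(names_from = f_grp,
              values_from = c(total_count, Mean_per_litre),
              names_vary = "slowest")
tabl4
```

With `site` as the only remaining identifier column, each site collapses to one row, and the total columns come first because "total" is the first `f_grp` value encountered after sorting.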
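Something like this?
library(dplyr)
# One row per gear, labelled "total" in the same column (cyl) used for the subgroups
avg_gear <- mtcars %>%
group_by(gear) %>%
summarize(avg_mpg = mean(mpg), n = n()) %>%
mutate(cyl = "total")
# One row per gear/cyl combination; cyl as character so it stacks with "total"
avg_gear_cyl <- mtcars %>%
group_by(gear,cyl = as.character(cyl)) %>%
summarize(avg_mpg = mean(mpg), n = n())
# Stack both tables (bind_rows() matches columns by name)
bind_rows(avg_gear, avg_gear_cyl) %>%
arrange(gear, cyl)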
# A tibble: 11 × 4
gear avg_mpg n cyl
<dbl> <dbl> <int> <chr>
1 3 21.5 1 4
2 3 19.8 2 6
3 3 15.0 12 8
4 3 16.1 15 total
5 4 26.9 8 4
6 4 19.8 4 6
7 4 24.5 12 total
8 5 28.2 2 4
9 5 19.7 1 6
10 5 15.4 2 8
11 5 21.4 5 total
Or, if you want the categories as columns:
bind_rows(avg_gear, avg_gear_cyl) %>%
arrange(gear, desc(cyl)) %>%
tidyr::pivot_wider(names_from = cyl, values_from = c(avg_mpg, n), names_vary = "slowest")
# A tibble: 3 × 9
gear avg_mpg_total n_total avg_mpg_8 n_8 avg_mpg_6 n_6 avg_mpg_4 n_4
<dbl> <dbl> <int> <dbl> <int> <dbl> <int> <dbl> <int>
1 3 16.1 15 15.0 12 19.8 2 21.5 1
2 4 24.5 12 NA NA 19.8 4 26.9 8
3 5 21.4 5 15.4 2 19.7 1 28.2 2
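A variant that avoids both `bind_rows()` and the `desc()` sort trick is to pivot only the per-`cyl` summaries and then join the per-gear totals on by `gear`; the total columns then come first simply because they sit in the left-hand table. A sketch using `mtcars` as in the answer above; `wide` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Per-gear totals (one row per gear), named as final output columns
avg_gear <- mtcars %>%
  group_by(gear) %>%
  summarise(avg_mpg_total = mean(mpg), n_total = n())

# Per-gear, per-cyl summaries spread into one column pair per cyl
avg_gear_cyl <- mtcars %>%
  group_by(gear, cyl) %>%
  summarise(avg_mpg = mean(mpg), n = n(), .groups = "drop") %>%
  pivot_wider(names_from = cyl, values_from = c(avg_mpg, n),
              names_vary = "slowest")

# Join on the shared key; gears with no cars in a cyl class get NA
wide <- left_join(avg_gear, avg_gear_cyl, by = "gear")
wide
```

The cyl columns appear as 4, 6, 8 here because `summarise()` returns the groups sorted, and `pivot_wider()` keeps the order in which names first appear.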