Summarising means using dplyr

Question

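I am trying to produce a table containing totals and means for the whole dataset, then for each subcategory (f_grp), displayed by site.

I can use the group_by function to group by site, which works well for reporting total_count and Mean_per_litre, but I would then like the same values for each category, as identified by f_grp.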

|site |total_count |Mean_per_litre
|1 |66 |3.33333333
|2 |77 |4.27777778
|3 |65 |3.38541667
|4 |154 |8.85057471
etc.

I have tried group_by on both site and f_grp, but this is not quite right:

|site |f_grp |total_count |mean_per_litre
|1 |1c |3 |1.666667
|1 |1d |15 |4.166667
|1 |2a |1 |1.666667
|1 |2b |47 |11.190476

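This is not quite right because it is not easy to read, and I have now lost the original total columns from the first table (sorry about the tables, I couldn't get them to work here).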

library(dplyr)

# convert columns to the types used in the summaries below
dat$site <- as.factor(dat$site)
dat$count <- as.numeric(dat$count)
dat$f_grp <- as.factor(dat$f_grp)

# totals across all f_grp
tabl1 <- dat %>%
  group_by(site) %>%
  summarise (total_count = sum(count), Mean_per_litre = mean(count_l_site))
tabl1

# totals FG 1b
tabl2 <- dat %>%
  group_by(site) %>%
  filter(f_grp== '1b') %>%
  summarise ('1b_total_count' = sum(count))
tabl2

### BUT - this doesn't give a correct mean, as it only shows the mean of '1b' over the rows where '1b' is present. I need a mean over the entire dataset at that site.


# table showing totals across whole dataset
tabl7 <- dat %>%
  summarise (total_count = sum(count, na.rm = TRUE), Total_mean_per_litre = mean(count_l_site, na.rm = TRUE))
tabl7

# table with means for each site by fg

table6 <- dat %>%
  group_by(site, f_grp) %>%
  summarise (total_count = sum(count), mean_per_litre = mean(count_l_site, na.rm = TRUE))

table6

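Ideally I need a way to extract the f_grp categories, use them as column headings, and then summarise the means for those categories by site. But filtering the data and then joining multiple tables gives incorrect means (not the mean over the whole dataset, but over a subset for that category, i.e. only the rows where that f_grp value is present).

Many thanks to everyone who has read this far :)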

> dput(head(dat))
structure(list(X = 1:6, site = structure(c(1L, 10L, 11L, 12L, 
13L, 14L), levels = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18"), class = "factor"), 
    count = c(0, 0, 0, 0, 0, 0), f_grp = structure(c(NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
    ), levels = c("1b", "1c", "1d", "2a", "2b"), class = "factor"), 
    count_l_site = c(0, 0, 0, 0, 0, 0)), row.names = c(NA, 6L
), class = "data.frame")

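Update:

Following Jon's suggestion, which works as expected with the mtcars data, I tried the same approach on my own data.

I can almost produce what I need, but the totals appear as rows when they should appear as columns.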

tabl1 <- dat %>%
  group_by(site) %>%
  summarise (total_count = sum(count), Mean_per_litre = mean(count_l_site)) %>%
  mutate(fg = "total")

tabl1

tabl2_fg <- dat %>%
  group_by(site, f_grp = as.character(f_grp)) %>%
  summarize(total_count = sum(count), Mean_per_litre = mean(count_l_site))

tabl2_fg

tabl4 <-
  bind_rows(tabl1, tabl2_fg) %>%
  arrange(site, f_grp) %>%
  tidyr::pivot_wider(names_from = f_grp, values_from = c(Mean_per_litre, total_count), names_vary = "slowest")

tabl4

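The output is as follows (the screenshot is not reproduced here).

Next steps: move the circled outputs to the start of the table and delete every other row of results, leaving a simple table with rows = sites and columns: total count; total mean; then a count and a mean column for each fg, e.g. 1c count; 1c mean; 1d count; 1d mean.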

Answer 1

Something like this?

library(dplyr)

avg_gear <- mtcars %>%
  group_by(gear) %>%
  summarize(avg_mpg = mean(mpg), n = n()) %>%
  mutate(cyl = "total")

avg_gear_cyl <- mtcars %>%
  group_by(gear,cyl = as.character(cyl)) %>%
  summarize(avg_mpg = mean(mpg), n = n())

bind_rows(avg_gear, avg_gear_cyl) %>%
  arrange(gear, cyl)

# A tibble: 11 × 4
    gear avg_mpg     n cyl  
   <dbl>   <dbl> <int> <chr>
 1     3    21.5     1 4    
 2     3    19.8     2 6    
 3     3    15.0    12 8    
 4     3    16.1    15 total
 5     4    26.9     8 4    
 6     4    19.8     4 6    
 7     4    24.5    12 total
 8     5    28.2     2 4    
 9     5    19.7     1 6    
10     5    15.4     2 8    
11     5    21.4     5 total
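
As a side check on the concern raised in the question, the "total" rows above are computed from every row in each gear group, not from the per-cylinder rows. The sketch below (the names by_gear_cyl, recombined and direct are illustrative only) suggests the two views agree once the subgroup means are weighted by their sizes, which is why a plain average of subgroup means would not reproduce the site-level mean.

```r
library(dplyr)

# Per-gear, per-cylinder means and sizes
by_gear_cyl <- mtcars %>%
  group_by(gear, cyl) %>%
  summarise(avg_mpg = mean(mpg), n = n(), .groups = "drop")

# Recombine the subgroup means, weighting each by its subgroup size
recombined <- by_gear_cyl %>%
  group_by(gear) %>%
  summarise(avg_mpg_recombined = weighted.mean(avg_mpg, n), n_total = sum(n))

# Mean computed directly from all rows in each gear group
direct <- mtcars %>%
  group_by(gear) %>%
  summarise(avg_mpg = mean(mpg), n = n())

# Compare the two (equal up to floating-point tolerance)
all.equal(recombined$avg_mpg_recombined, direct$avg_mpg)
```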

Or, if you want the categories as columns:

bind_rows(avg_gear, avg_gear_cyl) %>%
  arrange(gear, desc(cyl)) %>%
  tidyr::pivot_wider(names_from = cyl, values_from = c(avg_mpg, n), names_vary = "slowest")

# A tibble: 3 × 9
   gear avg_mpg_total n_total avg_mpg_8   n_8 avg_mpg_6   n_6 avg_mpg_4   n_4
  <dbl>         <dbl>   <int>     <dbl> <int>     <dbl> <int>     <dbl> <int>
1     3          16.1      15      15.0    12      19.8     2      21.5     1
2     4          24.5      12      NA      NA      19.8     4      26.9     8
3     5          21.4       5      15.4     2      19.7     1      28.2     2
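
The same pattern can be sketched for data shaped like the question's dat. This is a minimal sketch rather than a definitive solution: the toy values below are invented for illustration, and the object names site_totals, site_groups and wide are hypothetical. One assumption worth stating: the label column added to the totals needs the same name as the column passed to names_from (here f_grp). If it is given a different name, pivot_wider treats it as an extra identifier and the totals end up on their own rows.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same columns as `dat` (values invented)
dat <- data.frame(
  site = factor(c(1, 1, 1, 2, 2, 2)),
  f_grp = factor(c("1c", "1d", "1d", "1c", "2a", "2a")),
  count = c(3, 5, 10, 2, 4, 6),
  count_l_site = c(1.5, 2.5, 5, 1, 2, 3)
)

# Site-level totals over all rows; the label column is named f_grp
# so that it lines up with the per-group rows below
site_totals <- dat %>%
  group_by(site) %>%
  summarise(total_count = sum(count), Mean_per_litre = mean(count_l_site)) %>%
  mutate(f_grp = "total")

# Per-site, per-group summaries
site_groups <- dat %>%
  group_by(site, f_grp = as.character(f_grp)) %>%
  summarise(total_count = sum(count),
            Mean_per_litre = mean(count_l_site),
            .groups = "drop")

# Stack both, then spread the categories into columns: one row per site
wide <- bind_rows(site_totals, site_groups) %>%
  arrange(site, desc(f_grp)) %>%
  pivot_wider(names_from = f_grp,
              values_from = c(total_count, Mean_per_litre),
              names_vary = "slowest")

wide
```

Groups that do not occur at a site come out as NA in the wide table, as with the 8-cylinder columns for gear 4 above.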

Solution 1 (accepted), 2022-12-07 18:22:51
