How to rearrange output from dplyr
When I write the following code
ddply(milkers, .(dim_cat, lact_cat), function(x) mean(x$milkyield))
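I get the following output
The mean milk yield for each class (1 vs. 2) is calculated correctly. I would like to end up with a table more like the one below.
What I am actually trying to do is get the number of animals in each time period and calculate their mean milk yield. The problem is that the code computes the total number of animals across all time periods and the mean milk yield across all time periods.
The code I used to generate this data is below.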
heiferdat <- subset(milkers, lact_cat== 1)
cowdat <- subset(milkers, lact_cat== 2)
ddply(milkers, .(dim_cat), function(x) c(Heifers = sum(milkers$lact_cat==1), H_Milk= mean(heiferdat$milkyield), Cows = sum(milkers$lact_cat==2), C_Milk= mean(cowdat$milkyield)))
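I had expected that, in this code, the .(dim_cat) variable would be applied to the function, so that the sum and mean calls would include only the animals in the correct time period.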
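A likely reason the grouping has no effect: ddply does pass each dim_cat subset to the anonymous function as `x`, but the function body never uses `x`. It reads from the full `milkers`, `heiferdat`, and `cowdat` objects instead, so every group returns the same totals. The following is a minimal sketch of a version that computes everything from `x`. The data frame `milkers_toy` and its values are made up for illustration and are not the real data.

```r
library(plyr)

# Hypothetical stand-in for `milkers`; the values are invented for illustration
milkers_toy <- data.frame(
  dim_cat   = factor(c("31-90", "31-90", "31-90", ">330", ">330", ">330"),
                     levels = c("31-90", ">330")),
  lact_cat  = factor(c(1, 2, 2, 1, 1, 2)),
  milkyield = c(10, 30, 34, 12, 16, 8)
)

# ddply calls the function once per dim_cat level, passing that subset as `x`.
# Every summary is computed from `x`, never from the full data frame.
fixed <- ddply(milkers_toy, .(dim_cat), function(x) c(
  Heifers = sum(x$lact_cat == 1),
  H_Milk  = mean(x$milkyield[x$lact_cat == 1]),
  Cows    = sum(x$lact_cat == 2),
  C_Milk  = mean(x$milkyield[x$lact_cat == 2])
))
print(fixed)
```

With this change the pre-split `heiferdat` and `cowdat` objects are no longer needed, because the split by lact_cat happens inside each time-period subset.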
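I am looking for advice on how to get output with one row per time period, containing the number of animals in each lact_cat class and the mean milk yield for each lact_cat.
Thanks
Below is a subset of the data I am working with.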
dput(milkers[180:200, c(11, 25, 26)])
dput(heiferdat[1:20, c(11, 25, 26)])
dput(cowdat[1:20, c(11, 25, 26)])
> dput(milkers[180:200, c(11, 25, 26)])
structure(list(milkyield = structure(c(8.42, 38.32, 14.27, 7.68,
16.59, 17.19, 24.45, 33.47, 36.16, 25.88, 11.61, 18.96, 11.27,
33.6, 21.57, 20.87, 9.62, 7.93, 21.02, 17.75, 22.01), label = "Milk (L)", class = c("labelled",
"numeric")), dim_cat = structure(c(5L, 3L, 7L, 7L, 2L, 7L, 2L,
2L, 2L, 3L, 6L, 6L, 2L, 3L, 6L, 6L, 6L, 6L, 6L, 7L, 6L), .Label = c("<31",
"31-90", "91-150", "151-210", "211-270", "271-330", ">330"), class = c("labelled",
"factor"), label = "Days in Milk"), lact_cat = structure(c(2L,
2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = c("1", "2"), class = "factor")), row.names = 180:200, class = "data.frame")
> dput(heiferdat[1:20, c(11, 25, 26)])
structure(list(milkyield = structure(c(14.27, 17.19, 11.61, 18.96,
11.27, 21.57, 20.87, 9.62, 7.93, 21.02, 17.75, 22.01, 25.15,
11.75, 12.6, 15.62, 19.29, 8.85, 15.52, 11.62), label = "Milk (L)", class = c("labelled",
"numeric")), dim_cat = structure(c(7L, 7L, 6L, 6L, 2L, 6L, 6L,
6L, 6L, 6L, 7L, 6L, 6L, 6L, 6L, 7L, 6L, 6L, 6L, 6L), .Label = c("<31",
"31-90", "91-150", "151-210", "211-270", "271-330", ">330"), class = c("labelled",
"factor"), label = "Days in Milk"), lact_cat = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("1", "2"), class = "factor")), row.names = c(182L,
185L, 190L, 191L, 192L, 194L, 195L, 196L, 197L, 198L, 199L, 200L,
201L, 202L, 203L, 204L, 205L, 206L, 207L, 208L), class = "data.frame")
> dput(cowdat[1:20, c(11, 25, 26)])
structure(list(milkyield = structure(c(15.73, 14.56, 16.94, 16.25,
39.09, 9.79, 8.41, 3.05, 38.89, 11.7, 29.89, 19.73, 18.2, 20.63,
20.32, 52.99, 10.11, 8.08, 10.84, 33.75), label = "Milk (L)", class = c("labelled",
"numeric")), dim_cat = structure(c(3L, 6L, 6L, 2L, 3L, 7L, 6L,
7L, 3L, 7L, 3L, 6L, 3L, 6L, 2L, 2L, 7L, 6L, 7L, 7L), .Label = c("<31",
"31-90", "91-150", "151-210", "211-270", "271-330", ">330"), class = c("labelled",
"factor"), label = "Days in Milk"), lact_cat = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("1", "2"), class = "factor")), row.names = c(NA,
20L), class = "data.frame")
Following @DanChaltiel's suggestion to use dplyr, here is a dplyr approach:
library(dplyr)
all_summary = milkers %>%
  group_by(dim_cat, lact_cat) %>%
  summarise(avg = mean(milkyield),
            num = n())
At this point, all of the summary information has been computed. The code below is just formatting/presentation.
heifer_summary = all_summary %>%
  filter(lact_cat == 1) %>%
  select(dim_cat, Heifers = num, H_Milk = avg)
cow_summary = all_summary %>%
  filter(lact_cat == 2) %>%
  select(dim_cat, Cows = num, C_Milk = avg)
arranged_summary = full_join(heifer_summary, cow_summary, by = "dim_cat") %>%
  select(dim_cat, Heifers, H_Milk, Cows, C_Milk) %>%
  arrange(dim_cat)
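As a possible alternative to the filter-and-join formatting step, the long summary can be reshaped in one pass with `tidyr::pivot_wider`. This is only a sketch: it swaps in tidyr (version 1.0.0 or later), which the answer above does not use, and it assumes dplyr 1.0.0 or later for the `.groups` argument. The data frame `milkers_toy` is a made-up stand-in for `milkers`.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for `milkers`; the values are invented for illustration
milkers_toy <- data.frame(
  dim_cat   = factor(c("31-90", "31-90", "31-90", ">330", ">330", ">330"),
                     levels = c("31-90", ">330")),
  lact_cat  = factor(c(1, 2, 2, 1, 1, 2)),
  milkyield = c(10, 30, 34, 12, 16, 8)
)

wide_summary <- milkers_toy %>%
  group_by(dim_cat, lact_cat) %>%
  summarise(num = n(), avg = mean(milkyield), .groups = "drop") %>%
  # One row per dim_cat; new columns are named num_1, num_2, avg_1, avg_2
  pivot_wider(names_from = lact_cat, values_from = c(num, avg)) %>%
  rename(Heifers = num_1, H_Milk = avg_1, Cows = num_2, C_Milk = avg_2) %>%
  select(dim_cat, Heifers, H_Milk, Cows, C_Milk) %>%
  arrange(dim_cat)

print(wide_summary)
```

A time period that has animals in only one lact_cat class gets `NA` in the other class's columns, which matches what `full_join` produces in the same situation.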