
How to rearrange output from dplyr

When I run the code below

   ddply(milkers, .(dim_cat, lact_cat), function(x) mean(x$milkyield))

I get the following output:

[Image: preliminary output with 2 rows per time period]

The mean milk yield for each class (1 vs 2) is calculated correctly. I would like to end up with a table more like the one below.

[Image: desired table format with one row per dim time period]

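What I am actually trying to do is get the number of animals in each time period and calculate their mean milk yield. The problem is that my code returns the total number of animals across all time periods and the mean milk yield across all time periods.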

The code I used to generate this is below.

heiferdat <- subset(milkers, lact_cat== 1)
cowdat <- subset(milkers, lact_cat== 2)


ddply(milkers, .(dim_cat), function(x) c(Heifers = sum(milkers$lact_cat==1), H_Milk= mean(heiferdat$milkyield), Cows = sum(milkers$lact_cat==2), C_Milk= mean(cowdat$milkyield)))

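I had expected that in this code the .(dim_cat) variable would be applied to the function, so that the sum and mean functions would only include the animals in the correct time period.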
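A likely reason the grouping appears to have no effect: the anonymous function receives each per-period subset as `x`, but its body refers to the global objects `milkers`, `heiferdat` and `cowdat`, so every group returns the same overall totals. Below is a minimal sketch of the same `ddply` call (from the plyr package) rewritten to use `x`. The data frame `milkers_demo` and its values are made up for illustration and are not the real data.

```r
library(plyr)

# Hypothetical stand-in for `milkers`; the values are invented for illustration
milkers_demo <- data.frame(
  milkyield = c(10, 20, 30, 40, 50, 60),
  dim_cat   = factor(c("<31", "<31", "<31", "31-90", "31-90", "31-90"),
                     levels = c("<31", "31-90")),
  lact_cat  = factor(c(1, 2, 2, 1, 1, 2))
)

# `x` is the subset of rows for one dim_cat level, so every count and mean
# must be computed from `x`, not from the full data frame
per_period <- ddply(milkers_demo, .(dim_cat), function(x) {
  c(Heifers = sum(x$lact_cat == 1),
    H_Milk  = mean(x$milkyield[x$lact_cat == 1]),
    Cows    = sum(x$lact_cat == 2),
    C_Milk  = mean(x$milkyield[x$lact_cat == 2]))
})

print(per_period)
```

If a time period contains no animals of one class, the corresponding mean comes back as `NaN`, which may need handling before the table is presented.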

I am looking for advice on how to get output with one row per time period, showing the number of animals in each lact_cat class and the mean milk yield for each lact_cat.

Thanks

Below is a subset of the data I am working with.

dput(milkers[180:200, c(11, 25, 26)]) 
dput(heiferdat[1:20, c(11, 25, 26)])
dput(cowdat[1:20, c(11, 25, 26)])

> dput(milkers[180:200, c(11, 25, 26)]) 
structure(list(milkyield = structure(c(8.42, 38.32, 14.27, 7.68, 
16.59, 17.19, 24.45, 33.47, 36.16, 25.88, 11.61, 18.96, 11.27, 
33.6, 21.57, 20.87, 9.62, 7.93, 21.02, 17.75, 22.01), label = "Milk (L)", class = c("labelled", 
"numeric")), dim_cat = structure(c(5L, 3L, 7L, 7L, 2L, 7L, 2L, 
2L, 2L, 3L, 6L, 6L, 2L, 3L, 6L, 6L, 6L, 6L, 6L, 7L, 6L), .Label = c("<31", 
"31-90", "91-150", "151-210", "211-270", "271-330", ">330"), class = c("labelled", 
"factor"), label = "Days in Milk"), lact_cat = structure(c(2L, 
2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = c("1", "2"), class = "factor")), row.names = 180:200, class = "data.frame")

> dput(heiferdat[1:20, c(11, 25, 26)]) 
structure(list(milkyield = structure(c(14.27, 17.19, 11.61, 18.96, 
11.27, 21.57, 20.87, 9.62, 7.93, 21.02, 17.75, 22.01, 25.15, 
11.75, 12.6, 15.62, 19.29, 8.85, 15.52, 11.62), label = "Milk (L)", class = c("labelled", 
"numeric")), dim_cat = structure(c(7L, 7L, 6L, 6L, 2L, 6L, 6L, 
6L, 6L, 6L, 7L, 6L, 6L, 6L, 6L, 7L, 6L, 6L, 6L, 6L), .Label = c("<31", 
"31-90", "91-150", "151-210", "211-270", "271-330", ">330"), class = c("labelled", 
"factor"), label = "Days in Milk"), lact_cat = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = c("1", "2"), class = "factor")), row.names = c(182L, 
185L, 190L, 191L, 192L, 194L, 195L, 196L, 197L, 198L, 199L, 200L, 
201L, 202L, 203L, 204L, 205L, 206L, 207L, 208L), class = "data.frame")

> dput(cowdat[1:20, c(11, 25, 26)]) 
structure(list(milkyield = structure(c(15.73, 14.56, 16.94, 16.25, 
39.09, 9.79, 8.41, 3.05, 38.89, 11.7, 29.89, 19.73, 18.2, 20.63, 
20.32, 52.99, 10.11, 8.08, 10.84, 33.75), label = "Milk (L)", class = c("labelled", 
"numeric")), dim_cat = structure(c(3L, 6L, 6L, 2L, 3L, 7L, 6L, 
7L, 3L, 7L, 3L, 6L, 3L, 6L, 2L, 2L, 7L, 6L, 7L, 7L), .Label = c("<31", 
"31-90", "91-150", "151-210", "211-270", "271-330", ">330"), class = c("labelled", 
"factor"), label = "Days in Milk"), lact_cat = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L), .Label = c("1", "2"), class = "factor")), row.names = c(NA, 
20L), class = "data.frame")

Following @DanChaltiel's suggestion to use dplyr, here is a dplyr approach:

library(dplyr)

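# Count animals and average milk yield for each (time period, lactation class) pair
all_summary = milkers %>%
  group_by(dim_cat, lact_cat) %>%
  summarise(avg = mean(milkyield),  # mean milk yield in the group
            num = n())              # number of animals in the group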

At this point you have calculated all the summary information. The code below is just formatting/presentation.

# Split the summary by lactation class and rename the columns for display
heifer_summary = all_summary %>%
  filter(lact_cat == 1) %>%
  select(dim_cat, Heifers = num, H_Milk = avg)
cow_summary = all_summary %>%
  filter(lact_cat == 2) %>%
  select(dim_cat, Cows = num, C_Milk = avg)

# Join the two halves side by side: one row per time period
arranged_summary = full_join(heifer_summary, cow_summary, by = "dim_cat") %>%
  select(dim_cat, Heifers, H_Milk, Cows, C_Milk) %>%
  arrange(dim_cat)
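As an alternative to the filter-and-join step, the long summary can probably be reshaped in one pass with `pivot_wider()` from tidyr (assuming tidyr 1.0.0 or later is installed). This is only a sketch, not part of the original answer; `milkers_demo` is a made-up data frame used so the example runs on its own.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for `milkers`; the values are invented for illustration
milkers_demo <- data.frame(
  milkyield = c(10, 20, 30, 40, 50, 60),
  dim_cat   = factor(c("<31", "<31", "<31", "31-90", "31-90", "31-90"),
                     levels = c("<31", "31-90")),
  lact_cat  = factor(c(1, 2, 2, 1, 1, 2))
)

wide_summary <- milkers_demo %>%
  group_by(dim_cat, lact_cat) %>%
  summarise(num = n(), avg = mean(milkyield)) %>%
  ungroup() %>%
  # One column per (statistic, lactation class): num_1, num_2, avg_1, avg_2
  pivot_wider(names_from = lact_cat, values_from = c(num, avg)) %>%
  # Rename and order the columns to match the desired table
  select(dim_cat, Heifers = num_1, H_Milk = avg_1, Cows = num_2, C_Milk = avg_2) %>%
  arrange(dim_cat)

print(wide_summary)
```

With this approach, a time period that has animals of only one class still gets a row, with `NA` in the columns for the missing class.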

