Aggregate by multiple columns and reshape from long to wide

Question

There are a few questions on SO on this topic, but none matches my use case exactly. I have a dataset whose columns are laid out as follows:

     Id        Description          Value
     10        Cat                  19
     10        Cat                  20
     10        Cat                  5
     10        Cat                  13
     11        Cat                  17
     11        Cat                  23
     11        Cat                  7
     11        Cat                  14  
     10        Dog                  19
     10        Dog                  20
     10        Dog                  5
     10        Dog                  13
     11        Dog                  17
     11        Dog                  23
     11        Dog                  7
     11        Dog                  14

What I want is the mean of the Value column for each combination of Id and Description. The final dataset should look like this:

     Id       Cat         Dog
     10       14.25       14.25
     11       15.25       15.25
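A minimal base R sketch for reproducing the example. It assumes the data is stored in a data frame named `df` (the name used in the question's code), rebuilds it from the listing of the data, and computes the grouped means while still in long format. Each Id has the same four values for Cat and Dog, so both average 14.25 for Id 10 and 15.25 for Id 11.

```r
# Rebuild the question's sample data in long format (one row per observation).
# The name `df` is assumed from the question's code.
df <- data.frame(
  Id          = rep(rep(c(10L, 11L), each = 4), times = 2),
  Description = rep(c("Cat", "Dog"), each = 8),
  Value       = rep(c(19, 20, 5, 13, 17, 23, 7, 14), times = 2)
)

# Mean of Value for every Id/Description pair, still in long format
means <- aggregate(Value ~ Id + Description, data = df, FUN = mean)
print(means)
```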

I can do it in a very crude, inefficient way like this:

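library(dplyr)
library(stringr)

# Handles one Description at a time, so it must be repeated for "Dog" etc.
tempdf1 <- df %>%
  filter(str_detect(Description, "Cat")) %>%
  group_by(Id, Description) %>%
  summarize(Mean_Value = mean(Value, na.rm = TRUE))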

This is not very convenient. Any suggestions on how to get the desired result more efficiently are much appreciated.

Answer 1

I would do this with tapply:

with(dat, tapply(Value, list(Id, Description), mean))
     Cat   Dog
10 14.25 14.25
11 15.25 15.25

Note that this returns a matrix, so don't try to access columns with `$`.
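If `$` access or an explicit Id column is wanted, one option is to convert the matrix into a data frame. A base R sketch, assuming the data is in `dat` as in this answer (rebuilt here from the question's listing so the block runs on its own); the names `m` and `res` are illustrative.

```r
dat <- data.frame(
  Id          = rep(rep(c(10L, 11L), each = 4), times = 2),
  Description = rep(c("Cat", "Dog"), each = 8),
  Value       = rep(c(19, 20, 5, 13, 17, 23, 7, 14), times = 2)
)

# tapply returns a 2 x 2 matrix: rows are Id values, columns are Description values
m <- with(dat, tapply(Value, list(Id, Description), mean))

# A matrix is indexed by name with [, ] rather than with `$`
m[, "Cat"]

# Convert to a data frame with an explicit Id column taken from the row names
res <- data.frame(Id = as.integer(rownames(m)), m, row.names = NULL)
res$Dog
```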

Answer 2

You can aggregate (compute the mean) with data.table and then use dcast() to get the desired wide layout:

library(data.table)
foo <- setDT(d)[, mean(Value), .(Id, Description)]
#    Id Description    V1
# 1: 10         Cat 14.25
# 2: 11         Cat 15.25
# 3: 10         Dog 14.25
# 4: 11         Dog 15.25
dcast(foo, Id ~ Description, value.var = "V1")
#    Id   Cat   Dog
# 1: 10 14.25 14.25
# 2: 11 15.25 15.25
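Because data.table's `dcast()` also accepts a `fun.aggregate` argument, the two steps can likely be collapsed into a single call. A sketch, assuming the data is in `d` as in this answer (rebuilt here as a data.table so the block is self-contained):

```r
library(data.table)

d <- data.table(
  Id          = rep(rep(c(10L, 11L), each = 4), times = 2),
  Description = rep(c("Cat", "Dog"), each = 8),
  Value       = rep(c(19, 20, 5, 13, 17, 23, 7, 14), times = 2)
)

# fun.aggregate is applied to all Value entries that fall into each
# Id x Description cell, so no separate aggregation step is needed
wide <- dcast(d, Id ~ Description, value.var = "Value", fun.aggregate = mean)
wide
```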

Answer 3

Use `dcast()` (or even `acast()`) from the reshape2 package:

library(reshape2)
dcast(dat, Id ~ Description, mean, value.var = "Value")
   Id   Cat   Dog
 1 10 14.25 14.25
 2 11 15.25 15.25

In base R it is a bit more verbose:

reshape(aggregate(. ~ Id + Description, dat, mean),
        direction = "wide", v.names = "Value",
        idvar = "Id", timevar = "Description")
  Id Value.Cat Value.Dog
1 10     14.25     14.25
2 11     15.25     15.25

Answer 4

You can `summarise` with dplyr and then reshape from long to wide with `tidyr::spread`:

library(dplyr)
library(tidyr)

df %>%
    group_by(Id, Description) %>%
    summarise(Mean = mean(Value)) %>% 
    spread(Description, Mean)

     Id   Cat   Dog
* <int> <dbl> <dbl>
1    10 14.25 14.25
2    11 15.25 15.25
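Since tidyr 1.0.0, `spread()` is superseded by `pivot_wider()`. An equivalent sketch, assuming dplyr 1.0.0 or later (for the `.groups` argument) and the data in `df` as in this answer:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Id          = rep(rep(c(10L, 11L), each = 4), times = 2),
  Description = rep(c("Cat", "Dog"), each = 8),
  Value       = rep(c(19, 20, 5, 13, 17, 23, 7, 14), times = 2)
)

wide <- df %>%
  group_by(Id, Description) %>%
  summarise(Mean = mean(Value), .groups = "drop") %>%   # one row per Id/Description
  pivot_wider(names_from = Description, values_from = Mean)  # one column per Description
wide
```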


Question description

4 solutions

Solution 1
2 2017-12-22 21:09:08

Solution 2
1 2017-12-22 20:58:31

Solution 3
1 accepted 2017-12-22 21:02:50

Solution 4
1 2017-12-22 21:05:23
