
Aggregate by multiple columns and reshape from long to wide

There are some questions similar to this topic on SO, but none exactly match my use case. I have a dataset whose columns are laid out as shown below.

     Id        Description          Value
     10        Cat                  19
     10        Cat                  20
     10        Cat                  5
     10        Cat                  13
     11        Cat                  17
     11        Cat                  23
     11        Cat                  7
     11        Cat                  14  
     10        Dog                  19
     10        Dog                  20
     10        Dog                  5
     10        Dog                  13
     11        Dog                  17
     11        Dog                  23
     11        Dog                  7
     11        Dog                  14    

What I am trying to do is compute the mean of the Value column by Id and Description. The final dataset would look like this.

     Id       Cat         Dog 
     10       14.25       14.25
     11       15.25       15.25

I can do this only in a rough, not very efficient way, like this:

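library(dplyr)
library(stringr)

# Mean of Value for the "Cat" rows only; this has to be repeated per Description
tempdf1 <- df %>%
  filter(str_detect(Description, "Cat")) %>%
  group_by(Id, Description) %>%
  summarize(Mean_Value = mean(Value, na.rm = TRUE))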

This is not very convenient. Any advice on how to accomplish the expected result more efficiently is much appreciated.

I would do this with tapply:

with( dat, tapply(Value, list(Id,Description), mean))
     Cat   Dog
10 14.25 14.25
11 15.25 15.25

Note that this returns a matrix object, so don't try to access its columns with "$".
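Since the result is a matrix, a short sketch of how it might be indexed, or converted when "$" access is wanted, could look like the following. The data frame is rebuilt from the table in the question, and the names `m` and `wide` are only illustrative.

```r
# Sample data copied from the question; `dat` follows the naming used above.
dat <- data.frame(
  Id          = rep(rep(c(10, 11), each = 4), times = 2),
  Description = rep(c("Cat", "Dog"), each = 8),
  Value       = rep(c(19, 20, 5, 13, 17, 23, 7, 14), times = 2)
)

m <- with(dat, tapply(Value, list(Id, Description), mean))

# A matrix is indexed by row and column name, not with `$`
m["10", "Cat"]
m[, "Dog"]

# Convert to a data frame if `$` access is wanted; Id is taken from the row names
wide <- data.frame(Id = as.numeric(rownames(m)), m, row.names = NULL)
wide$Cat
```

The conversion step assumes the Id values are numeric, as they are in the question; for character ids, `rownames(m)` could be used directly.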

You can aggregate (calculate the average) per group using data.table and get the wanted table format using dcast():

library(data.table)
foo <- setDT(d)[, mean(Value), .(Id, Description)]
#    Id Description    V1
# 1: 10         Cat 14.25
# 2: 11         Cat 15.25
# 3: 10         Dog 14.25
# 4: 11         Dog 15.25
dcast(foo, Id ~ Description, value.var = "V1")
#    Id   Cat   Dog
# 1: 10 14.25 14.25
# 2: 11 15.25 15.25
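As a possible shortcut, the two steps can probably be combined, because data.table's dcast() accepts an aggregation function through its `fun.aggregate` argument. The sketch below rebuilds the question's data as a data.table; the name `wide` is only illustrative.

```r
library(data.table)

# Sample data copied from the question
d <- data.table(
  Id          = rep(rep(c(10, 11), each = 4), times = 2),
  Description = rep(c("Cat", "Dog"), each = 8),
  Value       = rep(c(19, 20, 5, 13, 17, 23, 7, 14), times = 2)
)

# Aggregate and reshape in one call: one row per Id, one column per Description
wide <- dcast(d, Id ~ Description, fun.aggregate = mean, value.var = "Value")
wide
```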

Use dcast, or even acast, from the reshape2 package:

library(reshape2)
dcast(dat, Id ~ Description, mean, value.var = "Value")
   Id   Cat   Dog
 1 10 14.25 14.25
 2 11 15.25 15.25

Base R might be a bit longer:

reshape(aggregate(. ~ Id + Description, dat, mean), direction = "wide",
        v.names = "Value", idvar = "Id", timevar = "Description")
  Id Value.Cat Value.Dog
1 10     14.25     14.25
2 11     15.25     15.25

You can do the summarise using dplyr and the transformation from long to wide using tidyr::spread:

library(dplyr)
library(tidyr)

df %>%
    group_by(Id, Description) %>%
    summarise(Mean = mean(Value)) %>% 
    spread(Description, Mean)

     Id   Cat   Dog
* <int> <dbl> <dbl>
1    10 14.25 14.25
2    11 15.25 15.25
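In newer tidyr releases spread() is superseded by pivot_wider(), so the same pipeline might be written as below. This is a sketch assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the `.groups` argument); the data frame is rebuilt from the question and `wide` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df <- data.frame(
  Id          = rep(rep(c(10, 11), each = 4), times = 2),
  Description = rep(c("Cat", "Dog"), each = 8),
  Value       = rep(c(19, 20, 5, 13, 17, 23, 7, 14), times = 2)
)

wide <- df %>%
  group_by(Id, Description) %>%
  # .groups = "drop" returns an ungrouped tibble after summarising
  summarise(Mean = mean(Value), .groups = "drop") %>%
  pivot_wider(names_from = Description, values_from = Mean)
wide
```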
