Aggregate by multiple columns and reshape from long to wide
There are some questions similar to this topic on SO, but none that exactly match my use case. I have a dataset where the columns are laid out as shown below:
Id Description Value
10 Cat 19
10 Cat 20
10 Cat 5
10 Cat 13
11 Cat 17
11 Cat 23
11 Cat 7
11 Cat 14
10 Dog 19
10 Dog 20
10 Dog 5
10 Dog 13
11 Dog 17
11 Dog 23
11 Dog 7
11 Dog 14
What I am trying to do is compute the mean of the Value column for each combination of Id and Description. The final dataset would look like this:
Id Cat Dog
10 14.25 14.25
11 15.25 15.25
I can do this in a rough, not very efficient way, one Description at a time, like this:
library(dplyr)
library(stringr)
tempdf1 <- df %>%
  filter(str_detect(Description, "Cat")) %>%
  group_by(Id, Description) %>%
  summarize(Mean_Value = mean(Value, na.rm = TRUE))
This is not very convenient. Any advice on how to accomplish the expected result more efficiently is much appreciated.
I would do this with tapply:
with( dat, tapply(Value, list(Id,Description), mean))
Cat Dog
10 14.25 14.25
11 15.25 15.25
Note that this returns a matrix object, so don't try accessing columns with "$".
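If "$" access is wanted, the matrix can be indexed by its dimnames or converted to a data frame. A minimal sketch, assuming the question's data lives in a data frame called `dat` (rebuilt here so the example is self-contained; the name `wide` is only illustrative):

```r
# Rebuild the question's data (the name `dat` is an assumption)
dat <- data.frame(
  Id = rep(rep(c(10, 11), each = 4), times = 2),
  Description = rep(c("Cat", "Dog"), each = 8),
  Value = rep(c(19, 20, 5, 13, 17, 23, 7, 14), times = 2)
)

m <- with(dat, tapply(Value, list(Id, Description), mean))

# Index the matrix by row and column name rather than with `$`
m["10", "Cat"]

# Convert to a data frame, carrying the row names over as an Id column
wide <- data.frame(Id = as.numeric(rownames(m)), m, row.names = NULL)
wide$Cat
```

After the conversion, `wide` is an ordinary data frame with columns Id, Cat and Dog.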
You can aggregate (calculate the average) per group using data.table and get the wanted table format using dcast():
library(data.table)
foo <- setDT(d)[, mean(Value), .(Id, Description)]
# Id Description V1
# 1: 10 Cat 14.25
# 2: 11 Cat 15.25
# 3: 10 Dog 14.25
# 4: 11 Dog 15.25
dcast(foo, Id ~ Description, value.var = "V1")
# Id Cat Dog
# 1: 10 14.25 14.25
# 2: 11 15.25 15.25
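The two steps can probably be collapsed into one, since data.table's dcast() accepts a fun.aggregate argument. A sketch under the assumption that the data is in a data.table called `d`, rebuilt here from the question's table:

```r
library(data.table)

# Rebuild the question's data as a data.table (the name `d` is an assumption)
d <- data.table(
  Id = rep(rep(c(10, 11), each = 4), times = 2),
  Description = rep(c("Cat", "Dog"), each = 8),
  Value = rep(c(19, 20, 5, 13, 17, 23, 7, 14), times = 2)
)

# Aggregate and reshape in one call: the rows that fall into each
# Id/Description cell are collapsed with mean()
wide <- dcast(d, Id ~ Description, value.var = "Value", fun.aggregate = mean)
wide
```

This skips the intermediate `foo` table and the auto-generated `V1` column name.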
Use dcast or even acast from the reshape2 package:
library(reshape2)
dcast(dat, Id ~ Description, mean, value.var = "Value")
Id Cat Dog
1 10 14.25 14.25
2 11 15.25 15.25
Base R might be a bit longer:
reshape(aggregate(.~Id+Description,dat,mean),direction = "wide",v.names = "Value",idvar = "Id",timevar = "Description")
Id Value.Cat Value.Dog
1 10 14.25 14.25
2 11 15.25 15.25
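If the "Value." prefix on the column names is unwanted, it can be stripped afterwards. A small sketch of the same base R approach split into steps, assuming the data frame is called `dat` (rebuilt here; `agg` and `wide` are illustrative names):

```r
# Rebuild the question's data (the name `dat` is an assumption)
dat <- data.frame(
  Id = rep(rep(c(10, 11), each = 4), times = 2),
  Description = rep(c("Cat", "Dog"), each = 8),
  Value = rep(c(19, 20, 5, 13, 17, 23, 7, 14), times = 2)
)

# Step 1: mean of Value per Id/Description pair (long format)
agg <- aggregate(Value ~ Id + Description, data = dat, FUN = mean)

# Step 2: long to wide, one column per Description
wide <- reshape(agg, direction = "wide", idvar = "Id",
                timevar = "Description", v.names = "Value")

# Step 3: drop the "Value." prefix so the columns are just Cat and Dog
names(wide) <- sub("^Value\\.", "", names(wide))
wide
```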
You can do the summarise using dplyr and the transformation from long to wide using tidyr::spread:
library(dplyr)
library(tidyr)
df %>%
group_by(Id, Description) %>%
summarise(Mean = mean(Value)) %>%
spread(Description, Mean)
Id Cat Dog
* <int> <dbl> <dbl>
1 10 14.25 14.25
2 11 15.25 15.25
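In newer versions of tidyr, spread() is superseded by pivot_wider(), so the same pipeline could be written as below. This is a sketch assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the `.groups` argument), with the question's data rebuilt as `df`:

```r
library(dplyr)
library(tidyr)

# Rebuild the question's data (the name `df` follows the question)
df <- data.frame(
  Id = rep(rep(c(10, 11), each = 4), times = 2),
  Description = rep(c("Cat", "Dog"), each = 8),
  Value = rep(c(19, 20, 5, 13, 17, 23, 7, 14), times = 2)
)

wide <- df %>%
  group_by(Id, Description) %>%
  # .groups = "drop" returns an ungrouped tibble
  summarise(Mean = mean(Value, na.rm = TRUE), .groups = "drop") %>%
  # One column per Description, filled with the group means
  pivot_wider(names_from = Description, values_from = Mean)
wide
```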