Aggregate data using each() with reshape2::dcast

I usually use the reshape package to aggregate data, typically alongside plyr because of its extremely handy each function. Recently I was advised to switch to reshape2 and try it out, and now I can't seem to use each anymore.

reshape

> library(reshape)  # provides melt() and cast()
> library(plyr)     # provides each()
> m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")
> cast(m, am + vs ~ variable, each(min, max, mean, sd))
  am vs hp_min hp_max   hp_mean    hp_sd
1  0  0    150    245 194.16667 33.35984
2  0  1     62    123 102.14286 20.93186
3  1  0     91    335 180.83333 98.81582
4  1  1     52    113  80.57143 24.14441

reshape2

> library(reshape2)  # provides melt() and dcast()
> library(plyr)      # provides each()
> m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")
> dcast(m, am + vs ~ variable, each(min, max, mean, sd))
Error in structure(ordered, dim = ns) : 
  dims [product 4] do not match the length of object [16]
In addition: Warning messages:
1: In fs[[i]](x, ...) : no non-missing arguments to min; returning Inf
2: In fs[[i]](x, ...) : no non-missing arguments to max; returning -Inf

I wasn't in the mood to dig into this, since my previous code works like a charm with reshape, but I'd really like to know:

  1. Is it possible to use each with dcast?
  2. Is it advisable to use reshape2 at all? Is reshape deprecated?

The answer to your first question appears to be no. Quoting from ?reshape2::dcast:

If the combination of variables you supply does not uniquely identify one row in the original data set, you will need to supply an aggregating function, fun.aggregate. This function should take a vector of numbers and return a single summary statistic.
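To make the mismatch concrete, here is a minimal sketch, assuming plyr and reshape2 are both installed. It shows that the function built by each() returns one value per statistic rather than a single number, while a plain single-statistic function such as mean is accepted by dcast. The names stats and res are illustrative only.

```r
library(plyr)      # each()
library(reshape2)  # melt(), dcast()

m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")

# each() builds a function that returns one value per statistic,
# so the result is a named vector of length 4, not a single number.
stats <- each(min, max, mean, sd)(m$value)
length(stats)

# A function that returns a single value satisfies the documented contract.
# value.var is given explicitly so dcast does not have to guess the column.
res <- dcast(m, am + vs ~ variable, fun.aggregate = mean, value.var = "value")
res
```

The documentation's phrase "a single summary statistic" is what rules out each() here: dcast has one cell per group to fill, and each() hands back four values for it.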

A look at Hadley's GitHub page for reshape2 suggests that he knows this functionality was removed but thinks it is better done in plyr, presumably with something like:

ddply(m,.(am,vs),summarise,min = min(value),
                           max = max(value),
                           mean = mean(value),
                           sd = sd(value))

or, if you really want to keep using each:

ddply(m,.(am,vs),function(x){each(min,max,mean,sd)(x$value)})
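If the column names from the old reshape output (hp_min, hp_max, and so on) matter to downstream code, the ddply result can be renamed afterwards. This is only a sketch under the assumption that a single measure variable (hp) was melted; the "hp" prefix is hard-coded for illustration, and res and is_stat are illustrative names.

```r
library(plyr)      # ddply(), each()
library(reshape2)  # melt()

m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")

# One row per am/vs group, one column per statistic.
res <- ddply(m, .(am, vs), function(x) each(min, max, mean, sd)(x$value))

# Prefix the statistic columns to mimic the names cast() used to produce.
is_stat <- !(names(res) %in% c("am", "vs"))
names(res)[is_stat] <- paste("hp", names(res)[is_stat], sep = "_")
res
```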
