Aggregate data using each() with reshape2::dcast
I usually use the reshape package to aggregate data, typically together with plyr because of its very handy each function. Recently I received a suggestion to switch to reshape2 and try it out, and now I can't seem to use each anymore. With reshape, this works:
> m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")
> cast(m, am + vs ~ variable, each(min, max, mean, sd))
  am vs hp_min hp_max   hp_mean    hp_sd
1  0  0    150    245 194.16667 33.35984
2  0  1     62    123 102.14286 20.93186
3  1  0     91    335 180.83333 98.81582
4  1  1     52    113  80.57143 24.14441
With reshape2 (and plyr loaded for each), the same call fails:
> require(plyr)
> m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")
> dcast(m, am + vs ~ variable, each(min, max, mean, sd))
Error in structure(ordered, dim = ns) :
dims [product 4] do not match the length of object [16]
In addition: Warning messages:
1: In fs[[i]](x, ...) : no non-missing arguments to min; returning Inf
2: In fs[[i]](x, ...) : no non-missing arguments to max; returning -Inf
I wasn't in the mood to dig into this, as my previous code works like a charm with reshape, but I'd really like to know:
1. Is it possible to use each with dcast?
2. Is it advisable to use reshape2 at all? Is reshape deprecated?

The answer to your first question appears to be no. Quoting from ?reshape2:::dcast:
If the combination of variables you supply does not uniquely identify one row in the original data set, you will need to supply an aggregating function, fun.aggregate. This function should take a vector of numbers and return a single summary statistic.
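To make the quoted requirement concrete, here is a minimal sketch, assuming reshape2 is installed. It shows that dcast works with the same data when fun.aggregate returns a single value per group. The name res is only illustrative.

```r
library(reshape2)

# Same long-format data as in the question
m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")

# One summary statistic per (am, vs) group: this is what dcast expects
res <- dcast(m, am + vs ~ variable, fun.aggregate = mean)
print(res)
```

each(min, max, mean, sd) returns four values per group instead of one, which is consistent with the dimension mismatch reported in the error above.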
A look at Hadley's GitHub page for reshape2 suggests that he knows this functionality was removed, but seems to think it's better done in plyr, presumably with something like:
ddply(m,.(am,vs),summarise,min = min(value),
max = max(value),
mean = mean(value),
sd = sd(value))
or, if you really want to keep using each:
ddply(m,.(am,vs),function(x){each(min,max,mean,sd)(x$value)})
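If the goal is to reproduce the column names that reshape's cast produced (hp_min, hp_max, and so on), one possible sketch is to name the summarise columns explicitly. This assumes plyr and reshape2 are installed. The hp_ prefix is typed by hand here rather than derived from the variable column, and res is an illustrative name.

```r
library(plyr)
library(reshape2)

# Same long-format data as in the question
m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")

# One row per (am, vs) group, one named column per statistic
res <- ddply(m, .(am, vs), summarise,
             hp_min  = min(value),
             hp_max  = max(value),
             hp_mean = mean(value),
             hp_sd   = sd(value))
print(res)
```

With several measured variables this would need one set of columns per variable, so it is a workaround for the single-variable case rather than a general replacement for each inside cast.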