I'm trying to write some parallelizable code (exploiting plyr and doMC) to calculate and subtract groupwise means from the columns of a data frame. I'm having a hard time getting the plyr syntax correct.
Here is the script with a working for-loop:
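library(doBy)  # provides summaryBy()
data = data.frame(x = rnorm(100),y = rnorm(100),ID = round(runif(100)*10))
data = data[with(data,order(ID)),]  # sort rows by ID
# one output column per non-ID column of data
dm = matrix(rep(NA,nrow(data)*(ncol(data)-1)),nrow(data),(ncol(data)-1))
for (i in 1:(ncol(data)-1)){
  m = summaryBy(data[,i]~ID,data=data,fun=mean)  # group means of column i
  d = data.frame(data[,i],ID=data$ID)
  a = merge(d,m,by="ID")                         # attach each row's group mean
  dm[,i] = a[,2]-a[,3]                           # value minus group mean
}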
But when I try to split the work by the column names of data using ddply, I get an error message. Here is my non-working code:
dmf = function(i){
m = summaryBy(data[,i]~ID,data=data,fun=mean)
d = data.frame(data[,i],ID=data$ID)
a = merge(d,m,by="ID")
dm = a[,2]-a[,3]
as.data.frame(dm)
}
dm = ddply(.data=data,.fun = dmf,.variables = colnames(data))
>Error in .subset(x, j) : invalid subscript type 'list'
Anybody have a solution for this?
Alternatively, if this is doable with matrices, I'd greatly appreciate that sort of solution from someone with better matrix intuition than me.
To take full advantage of plyr, I would combine colwise and the base function scale. Also, if needed, let ddply handle the parallelization at the highest level:
library(doMC); registerDoMC()  # .parallel = TRUE needs a registered foreach backend
dm <- ddply(data, "ID", colwise(scale, center = TRUE, scale = FALSE),
            .parallel = TRUE)
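Since the question also asks whether this can be done with matrices, here is a minimal base-R sketch of the same groupwise demeaning that needs no extra packages. It is not part of the plyr approach above; the seed and the helper names cols and group_means are assumptions added only for illustration.

```r
# Base-R sketch: subtract each group's mean from every non-ID column.
set.seed(1)  # hypothetical seed, only so the example is reproducible
data <- data.frame(x = rnorm(100), y = rnorm(100), ID = round(runif(100) * 10))

cols <- setdiff(names(data), "ID")  # columns to demean

# ave() returns, for every row, the mean of that row's group
# (its FUN argument defaults to mean), so the result has one row per row of data.
group_means <- sapply(data[cols], function(v) ave(v, data$ID))

# Matrix of demeaned values, with rows in the original order of data
dm <- as.matrix(data[cols]) - group_means

# Sanity check: within every ID, each demeaned column sums to roughly zero
max(abs(rowsum(dm, data$ID)))
```

Because ave() keeps the original row order, no sorting or merging is needed, and the result lines up row for row with data.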