Subtracting groupwise means from columns using either plyr or matrix algebra

Question

I'm trying to write some parallelizable code (exploiting plyr and doMC) to calculate and subtract groupwise means from the columns of a data frame. I'm having a hard time getting the plyr syntax correct.

Here is the script with a working for-loop:

library(doBy)  # provides summaryBy
data = data.frame(x = rnorm(100),y = rnorm(100),ID = round(runif(100)*10))
data = data[with(data,order(ID)),]
dm = matrix(rep(NA,nrow(data)*(ncol(data)-1)),nrow(data),(ncol(data)-1))

for (i in 1:(ncol(data)-1)){
    m = summaryBy(data[,i]~ID,data=data,fun=mean)
    d = data.frame(data[,i],ID=data$ID)
    a = merge(d,m,by="ID")
    dm[,i] = a[,2]-a[,3]
    }

But when I try to split the work by the column names of data using ddply, I get an error message. Here is my non-working code:

dmf = function(i){
    m = summaryBy(data[,i]~ID,data=data,fun=mean)
    d = data.frame(data[,i],ID=data$ID)
    a = merge(d,m,by="ID")
    dm = a[,2]-a[,3]
    as.data.frame(dm)
    }

dm = ddply(.data=data,.fun = dmf,.variables = colnames(data))

>Error in .subset(x, j) : invalid subscript type 'list'

Anybody have a solution for this?

Alternatively, if this is doable with matrices, I'd greatly appreciate that sort of solution from someone with better matrix intuition than me.

Answer 1

To take full advantage of plyr, I would combine colwise and the base function scale. Also, if needed, let ddply handle the parallelization at the highest level:

library(plyr)
# .parallel = TRUE needs a registered foreach backend, e.g. doMC::registerDoMC()
dm <- ddply(data, "ID", colwise(scale, center = TRUE, scale = FALSE),
            .parallel = TRUE)
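
Since the question also asks about a matrix approach, here is a minimal base R sketch (not part of the original answer) that demeans the columns two ways and checks that they agree. The seed and the names value_cols, dm_ave, Z, X and dm_mat are made up for illustration. Note that, unlike the for-loop in the question, the rows stay in their original order rather than being sorted by ID.

```r
# Base R only; no plyr or doBy needed.
set.seed(1)  # hypothetical seed, only so the example is reproducible
data <- data.frame(x = rnorm(100), y = rnorm(100), ID = round(runif(100) * 10))

# Every column except the grouping variable
value_cols <- setdiff(names(data), "ID")

# Approach 1: ave() returns each row's group mean, aligned with the input rows,
# so subtracting it demeans the column within each ID.
dm_ave <- sapply(data[value_cols], function(col) col - ave(col, data$ID))

# Approach 2: matrix algebra. Z is the n x g group indicator matrix.
# solve(Z'Z, Z'X) gives the g x p matrix of group means; Z %*% that expands
# the means back to one row per observation.
Z <- model.matrix(~ factor(ID) - 1, data = data)
X <- as.matrix(data[value_cols])
dm_mat <- X - Z %*% solve(crossprod(Z), crossprod(Z, X))

# Compare the two results, ignoring dimnames
all.equal(unname(dm_ave), unname(dm_mat))
```

The ave() version is probably the simpler choice for a handful of columns. The matrix version avoids looping over columns, but it builds an n x g indicator matrix, so it may be wasteful when there are many groups.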

solution1
4 ACCEPTED 2013-08-08 11:01:58
