
Subtracting groupwise means from columns using either plyr or matrix algebra

I'm trying to write some parallelizable code (exploiting plyr and doMC) to calculate and subtract groupwise means from the columns of a data frame. I'm having a hard time getting the plyr syntax right.

Here is the script with a working for-loop:

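library(doBy)  # summaryBy() comes from the doBy package

data = data.frame(x = rnorm(100), y = rnorm(100), ID = round(runif(100)*10))
data = data[with(data, order(ID)), ]
# One result column per non-ID column (ID is assumed to be the last column)
dm = matrix(NA, nrow(data), ncol(data) - 1)

for (i in 1:(ncol(data)-1)){
    m = summaryBy(data[,i]~ID, data=data, FUN=mean)  # per-ID means
    d = data.frame(data[,i], ID=data$ID)
    a = merge(d, m, by="ID")   # merge() sorts by ID, hence the sort above
    dm[,i] = a[,2] - a[,3]     # value minus its group mean
    }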
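For comparison, here is a minimal sketch of the same computation in base R. It swaps `ave()` in for the `summaryBy()`/`merge()` round trip (this is not the asker's code): `ave(v, g)` returns, for every row, the mean of `v` within that row's group, so `v - ave(v, g)` is the demeaned column. It assumes, as the loop does, that every column except `ID` is a value column; the fixed seed is only there for reproducibility.

```r
set.seed(42)  # assumption: fixed seed so the example is reproducible
data <- data.frame(x = rnorm(100), y = rnorm(100), ID = round(runif(100) * 10))
data <- data[order(data$ID), ]

value_cols <- setdiff(names(data), "ID")  # every column except the grouping key

# ave(v, g) gives each row the mean of v within its ID group, so subtracting
# it demeans the column without any merge step.
dm <- sapply(data[value_cols], function(v) v - ave(v, data$ID))
head(dm)
```

Because `ave()` keeps the original row order, this version does not depend on `data` being sorted by `ID`.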

But when I try to split it by the column names of data using ddply, I get an error. Here is my non-working code:

dmf = function(i){
    m = summaryBy(data[,i]~ID,data=data,fun=mean)
    d = data.frame(data[,i],ID=data$ID)
    a = merge(d,m,by="ID")
    dm = a[,2]-a[,3]
    as.data.frame(dm)
    }

dm = ddply(.data=data,.fun = dmf,.variables = colnames(data))

Error in .subset(x, j) : invalid subscript type 'list'

Does anybody have a solution for this?

Alternatively, if this is doable with matrices, I'd greatly appreciate that sort of solution from someone with better matrix intuition than me.
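The error comes from how `ddply()` treats `.variables`: they are *grouping* columns, so the call splits the rows by every combination of `x`, `y`, and `ID` and passes each row subset (a data frame) to `dmf()` as `i`. Inside `dmf()`, `data[, i]` is then indexed with a data frame, that is, a list, which `[.data.frame` rejects with `invalid subscript type 'list'`. To parallelize over columns, iterate over the column names instead. The sketch below does that with base R's `parallel::mclapply()` rather than plyr; `demean_col()` is a hypothetical helper, and `ave()` stands in for `summaryBy()`.

```r
library(parallel)  # ships with R; mclapply() forks worker processes

set.seed(42)  # assumption: fixed seed for reproducibility
data <- data.frame(x = rnorm(100), y = rnorm(100), ID = round(runif(100) * 10))

# Hypothetical helper: demean one column (given by name) within ID groups.
demean_col <- function(col) data[[col]] - ave(data[[col]], data$ID)

value_cols <- setdiff(names(data), "ID")

# Iterate over column *names*, not row groups. mc.cores > 1 needs a
# fork-capable OS (Linux/macOS); Windows only supports mc.cores = 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
dm_list <- mclapply(value_cols, demean_col, mc.cores = cores)
dm <- do.call(cbind, setNames(dm_list, value_cols))
```

With plyr loaded and a doMC backend registered, `llply(value_cols, demean_col, .parallel = TRUE)` should be the equivalent call.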

To take full advantage of plyr, I would combine colwise and the base function scale. Also, if needed, let ddply handle the parallelization at the highest level:

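library(plyr); library(doMC)  # doMC forks, so no real parallelism on Windows
registerDoMC()  # .parallel = TRUE needs a registered foreach backend
dm <- ddply(data, "ID", colwise(scale, center = TRUE, scale = FALSE),
            .parallel = TRUE)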
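For the matrix-algebra route the question also asks about, one sketch (not part of the answer above) builds an n × k 0/1 group-indicator matrix `G` with `model.matrix()`. Then `solve(crossprod(G), crossprod(G, X))` is the k × p matrix of group means (`crossprod(G)` is diagonal with the group sizes, `crossprod(G, X)` holds the group sums), and `X - G %*% means` subtracts each row's group mean from every column at once, i.e. X - G (G'G)^-1 G' X.

```r
set.seed(42)  # assumption: fixed seed for reproducibility
data <- data.frame(x = rnorm(100), y = rnorm(100), ID = round(runif(100) * 10))

X <- as.matrix(data[setdiff(names(data), "ID")])  # value columns as a matrix

# n x k indicator matrix: G[r, g] = 1 if row r belongs to group g.
# factor() keeps only observed IDs, so no column of G is all zeros.
G <- model.matrix(~ factor(ID) - 1, data = data)

# crossprod(G) is diagonal with the group sizes and crossprod(G, X) holds
# the group sums, so solve() returns the k x p matrix of group means.
group_means <- solve(crossprod(G), crossprod(G, X))

# G %*% group_means repeats each group's means on its rows; subtract them.
dm <- X - G %*% group_means
```

`G` is dense with one column per group, so with many groups `ave()` or `rowsum()` is lighter on memory; the matrix form mainly makes the projection explicit.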
