
R: A faster alternative to scaleBy

I am using scaleBy from the doBy R package to standardize a variable within each condition for each subject in my dataset. The dataset has 5137 participants, each with about 120 observations. On that dataset, scaleBy is very slow (close to 15 minutes). Other functions (e.g., summaryBy, melt, dcast) run much faster (no more than 20 seconds). Is there a faster, simple alternative to scaleBy?

Here is simulation code that you can run to mimic my dataset in terms of the number of participants, the number of conditions within each participant (it is a repeated-measures design), and the number of observations per condition per participant:

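    library(doBy)  # provides scaleBy

    nSubj <- 5137
    valuesPerSubj <- 120
    nobs <- nSubj * valuesPerSubj

    # Simulated repeated-measures data: 4 conditions per subject
    ttt <- data.frame(cond = rep(c('a', 'b', 'c', 'd'), nobs / 4),
                      rt   = rnorm(nobs, mean = 700, sd = 150),
                      subj = rep(seq_len(nSubj), valuesPerSubj))

    # Time the group-wise standardization
    start <- Sys.time()
    zt <- scaleBy(rt ~ subj + cond, data = ttt)
    end <- Sys.time()
    duration <- end - start
    duration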

The scaleBy call in this code takes 11.7 minutes on my computer (you can reduce nSubj in the code above for faster testing). Are there any faster solutions?

I found a much faster approach using dplyr. I replaced the scaleBy line with these two lines:

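    library(dplyr)  # provides group_by and mutate

    # Group by subject and condition, then z-score rt within each group
    gttt <- group_by(ttt, subj, cond)
    zt <- mutate(gttt, zrt = as.numeric(scale(rt)))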

This code took less than 4 seconds to run.
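
If adding a package dependency is not an option, the same group-wise standardization can probably be done in base R with ave(). The following is only a minimal sketch, not a benchmarked replacement: the helper name zscore is made up for illustration, and nSubj is reduced to 51 so the example runs instantly. I have not timed it on the full 5137-subject data.

```r
# Small simulated data set with the same layout as in the question
# (nSubj reduced from 5137 to 51 purely to keep the example fast).
set.seed(1)
nSubj <- 51
valuesPerSubj <- 120
nobs <- nSubj * valuesPerSubj
ttt <- data.frame(
  cond = rep(c("a", "b", "c", "d"), nobs / 4),
  rt   = rnorm(nobs, mean = 700, sd = 150),
  subj = rep(seq_len(nSubj), valuesPerSubj)
)

# Hypothetical helper (not part of any package): z-score a numeric vector.
zscore <- function(x) (x - mean(x)) / sd(x)

# ave() applies zscore within every subj x cond cell and returns the
# results in the original row order, so they can be stored as a new column.
ttt$zrt <- ave(ttt$rt, ttt$subj, ttt$cond, FUN = zscore)
```

With its default arguments, scale() centers on the mean and divides by the sample standard deviation, so zscore here should give the same values as as.numeric(scale(rt)) within each group.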
