
What is the fastest way to calculate a lot of means and sds in R?

I'm relatively new to R and am working on a project where I need to calculate a LOT of column means and standard deviations. I have a dataset called scores that has over 3 million observations of 172 variables. I need to transform each of these scores by subtracting a mean and dividing by a standard deviation. I am able to do what I want with my code below, but it takes up all of the memory in my R session (which has 50 GB). This step (calculating means and sds and transforming values) is the most memory-expensive step in my code, and I am wondering if there is anything I can do to lessen it. Would a function help? Should I store my data differently? Or does it take the same amount of power to do the math regardless of how you ask?

I am trying to avoid paying for a remote machine with more power if possible.

correct_scores <- TRUE

if (correct_scores){
  # pull score data from larger database
  scores <- noise[["i_scores"]][["whole_dataset"]][,-c(1:4)]
  # calculate means and sds
  meanofmeans <- mean(apply(scores, 2, mean))
  meanofsds <- mean(apply(scores, 2, sd))
  # do the thing
  scores <- (scores - meanofmeans) / meanofsds
  # put values back into larger database
  noise[[ "i_scores-cor" ]][["whole_dataset"]] <- cbind(noise[["i_scores"]][["whole_dataset"]][,c(1:4)],scores)
}

A tiny bit of reproducible code from the scores dataset:

scores <- data.frame(ENCFF802ZBQ = c(34.80, -0.01, 0.248, 0.54),
                     ENCFF477IRE = c(0.32, 0.24, -0.24, 23.01),
                     ENCFF127IJN = c(0.23, 0.56, 0.01, 0.01))

Thanks!!

Given your example:

library(data.table)           
setDT(scores)[, lapply(.SD, scale)]

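setDT(scores) converts scores to a data.table by reference. lapply(.SD, scale) applies the scale(...) function to each column in scores (.SD is data.table shorthand for "Subset of Data"; in this case the subset is all columns). Note that scale(...) centers and scales each column by that column's own mean and standard deviation, which is not the same as the single mean of means and mean of sds used in the question. See ?data.table for more information.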
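If you want the transform exactly as written in the question (one overall mean and one overall sd applied to every column), here is a minimal base R sketch using the example data. It assumes scores is a data.frame of numeric columns. apply() first coerces a data.frame to a matrix, which copies all of the data, so this version works column by column with vapply() and a loop instead. I have not measured how much memory this saves on the full 3 million row dataset, so treat it as something to try rather than a guaranteed fix.

```r
# Example data from the question
scores <- data.frame(ENCFF802ZBQ = c(34.80, -0.01, 0.248, 0.54),
                     ENCFF477IRE = c(0.32, 0.24, -0.24, 23.01),
                     ENCFF127IJN = c(0.23, 0.56, 0.01, 0.01))

# Per-column statistics computed directly on the list of columns,
# without converting the whole data.frame to a matrix first
col_means <- vapply(scores, mean, numeric(1))
col_sds   <- vapply(scores, sd, numeric(1))

# Same two summary numbers as in the question
meanofmeans <- mean(col_means)
meanofsds   <- mean(col_sds)

# Transform one column at a time so only one column-sized
# temporary vector is needed at any moment
for (j in seq_along(scores)) {
  scores[[j]] <- (scores[[j]] - meanofmeans) / meanofsds
}
```

On the toy data this should give the same values as the (scores - meanofmeans) / meanofsds line in the question; the difference is only in how many full-size intermediate copies get created along the way.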

To your question: Should I store my data differently? Yes absolutely. But I'd need to see the structure of noise and perhaps how/why you import it that way to comment further.
