
Parallel version of transform (or mutate) in R?

I have a slow function that I want to apply to each row in a data.frame. The computation is embarrassingly parallel.

I have 4 cores, but R's built-in functions only use one.

All I want to do is a parallel equivalent to:

data$c = slow.foo(data$a, data$b)

I can't find clear instructions on which library to use (overwhelmed by choice) and how to use it. Any help would be greatly appreciated.

The parallel package is included with base R. Here's a quick example using parApply from that package:

library(parallel)

# Some dummy data
d <- data.frame(x1=runif(1000), x2=runif(1000))

# Create a cluster with one fewer worker than there are cores available. Adjust as necessary
cl <- makeCluster(detectCores() - 1)

# Just like regular apply, but rows get sent to the various processes
out <- parApply(cl, d, 1, function(x) x[1] - x[2])

stopCluster(cl)

# Same as x1 - x2?
identical(out, d$x1 - d$x2)

# [1] TRUE

You also have, e.g., parSapply and parLapply at your disposal.
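For a call shaped like the one in the question, data$c = slow.foo(data$a, data$b), clusterMap from the same package may be a closer fit, since it walks several vectors in parallel the way mapply does. The following is only a sketch: the body of slow.foo is a hypothetical stand-in (a short sleep plus a sum), the data is dummy data, and the worker count of 2 is an arbitrary choice.

```r
library(parallel)

# Hypothetical stand-in for the slow function from the question
slow.foo <- function(a, b) {
  Sys.sleep(0.01)  # simulate expensive work
  a + b
}

# Dummy data with the column names used in the question
data <- data.frame(a = runif(20), b = runif(20))

# Two workers, chosen arbitrarily for this sketch
cl <- makeCluster(2)

# Calls slow.foo(data$a[i], data$b[i]) for each i, spread across the workers;
# SIMPLIFY = TRUE collapses the resulting list into a vector
data$c <- clusterMap(cl, slow.foo, data$a, data$b, SIMPLIFY = TRUE)

stopCluster(cl)
```

If slow.foo relies on other objects or packages, the workers would need them too, for example via clusterExport or clusterEvalQ.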

Of course, for the example I've given, the vectorised operation d$x1 - d$x2 is much faster. Think about whether your processes can be vectorised rather than performed row by row.
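To see the difference on your own machine, you could time the two approaches side by side. This is a rough sketch using system.time on the same kind of dummy data; the variable names are illustrative, and the timings will vary, so no particular numbers are claimed here.

```r
# Dummy data, same shape as in the example above
d <- data.frame(x1 = runif(1000), x2 = runif(1000))

# Row by row: one function call per row
t_rows <- system.time(by_row <- apply(d, 1, function(x) x[1] - x[2]))

# Vectorised: a single operation over whole columns
t_vec <- system.time(vectorised <- d$x1 - d$x2)

# Check that both approaches give the same values
print(all.equal(unname(by_row), vectorised))

# Compare elapsed times
print(t_rows["elapsed"])
print(t_vec["elapsed"])
```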

