
Different results when using one core vs. multiple cores to modify a data.table

I noticed something confusing when using multiple processes to modify values in an R data.table.

I tried to modify values in place from within a function. With one core it works, and the values in the data.table are changed. With multiple cores, the values in the data.table are not changed.

Does anyone know why?

library(data.table)
library(parallel)
aa <- as.data.table(iris)
aa[,tt:=0]
# modify aa$tt in place
main <- function(x){
  #set(aa,x,6L,5)
  aa[x,tt:=5]
  return(NULL)
}

# aa$tt changed
mclapply(1:nrow(aa), main, mc.cores = 1)

# aa$tt unchanged
mclapply(1:nrow(aa), main, mc.cores = 2)

Short answer: Parallel subprocesses work on copies of aa.

Longer answer:

mclapply uses forked subprocesses (essentially copies* of the parent process), which therefore work on copied data (aa in your case).

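This means in-place changes to aa in a subprocess do not modify aa in the main process. With mc.cores = 1, mclapply does not fork at all: it falls back to a plain lapply in the main process, which is why aa$tt does change in that case.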
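One way to see the separate processes is to compare process IDs. The following is a minimal sketch, assuming a Unix-like system where forking is available; the variable names are only illustrative.

```r
library(parallel)

# PID of the main R process
parent_pid <- Sys.getpid()

# mc.cores = 1: no fork, the function runs in the main process
one_core <- unlist(mclapply(1:2, function(i) Sys.getpid(), mc.cores = 1))

# mc.cores = 2: the function runs in forked child processes
# (assumption: Unix-like OS; mc.cores > 1 is not supported on Windows)
two_cores <- unlist(mclapply(1:2, function(i) Sys.getpid(), mc.cores = 2))

all(one_core == parent_pid)   # same process as the parent
all(two_cores != parent_pid)  # different processes
```

Anything a child process changes in its own copy of aa is lost when that child exits.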

See ?parallel::mclapply for details, e.g. on how to use the final result, which is the list of values returned by the function.

*) In fact, on Linux forking is implemented with copy-on-write memory pages to improve performance, so memory is only physically copied when a process writes to it.
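A possible workaround, sketched under the assumption that each row's new value can be computed independently: let the workers return the values and do the by-reference assignment once in the main process. The helper name new_value is hypothetical and not part of the original code.

```r
library(data.table)
library(parallel)

aa <- as.data.table(iris)
aa[, tt := 0]

# Hypothetical helper: compute the new value for row x and return it,
# instead of modifying aa inside the worker.
new_value <- function(x) {
  5
}

# The workers return values; the main process collects them in a list.
# (assumption: Unix-like OS; mc.cores > 1 is not supported on Windows)
res <- mclapply(seq_len(nrow(aa)), new_value, mc.cores = 2)

# Assign by reference once, in the main process.
aa[, tt := unlist(res)]
```

Because mclapply returns its results in the order of the input, unlist(res) lines up with the rows of aa.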

