I ran into something confusing when using multiple processes to modify values in an R data.table.
I modify values in place from inside a function. With one core this works and the values in the data.table are changed, but with multiple cores the values in the data.table are left unchanged.
Does anyone know why?
library(data.table)
library(parallel)
aa <- as.data.table(iris)
aa[,tt:=0]
# modify aa$tt in place
main <- function(x) {
  # set(aa, x, 6L, 5)
  aa[x, tt := 5]
  return(NULL)
}
# aa$tt changed
mclapply(1:nrow(aa), main, mc.cores = 1)
# aa$tt unchanged
mclapply(1:nrow(aa), main, mc.cores = 2)
Short answer: Parallel sub-processes work on copies of aa.
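Longer answer:
mclapply uses forked "sub" processes (essentially copies* of the parent process), so each of them works on its own copy of the data (aa in your case).
This means that in-place changes to aa in a sub-process do not modify aa in the main process. With mc.cores = 1 no sub-process is forked and main runs in the main process itself, which is why the change is visible there.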
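One way to see that the work really happens in separate processes is to compare process IDs. This is only a small sketch: the names parent_pid, child_pids and n_cores are made up for illustration, and it assumes a system where forking is available (on Windows mclapply cannot fork, so the sketch falls back to one core there).

```r
library(parallel)

# Forking is not available on Windows, so use a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# PID of the main R process
parent_pid <- Sys.getpid()

# Each call reports the PID of the process it is running in
child_pids <- unlist(mclapply(1:4, function(i) Sys.getpid(), mc.cores = n_cores))

print(parent_pid)
print(unique(child_pids))
```

With two cores the reported PIDs should differ from parent_pid, so anything assigned inside the function lives in another process.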
See ?parallel::mclapply for details, e.g. on how to use the final result, which is the list of return values from the sub-processes.
*) In fact, under Linux forking is implemented using copy-on-write memory pages to improve performance, so memory is physically copied only when a process writes to it.
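A possible way to apply this to the code in the question is to let the function return the new value and do the assignment once in the main process. This is a minimal sketch, not the only way to do it: compute_tt, new_tt and n_cores are hypothetical names, and the one-core fallback on Windows is an assumption added so that the example runs there too.

```r
library(data.table)
library(parallel)

aa <- as.data.table(iris)
aa[, tt := 0]

# Hypothetical worker: return the new value for row i
# instead of assigning into aa inside the sub-process.
compute_tt <- function(i) {
  5
}

# Forking is not available on Windows, so use a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# The return values come back to the main process as a list
new_tt <- mclapply(seq_len(nrow(aa)), compute_tt, mc.cores = n_cores)

# Assign once, in the main process, where aa actually lives
aa[, tt := unlist(new_tt)]
```

The sub-processes only compute values here; the single := in the main process is what changes aa.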