different results using one core and multiple cores to modify data.table

Question

I ran into something confusing when using multiprocessing to modify values in an R data.table.

I tried to modify values in place from inside a function. With one core it works, and the values in the data.table are changed. With multiple cores, the values in the data.table are not changed.

Does anyone know why?

library(data.table)
library(parallel)
aa <- as.data.table(iris)
aa[,tt:=0]
# modify aa$tt in place
main <- function(x){
  #set(aa,x,6L,5)
  aa[x,tt:=5]
  return(NULL)
}

# aa$tt changed
mclapply(1:nrow(aa), main, mc.cores = 1)

# aa$tt unchanged
mclapply(1:nrow(aa), main, mc.cores = 2)

Answer 1

Short answer: Parallel subprocesses work on copies of aa.

Longer answer:

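mclapply uses forked subprocesses (essentially copies* of the parent process), so each subprocess works on copied data (aa in your case).

This means in-place changes to aa in a subprocess do not modify aa in the main process. With mc.cores = 1 no subprocess is forked and the function runs in the main process itself, which is why the change is visible in that case.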
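
To make this visible, here is a minimal sketch that has each subprocess report the value it sees after the in-place update. It assumes a Unix-like system, since mclapply with mc.cores > 1 relies on forking; the name seen_in_child is only illustrative.

```r
library(data.table)
library(parallel)

aa <- as.data.table(iris)
aa[, tt := 0]

# Each forked subprocess updates its own copy of aa and returns
# the value it sees afterwards.
seen_in_child <- mclapply(1:4, function(x) {
  aa[x, tt := 5]
  aa[x, tt]
}, mc.cores = 2)

unlist(seen_in_child)  # values as seen inside the subprocesses
aa[1:4, tt]            # values in the main process
```

The subprocesses should report 5, while aa in the main process still holds 0.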

See ?parallel::mclapply for details, e.g. how to use the final result, which is the list of values returned by the subprocesses.

*) In fact, under Linux, forking is implemented with copy-on-write memory pages to improve performance, so a page is only physically copied once one of the processes writes to it.
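
One way to get the effect the question is after is to let the subprocesses return the new values and do the assignment once in the main process. This is a rough sketch, again assuming a Unix-like system; compute_tt is a hypothetical stand-in for whatever per-row work is needed, and here it simply returns 5 as in the question.

```r
library(data.table)
library(parallel)

aa <- as.data.table(iris)
aa[, tt := 0]

# Hypothetical worker: computes the new value for row x and returns it
# instead of writing into aa.
compute_tt <- function(x) {
  5
}

# mclapply returns one list element per input, in input order.
res <- mclapply(seq_len(nrow(aa)), compute_tt, mc.cores = 2)

# Apply the collected results in the main process, by reference.
aa[, tt := unlist(res)]
```

Since the assignment happens in the main process, aa$tt is updated there, and the subprocesses only do the computation.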

solution1
0 2019-09-27 06:57:29
