What are the caveats of calling other packages inside foreach and %dopar% with doMC?

Question

This code works as expected:

library(dplyr)
data <- list(t1 = "hello world.", t2 = "bye world")

library(doMC)
registerDoMC(3)

res <- foreach(t = data) %dopar% {

    print(sprintf("processing %s", t))

    data.frame(text = t) %>%
    dplyr::count(text)

}

print(res)

However, this code only prints "processing hello world." and "processing bye world", then hangs without raising any exception.

library(dplyr)
coreNLP::initCoreNLP()

data <- list(t1 = "hello world.", t2 = "bye world")

library(doMC)
registerDoMC(3)

res <- foreach(t = data) %dopar% {

    print(sprintf("processing %s", t))

    coreNLP::annotateString(t)$token

}

print(res)

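If I change %dopar% to %do%, the code above works as expected.

I don't understand what causes this behavior. Why does calling a coreNLP function inside %dopar% make R hang, while other packages work fine? Is this related to coreNLP's dependency on Java?

Here is the output of sessionInfo():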

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.0

Answer 1

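Your first example works fine for me on a similar setup. My session info after running the example is below; make sure you try again in a fresh R session (R --vanilla). I have four cores (as reported by parallel::detectCores()).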

sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] doMC_1.3.4      iterators_1.0.8 foreach_1.4.3   dplyr_0.5.0    

loaded via a namespace (and not attached):
[1] compiler_3.4.0   magrittr_1.5     R6_2.2.0         assertthat_0.2.0
[5] DBI_0.6-1        tibble_1.3.0     Rcpp_0.12.10     codetools_0.2-15
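
To compare setups, it can help to check how many cores R sees and how many workers foreach will actually use. The following is a minimal sketch that reuses the worker count of 3 from the question; it only reports the configuration and does not diagnose the hang.

```r
library(doMC)   # also attaches foreach, iterators and parallel

registerDoMC(3)                        # same worker count as in the question

n_cores   <- parallel::detectCores()   # logical cores visible to R
n_workers <- foreach::getDoParWorkers()  # workers %dopar% will use
backend   <- foreach::getDoParName()     # name of the registered backend

print(n_cores)
print(n_workers)
print(backend)
```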

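Your second example does not work for me. The output is below. My guess is that the forked processes cannot share the same underlying Java process/service that coreNLP depends on; I don't know coreNLP well, though.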

> res <- foreach(t = data) %dopar% {
+ 
+     print(sprintf("processing %s", t))
+ 
+     coreNLP::annotateString(t)$token
+ 
+ }
[1] "processing hello world."
[1] "processing bye world"


^CError in selectChildren(ac, 1) : 
  Java called System.exit(130) requesting R to quit - trying to recover
Error during wrapup: C stack usage  591577121812 is too close to the limit

 *** caught segfault ***
address 0x2, cause 'memory not mapped'
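
If the guess about forked processes and the shared Java process is right, one possible workaround is to avoid forking altogether: use a socket (PSOCK) cluster, whose workers are freshly started R processes, and initialize coreNLP separately inside each worker. The sketch below is untested against coreNLP and rests on several assumptions: the doParallel package is installed (it is not used in the question), the CoreNLP model files are available to every worker, and the machine has enough memory for one JVM per worker.

```r
library(foreach)
library(doParallel)  # assumption: installed; replaces doMC in this sketch

data <- list(t1 = "hello world.", t2 = "bye world")

# PSOCK workers are new R processes, not forks of this session,
# so they do not inherit a JVM started in the parent.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Start a separate JVM and CoreNLP pipeline inside every worker.
invisible(parallel::clusterEvalQ(cl, {
  coreNLP::initCoreNLP()
  NULL  # return nothing so no Java reference is sent back to the parent
}))

res <- tryCatch(
  foreach(t = data) %dopar% {
    coreNLP::annotateString(t)$token
  },
  finally = parallel::stopCluster(cl)  # always shut the workers down
)

print(res)
```

Each worker pays the CoreNLP start-up cost once, so this is only likely to help when there are many more texts than workers.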
