JVM error when writing a data frame to an Oracle database with parLapply

Question

I want to parallelize my data writing process. I am writing a data frame to an Oracle database. The data has 4 million rows and 8 columns. Without parallelization it takes 6.5 hours.

When I try to run it in parallel, I get this error:

Error in checkForRemoteErrors(val) : 
  7 nodes produced errors; first error: No running JVM detected. Maybe .jinit() would help.

I know this error. I can solve it when I work in a single process. But I do not know how to tell the other cluster nodes the location of Java. Here is my code:

Sys.setenv(JAVA_HOME='C:/Program Files/Java/jre1.8.0_181') 
library(rJava)
library(RJDBC)
library(DBI)
library(compiler)
library(dplyr)
library(data.table)

jdbcDriver =JDBC("oracle.jdbc.OracleDriver",classPath="C:/Program Files/directory/ojdbc6.jar", identifier.quote = "\"") 
jdbcConnection =dbConnect(jdbcDriver, "jdbc:oracle:thin:@//XXXXX", "YYYYY", "ZZZZZ")

By using Sys.setenv(JAVA_HOME='C:/Program Files/Java/jre1.8.0_181') I solve the same problem for a single core. But when I go parallel:

library(parallel)
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)
clusterExport(cl, varlist = list("jdbcConnection", "brand3.merge.u"))
clusterEvalQ(cl, .libPaths("C:/Users/onur.boyar/Documents/R/win-library/3.5"))
clusterEvalQ(cl, library(RJDBC))
clusterEvalQ(cl, library(rJava))

parLapply(cl, 1:length(brand3.merge.u$CELL_PH_NUM), function(x) dbSendUpdate(jdbcConnection, "INSERT INTO xxnvdw.an_cust_analytics  VALUES(?,?,?,?,?,?,?,?)", brand3.merge.u[x, 1], brand3.merge.u[x,2], brand3.merge.u[x,3],brand3.merge.u[x,4],brand3.merge.u[x,5],brand3.merge.u[x,6],brand3.merge.u[x,7],brand3.merge.u[x,8]))

#brand3.merge.u is my data frame that I try to write.

I get the above error and I do not know how to set my Java location for the other nodes.

I want to use parLapply since it is faster than foreach. Any help would be appreciated. Thanks!

Answer 1

JAVA_HOME environment variable

If the problem really is with the location of Java, you could set the environment variable in your .Renviron file. It is likely located at ~/.Renviron. Add a line to that file and it will be propagated to all R sessions that run under your user:

JAVA_HOME='C:/Program Files/Java/jre1.8.0_181'

Alternatively, you can just add that location to your PATH environment variable.
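
If editing .Renviron is not convenient, the variable can also be set on each worker from the master session. The following is a minimal sketch, assuming a two-worker cluster and the Java path from the question; the names java_home and worker_java_home are only illustrative. It presumably has to run before rJava is loaded on the workers, since that is when the Java installation is looked up.

```r
library(parallel)

# Path taken from the question; adjust to the local Java installation.
java_home <- "C:/Program Files/Java/jre1.8.0_181"

# Two workers are an assumption for this sketch.
cl <- makeCluster(2)

# Run Sys.setenv() inside every worker process.
invisible(clusterCall(cl, function(path) Sys.setenv(JAVA_HOME = path), java_home))

# Read the variable back from each worker to confirm it was set.
worker_java_home <- unlist(clusterEvalQ(cl, Sys.getenv("JAVA_HOME")))
print(worker_java_home)

stopCluster(cl)
```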

JVM Initialization via rJava

On the other hand, the error message may simply mean that a JVM has not been initialized on the workers, which you can solve with .jinit. A minimal example:

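library(parallel)
cl <- makeCluster(detectCores())
parallel::parLapply(cl, 1:5, function(x) {
  # Start a JVM inside this worker process before any Java call
  rJava::.jinit()
  rJava::.jnew(class = "java/lang/Integer", x)$toString()
})
stopCluster(cl)  # release the worker processes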
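
Applied to the question, the same idea means that each worker needs its own JVM and its own JDBC connection, because a connection created in the master session refers to a Java object that exists only in that process. The following is a rough sketch under that assumption, not a tested solution: parallel_insert and insert_piece are hypothetical helper names, and the driver path, URL, and credentials are the placeholders from the question.

```r
library(parallel)

# Hypothetical worker function. `con` is a global variable that exists only on
# the workers; it is created by clusterEvalQ() in parallel_insert() below.
insert_piece <- function(piece, sql) {
  for (i in seq_len(nrow(piece))) {
    # Pass the values of row i as the parameters of the prepared statement.
    do.call(RJDBC::dbSendUpdate,
            c(list(con, sql), unname(as.list(piece[i, ]))))
  }
  nrow(piece)
}

# Hypothetical driver function: one chunk of rows per worker.
parallel_insert <- function(df, table, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))

  # Each worker sets JAVA_HOME, loads RJDBC, and opens its own connection.
  # Returning NULL avoids sending the connection object back to the master.
  clusterEvalQ(cl, {
    Sys.setenv(JAVA_HOME = "C:/Program Files/Java/jre1.8.0_181")
    library(RJDBC)
    drv <- JDBC("oracle.jdbc.OracleDriver",
                classPath = "C:/Program Files/directory/ojdbc6.jar",
                identifier.quote = "\"")
    con <- dbConnect(drv, "jdbc:oracle:thin:@//XXXXX", "YYYYY", "ZZZZZ")
    NULL
  })

  # Split the row indices into one contiguous block per worker.
  pieces <- lapply(splitIndices(nrow(df), n_workers),
                   function(idx) df[idx, , drop = FALSE])

  # One "?" placeholder per column, as in the question's INSERT statement.
  sql <- sprintf("INSERT INTO %s VALUES(%s)", table,
                 paste(rep("?", ncol(df)), collapse = ","))

  written <- parLapply(cl, pieces, insert_piece, sql = sql)

  # Close the per-worker connections before the cluster is stopped.
  clusterEvalQ(cl, dbDisconnect(con))
  sum(unlist(written))
}

# Example call (not run here; it needs the Oracle driver and a reachable database):
# parallel_insert(brand3.merge.u, "xxnvdw.an_cust_analytics", n_workers = 7)
```

Sending each worker one block of rows, rather than exporting the whole data frame and indexing it row by row, keeps the amount of data copied to each process down to what that worker actually writes.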

Working around Java use

This was not specifically asked, but you can also avoid the Java dependency altogether by using ODBC drivers, which should be available for Oracle:

con <- DBI::dbConnect(
  odbc::odbc(),
  Driver = "[your driver's name]",
  ...
)

Answer 1: accepted, 2019-01-15 07:32:10
