
No parallel processing happens with rxExec from RevoScaleR

I am testing the parallel processing offered by the RevoScaleR package with SQL Server 2016 R Services (In-Database), following the example provided by Microsoft here: https://docs.microsoft.com/en-us/sql/advanced-analytics/tutorials/r-tutorial-custom-r-functions?view=sql-server-2016 . However, contrary to what the doc claims, I don't see any parallelism. Does anyone know why?

The SQL Server instance is installed on-premises on a machine with 8 cores. The only settings I changed relative to the example are:

  • set elemType = 'cores' for rxExec.

  • set consoleOutput = TRUE for RxInSqlServer.

My testing script in T-SQL is:

  EXEC sp_execute_external_script @language = N'R',  
     @script = N'
       # set up the connection string
      sqlConnString <- "Driver=SQL Server;server=.; 
                              database=master; 
                              Trusted_Connection=True"

      sqlCompute <- RxInSqlServer(connectionString = sqlConnString, consoleOutput = TRUE, numTasks= 4)
        rxSetComputeContext(sqlCompute)

        rollDice <- function()
        {
          cat(paste0("R Process ID = ", Sys.getpid(), " started at ", Sys.time()))
          cat("\n")
          result <- NULL
          point <- NULL
          count <- 1
          while (is.null(result))
          {
            roll <- sum(sample(6, 2, replace=TRUE))

            if (is.null(point))
            { point <- roll }
            if (count == 1 && (roll == 7 || roll == 11))
            {  result <- "Win" }
            else if (count == 1 && (roll == 2 || roll == 3 || roll == 12))
            { result <- "Loss" }
            else if (count > 1 && roll == 7 )
            { result <- "Loss" }
            else if (count > 1 && point == roll)
            { result <- "Win" }
            else { count <- count + 1 }
          }
          cat(paste0("R Process ID = ", Sys.getpid(), "completed at ", Sys.time()))
          cat("\n")
          result
        }

        sqlServerExec <- rxExec(rollDice, timesToRun=8, elemType = "cores", RNGseed="auto")
        return(NULL)', 
  @parallel = 1

The 8 runs are clearly executed sequentially based on the console output:

STDOUT message(s) from external script: 
======  WIN-6L7QANR32DF  ( process  1 ) has started run at  2019-08-29 11:37:10.60  ====== 
R Process ID = 7620 started at 2019-08-29 11:37:10.97 
R Process ID = 7620completed at 2019-08-29 11:37:11.03 
======  WIN-6L7QANR32DF  ( process  1 ) has completed run at  2019-08-29 11:37:11.08  ====== 
======  WIN-6L7QANR32DF  ( process  1 ) has started run at  2019-08-29 11:37:12.27  ====== 
R Process ID = 9072 started at 2019-08-29 11:37:12.80 
R Process ID = 9072completed at 2019-08-29 11:37:12.84 
======  WIN-6L7QANR32DF  ( process  1 ) has completed run at  2019-08-29 11:37:12.88  ====== 
======  WIN-6L7QANR32DF  ( process  1 ) has started run at  2019-08-29 11:37:14.29  ====== 
R Process ID = 8728 started at 2019-08-29 11:37:15.07 
R Process ID = 8728completed at 2019-08-29 11:37:15.10 
======  WIN-6L7QANR32DF  ( process  1 ) has completed run at  2019-08-29 11:37:15.15  ====== 
STDOUT message(s) from external script: 
======  WIN-6L7QANR32DF  ( process  1 ) has started run at  2019-08-29 11:37:16.31  ====== 
R Process ID = 8444 started at 2019-08-29 11:37:16.87 
R Process ID = 8444completed at 2019-08-29 11:37:16.91 
======  WIN-6L7QANR32DF  ( process  1 ) has completed run at  2019-08-29 11:37:16.97  ====== 
======  WIN-6L7QANR32DF  ( process  1 ) has started run at  2019-08-29 11:37:18.18  ====== 
R Process ID = 8244 started at 2019-08-29 11:37:18.72 
R Process ID = 8244completed at 2019-08-29 11:37:18.85 
======  WIN-6L7QANR32DF  ( process  1 ) has completed run at  2019-08-29 11:37:18.93  ====== 
======  WIN-6L7QANR32DF  ( process  1 ) has started run at  2019-08-29 11:37:20.00  ====== 
R Process ID = 2332 started at 2019-08-29 11:37:20.54 
R Process ID = 2332completed at 2019-08-29 11:37:20.59 
======  WIN-6L7QANR32DF  ( process  1 ) has completed run at  2019-08-29 11:37:20.63  ====== 
STDOUT message(s) from external script: 
======  WIN-6L7QANR32DF  ( process  1 ) has started run at  2019-08-29 11:37:21.62  ====== 
R Process ID = 336 started at 2019-08-29 11:37:22.24 
R Process ID = 336completed at 2019-08-29 11:37:22.27 
======  WIN-6L7QANR32DF  ( process  1 ) has completed run at  2019-08-29 11:37:22.32  ====== 
======  WIN-6L7QANR32DF  ( process  1 ) has started run at  2019-08-29 11:37:23.38  ====== 
R Process ID = 8280 started at 2019-08-29 11:37:23.88 
R Process ID = 8280completed at 2019-08-29 11:37:23.91 
======  WIN-6L7QANR32DF  ( process  1 ) has completed run at  2019-08-29 11:37:23.96  ====== 
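
The sequential pattern can be checked directly from the timestamps. The sketch below uses base R only: it copies the per-run start and completion times from the console output above and tests whether any run starts before the previous one has finished. The data frame and variable names are illustrative and not part of the original script.

```r
# Start and completion times reported by rollDice(), copied from the log above.
runs <- data.frame(
  pid   = c(7620, 9072, 8728, 8444, 8244, 2332, 336, 8280),
  start = c("2019-08-29 11:37:10.97", "2019-08-29 11:37:12.80",
            "2019-08-29 11:37:15.07", "2019-08-29 11:37:16.87",
            "2019-08-29 11:37:18.72", "2019-08-29 11:37:20.54",
            "2019-08-29 11:37:22.24", "2019-08-29 11:37:23.88"),
  end   = c("2019-08-29 11:37:11.03", "2019-08-29 11:37:12.84",
            "2019-08-29 11:37:15.10", "2019-08-29 11:37:16.91",
            "2019-08-29 11:37:18.85", "2019-08-29 11:37:20.59",
            "2019-08-29 11:37:22.27", "2019-08-29 11:37:23.91"),
  stringsAsFactors = FALSE
)

# %OS keeps the fractional seconds when parsing.
fmt <- "%Y-%m-%d %H:%M:%OS"
start <- as.POSIXct(runs$start, format = fmt, tz = "UTC")
end   <- as.POSIXct(runs$end,   format = fmt, tz = "UTC")

n <- nrow(runs)

# Idle time between the end of one run and the start of the next, in seconds.
gaps <- as.numeric(difftime(start[-1], end[-n], units = "secs"))

# TRUE when every run starts only after the previous run has completed.
sequential <- all(gaps > 0)

print(gaps)
print(sequential)
```

Here `sequential` evaluates to TRUE: every run has its own R process ID, yet no two runs overlap in time, so each R process was started only after the previous one had finished.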

Microsoft's doc seems to be misleading. Switching the compute context to RxInSqlServer does not seem to run the tasks in parallel; using RxLocalParallel instead worked.
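
A minimal sketch of the RxLocalParallel variant, assuming an R session in which RevoScaleR is installed. This is not the exact code that was run: rollDice is rewritten here to return the process ID and start time instead of printing them, and the names `results` and `pids` are illustrative.

```r
# Assumption: RevoScaleR is installed (it ships with SQL Server R Services).
library(RevoScaleR)

# Same game as in the question, but returning the worker's process ID and
# start time alongside the outcome instead of printing them.
rollDice <- function() {
  started <- Sys.time()
  result <- NULL
  point <- NULL
  count <- 1
  while (is.null(result)) {
    roll <- sum(sample(6, 2, replace = TRUE))
    if (is.null(point)) point <- roll
    if (count == 1 && (roll == 7 || roll == 11)) {
      result <- "Win"
    } else if (count == 1 && (roll == 2 || roll == 3 || roll == 12)) {
      result <- "Loss"
    } else if (count > 1 && roll == 7) {
      result <- "Loss"
    } else if (count > 1 && point == roll) {
      result <- "Win"
    } else {
      count <- count + 1
    }
  }
  list(pid = Sys.getpid(), started = started, result = result)
}

# Use the local parallel compute context instead of RxInSqlServer.
rxSetComputeContext(RxLocalParallel())

# rxExec returns a list with one element per run.
results <- rxExec(rollDice, timesToRun = 8)

# More than one distinct PID suggests the runs were spread over several workers.
pids <- vapply(results, function(r) r$pid, integer(1))
print(length(unique(pids)))
```

Comparing the `started` values across the returned elements gives the same kind of overlap check as the console timestamps in the question.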
