Problem reproducing results from parallelSVM in R
I am not able to set a seed value to get reproducible results from parallelSVM().
library(e1071)
library(parallelSVM)
data(iris)
x <- subset(iris, select = -Species)
y <- iris$Species
set.seed(1)
model <- parallelSVM(x, y)
parallelPredictions <- predict(model, x)
set.seed(1)
model2 <- parallelSVM(x, y)
parallelPredictions2 <- predict(model2, x)
all.equal(parallelPredictions,parallelPredictions2)
I know that this is not the right way to set a seed value for multicore operations, but I have no clue what to do instead.
I know there is an option when using mclapply, but that does not help in my situation.
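For reference, the mclapply option mentioned above looks roughly like the following. This is a minimal sketch that assumes a Unix-alike system (forking is not available on Windows) and uses a toy runif() task instead of an SVM. It does not carry over to parallelSVM(), which goes through foreach rather than mclapply.

```r
library(parallel)

# L'Ecuyer-CMRG provides separate random number streams for the child processes
RNGkind("L'Ecuyer-CMRG")

set.seed(1)
mc.reset.stream()  # restart the parallel streams from the seed just set
a <- mclapply(1:4, function(i) runif(1), mc.cores = 2, mc.set.seed = TRUE)

set.seed(1)
mc.reset.stream()
b <- mclapply(1:4, function(i) runif(1), mc.cores = 2, mc.set.seed = TRUE)

# Compare the two runs
identical(a, b)
```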
Edit:
I have found a solution by using trace to change the function trainSample() within parallelSVM, together with the doRNG package to seed the foreach loop.
Does anybody know a better solution?
In short, there is no method implemented in parallelSVM to handle this issue. However, the package uses the foreach and doParallel packages to handle its parallel operations, and after digging hard enough on Stack Overflow a solution is possible!
Credits to this answer, on the usage of the doRNG package, and this answer for giving me an idea for a simpler enclosed solution.
In the parallelSVM package the parallelization happens through the parallelSVM::registerCores function. This function simply calls doParallel::registerDoParallel with the number of cores, and no further arguments. My idea is simply to change the parallelSVM::registerCores function, such that it automatically sets the seed after creating a new cluster.
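To see the mechanism in isolation, here is a minimal sketch that does not involve parallelSVM at all. It assumes the doParallel and doRNG packages are installed, and it uses a toy runif() loop and a 2-worker cluster purely for illustration. Registering doRNG with the same seed before each %dopar% loop should make both loops draw the same random numbers.

```r
library(foreach)
library(doParallel)
library(doRNG)

# Create a cluster and register it as the foreach backend
cluster <- parallel::makeCluster(2)
doParallel::registerDoParallel(cluster)

# Seed the next %dopar% loop
doRNG::registerDoRNG(seed = 1)
draw1 <- foreach(i = 1:4, .combine = c) %dopar% runif(1)

# Re-register with the same seed before the second loop
doRNG::registerDoRNG(seed = 1)
draw2 <- foreach(i = 1:4, .combine = c) %dopar% runif(1)

parallel::stopCluster(cluster)

# Compare the raw values (as.numeric drops the attributes doRNG attaches)
all.equal(as.numeric(draw1), as.numeric(draw2))
```

The replacement for registerCores given at the end of this answer does the same thing: it calls doRNG::registerDoRNG right after the cluster is registered.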
When performing parallel computation in which you need a parallel seed, there are 2 things you need to ensure.
Luckily the doRNG package handles the first and uses a seed that is fine for the second. Using a combination of unlockBinding and assign we can overwrite parallelSVM::registerCores, such that it includes a call to doRNG::registerDoRNG with the appropriate seed (function at the end of the answer). Doing this we can actually get proper reproducibility, as illustrated below:
library(parallelSVM)
library(e1071)
data(magicData)
set.seed.parallelSWM(1) #<=== set seed as we would normally.
#Example from help(parallelSVM)
system.time(parallelSvm1 <- parallelSVM(V11 ~ ., data = trainData[,-1],
                                        numberCores = 4, samplingSize = 0.2,
                                        probability = TRUE, gamma = 0.1, cost = 10))
system.time(parallelSvm2 <- parallelSVM(V11 ~ ., data = trainData[,-1],
                                        numberCores = 4, samplingSize = 0.2,
                                        probability = TRUE, gamma = 0.1, cost = 10))
pred1 <- predict(parallelSvm1)
pred2 <- predict(parallelSvm2)
all.equal(pred1, pred2)
[1] TRUE
identical(parallelSvm1, parallelSvm2)
[1] FALSE
Note that identical does not have the power to properly assess the objects output by parallelSVM::parallelSVM, so comparing the predictions is a better way to check whether the models are identical.
For safety, let's check if this is also the case for the reproducible example in the question:
x <- subset(iris, select = -Species)
y <- iris$Species
set.seed.parallelSWM(1) #<=== set seed as we would normally (not necessary if above example has been run).
model <- parallelSVM(x, y)
model2 <- parallelSVM(x, y)
parallelPredicitions <- predict(model, x)
parallelPredicitions2 <- predict(model2, x)
all.equal(parallelPredicitions, parallelPredicitions2)
[1] TRUE
Phew..
Last, if we are done, or if we want random seeds once again, we can reset the seed by executing
set.seed.parallelSWM() #<=== set seed to random each execution (standard).
#check:
model <- parallelSVM(x, y)
model2 <- parallelSVM(x, y)
parallelPredicitions <- predict(model, x)
parallelPredicitions2 <- predict(model2, x)
all.equal(parallelPredicitions, parallelPredicitions2)
[1] "3 string mismatches"
(the output will vary, as the RNG seed is not set)
Credits to this answer. Note that we might not have to double up on the assignment, but here I simply replicated the answer without checking if the code could be further reduced.
set.seed.parallelSWM <- function(seed, once = TRUE){
  if(missing(seed) || is.character(seed)){
    # No usable seed given: registerCores without a fixed seed (results vary per run)
    out <- function(numberCores){
      cluster <- parallel::makeCluster(numberCores)
      doParallel::registerDoParallel(cluster)
    }
  }else{
    require("doRNG", quietly = TRUE, character.only = TRUE)
    # Seeded version: register doRNG right after the cluster has been registered
    out <- function(numberCores){
      cluster <- parallel::makeCluster(numberCores)
      doParallel::registerDoParallel(cluster)
      doRNG::registerDoRNG(seed = seed, once = once)
    }
  }
  # Overwrite registerCores in the attached package environment
  unlockBinding("registerCores", as.environment("package:parallelSVM"))
  assign("registerCores", out, "package:parallelSVM")
  lockBinding("registerCores", as.environment("package:parallelSVM"))
  # Overwrite registerCores in the package namespace as well
  unlockBinding("registerCores", getNamespace("parallelSVM"))
  assign("registerCores", out, getNamespace("parallelSVM"))
  lockBinding("registerCores", getNamespace("parallelSVM"))
  invisible()
}