
tryCatch - namespace?

I am quite new to R and I am confused about the correct usage of tryCatch. My goal is to make a prediction for a large data set. If the predictions cannot fit into memory, I want to circumvent the problem by splitting my data.

Right now, my code looks roughly as follows:

tryCatch({
  large_vector = predict(model, large_data_frame)
}, error = function(e) { # I ran out of memory
  for (i in seq(from = 1, to = dim(large_data_frame)[1], by = 1000)) {
    small_vector = predict(model, large_data_frame[i:(i+step-1), ])
    save(small_vector, tmpfile)
  }
  rm(large_data_frame) # free memory
  large_vector = NULL
  for (i in seq(from = 1, to = dim(large_data_frame)[1], by = 1000)) {
    load(tmpfile)
    unlink(tmpfile)
    large_vector = c(large_vector, small_vector)
  }
})

The point is that if no error occurs, large_vector is filled with my predictions as expected. If an error occurs, large_vector seems to exist only in the namespace of the error code, which makes sense because I declared it as a function. For the same reason, I get a warning saying that large_data_frame cannot be removed.

Unfortunately, this behavior is not what I want. I want to assign the variable large_vector from within my error function. I figured that one possibility is to specify the environment and use assign. Thus, I would use the following statements in my error code:

rm(large_data_frame, envir = parent.env(environment()))
[...]
assign('large_vector', large_vector, parent.env(environment()))

However, this solution seems rather dirty to me. I wonder whether there is any way to achieve my goal with "clean" code?

[EDIT] There seems to be some confusion because I posted the code above mainly to illustrate the problem, not to give a working example. Here's a minimal example that shows the namespace issue:

# Example 1 : large_vector fits into memory
rm(large_vector)
tryCatch({
  large_vector = rep(5, 1000)
}, error = function(e) {
  # do stuff to build the vector
  large_vector = rep(3, 1000)
})
print(large_vector)  # all 5

# Example 2 : pretend large_vector does not fit into memory; solution using parent environment
rm(large_vector)
tryCatch({ 
  stop();  # simulate error
}, error = function(e) {
  # do stuff to build the vector
  large_vector = rep(3, 1000)
  assign('large_vector', large_vector, parent.env(environment()))
})
print(large_vector)  # all 3

# Example 3 : pretend large_vector does not fit into memory; namespace issue
rm(large_vector)
tryCatch({ 
  stop();  # simulate error
}, error = function(e) {
  # do stuff to build the vector
  large_vector = rep(3, 1000)
})
print(large_vector)  # does not exist

I would do something like this:

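res <- tryCatch({
  predict(model, large_data_frame)
}, error = function(e) { # I ran out of memory
  # chunk id for every row: rows 1-1000 -> 0, rows 1001-2000 -> 1, ...
  chunk <- (seq_len(nrow(large_data_frame)) - 1) %/% 1000
  lapply(split(large_data_frame, chunk),
         function(x) predict(model, x))
})
rm(large_data_frame)
if (is.list(res)) # only the error branch returns a list
  res <- unlist(res, use.names = FALSE) # assumes predict() returns a vector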

The idea is to return a list of prediction results if you run out of memory.

Note: I am not sure of the result here, because we don't have a reproducible example.
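
Since there is no reproducible example, here is a minimal sketch of the same idea on toy data. Everything in it is made up for illustration: toy_data, toy_model, chunk_size, and in particular predict_limited, a hypothetical stand-in that raises an error for inputs above 10 rows to imitate running out of memory. The point it shows is that the value of the handler becomes the value of tryCatch, so nothing has to be assigned across environments.

```r
# Toy data and model; all names here are made up for illustration.
set.seed(1)
toy_data <- data.frame(x = 1:25)
toy_data$y <- 2 * toy_data$x + rnorm(25)
toy_model <- lm(y ~ x, data = toy_data)

# Hypothetical stand-in for a predict() call that runs out of memory:
# it refuses to handle more than 10 rows at once.
predict_limited <- function(model, newdata) {
  if (nrow(newdata) > 10) stop("pretend we ran out of memory")
  predict(model, newdata)
}

chunk_size <- 10
res <- tryCatch({
  predict_limited(toy_model, toy_data)
}, error = function(e) {
  # Chunk id per row: rows 1-10 -> 0, rows 11-20 -> 1, rows 21-25 -> 2
  chunk <- (seq_len(nrow(toy_data)) - 1) %/% chunk_size
  pieces <- lapply(split(toy_data, chunk),
                   function(x) predict_limited(toy_model, x))
  # The handler's value is what tryCatch() returns
  unlist(pieces, use.names = FALSE)
})

length(res) # one prediction per row of toy_data
```

Because both branches return the predictions, the caller only ever sees res and does not need to know which branch ran.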

EDIT: Let's try again:

You can use the finally argument of tryCatch:

step <- 1000
n <- nrow(large_data_frame)
large_vector <- NULL
tryCatch({
  large_vector <- predict(model, large_data_frame)
}, error = function(e) { # ran out of memory
  for (i in seq(from = 1, to = n, by = step)) {
    rows <- i:min(i + step - 1, n) # do not index past the last row
    small_vector <- predict(model, large_data_frame[rows, ]) # predict in pieces
    save(small_vector, file = paste0("tmpfile", i)) # save pieces
  }
}, finally = {
  if (is.null(large_vector)) { # if we ran out of memory
    rm(large_data_frame) # free memory; finally is evaluated in the calling environment
    large_vector <- numeric(n) # make vector
    for (i in seq(from = 1, to = n, by = step)) {
      # collect pieces
      load(paste0("tmpfile", i))
      large_vector[i:min(i + step - 1, n)] <- small_vector
    }
  }
})

Here's a simplified version to see what is going on:

large_vector<-NULL
rm(y)
tryCatch({
  large_vector <- y 
}, error = function(e) {# y is not found
  print("error")
},finally={if(is.null(large_vector)){
 large_vector<-1
}})
> large_vector
[1] 1

EDIT2: Another tip regarding scope that could be useful for you (although maybe not in this situation, as you didn't want to declare large_vector beforehand): the <<- operator. From R-help:

The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned...

Therefore you could use the above example code like this:

large_vector<-NULL
rm(y)
tryCatch({
  large_vector <- y 
}, error = function(e) {# y is not found
  large_vector <<- 1
  print("error")
})
> large_vector
[1] 1
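
The same pattern can be sketched inside a function, where the search described in the help text matters more. The names fill_with_fallback and fallback_result are hypothetical. Because fallback_result is defined in the function before tryCatch runs, <<- in the handler finds it there and updates it; without that first assignment, <<- would end up assigning in the global environment instead.

```r
fill_with_fallback <- function() {
  fallback_result <- NULL # must exist here, or <<- would assign globally
  tryCatch({
    stop("simulated failure")
  }, error = function(e) {
    # The handler was created inside fill_with_fallback(), so <<- starts its
    # search in that function's frame and finds fallback_result there.
    fallback_result <<- rep(3, 5)
  })
  fallback_result
}

out <- fill_with_fallback()
out
# The global environment was not touched:
exists("fallback_result", envir = globalenv(), inherits = FALSE)
```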

The code below is quite self-explanatory. Indeed, the problem is that anything inside the error function is not applied to the parent environment by default.

b=0

As explained, this doesn't work:

tryCatch(expr = {stop("error1")}, error=function(e) {b=1})
b

SOLUTION 1: assign to the parent environment

tryCatch(expr = {stop("error2")}, error=function(e) {assign(x = "b", value = 2, envir = parent.env(env = environment()))})
b

SOLUTION 2: the simplest (only works if you are assigning to b in both expr and error)

b = tryCatch(expr = {stop("error3")}, error=function(e) {b=3;return(b)})
b
