R language Function Factory: How to ensure safety?

The Problem

I would like to check whether a function factory in R is "safe". Here "safe" means that the results of the functions created by the factory depend only on their arguments, not on global variables.

Description

This is an unsafe factory:

funfac_bad = function(){  
  newfun = function()
    return(foo)
  return(newfun)
}

The return value of newfun depends on the value of foo at the time newfun is executed. It may even throw an error if foo happens to be undefined.

Quite obviously, this factory can be made safe by binding foo to a value inside the factory:

funfac_good = function(){
  foo = 4711
  newfun = function()
    return(foo)
  return(newfun)
}

I thought I could validate safety by checking for global variables in the factory. And indeed:

> codetools::findGlobals(funfac_bad) 
[1] "{"      "="      "foo"    "return"
> codetools::findGlobals(funfac_good)
[1] "{"      "="      "return"

But my actual use case is (much) more complicated. The functions created by the factory depend on subfunctions and variables defined in hundreds of lines of code. Hence I source the definitions, and my factories in principle look like this:

funfac_my = function(){
  sys.source("file_foo.R", envir = environment())
  newfun = function()
    return(foo)
  return(newfun)
}

This is a safe factory if and only if the code executed in "file_foo.R" binds the name "foo" to a value. But (quite logically) codetools::findGlobals will always report "foo" as a global variable.

Question

How can I detect unsafe behaviour of such a function factory when definitions are sourced?

Why not just ensure you define a default value for foo locally before sourcing the external files? For example, suppose I have this file:

foo.R

foo <- "file foo"

and this file

bar.R

bar <- "bar"

If I write my function factory like this:

funfac_my <- function(my_path) {
  foo <- "fun fac foo"
  if(!missing(my_path)) sys.source(my_path, envir = environment())
  function() foo
}

Then I get the following results:

foo <- "global foo"

funfac_my("foo.R")()
#> [1] "file foo"

funfac_my("bar.R")()
#> [1] "fun fac foo"

funfac_my()()
#> [1] "fun fac foo"

So the output never depends on whether the global environment contains an object called "foo" (unless the scripts you source deliberately look up a global called "foo" and copy it, but then that is presumably what you wanted by sourcing that file anyway).

Note that you could make this throw an error instead of returning a default value by adding the line if(foo == "fun fac foo") stop("object 'foo' not found") just before the final line. The factory will then complain that foo is not found even if an unrelated object called foo exists in the global workspace.
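The error-throwing variant can be sketched as follows. This is only an illustration: the name funfac_strict is hypothetical, temporary files stand in for foo.R and bar.R, and it uses exists(..., inherits = FALSE) in place of the sentinel string, so no default value is needed.

```r
# Hypothetical variant: fail fast when the sourced file does not define foo.
funfac_strict <- function(my_path) {
  env <- environment()                 # the factory's own frame
  sys.source(my_path, envir = env)
  # inherits = FALSE restricts the lookup to the factory frame,
  # so a global foo cannot satisfy the check.
  if (!exists("foo", envir = env, inherits = FALSE)) {
    stop("object 'foo' not defined by ", my_path)
  }
  function() foo
}

# Temporary stand-ins for foo.R and bar.R
foo_file <- tempfile(fileext = ".R")
bar_file <- tempfile(fileext = ".R")
writeLines('foo <- "file foo"', foo_file)
writeLines('bar <- "bar"', bar_file)

foo <- "global foo"

# The file defines foo, so the closure returns the sourced value.
res_ok <- funfac_strict(foo_file)()

# The file does not define foo, so the factory stops despite the global foo.
res_err <- tryCatch(funfac_strict(bar_file),
                    error = function(e) conditionMessage(e))
```

Checking the frame directly avoids the edge case where a sourced file legitimately sets foo to the sentinel value.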

You ask "How can I detect unsafe behaviour of such a function factory when definitions are sourced?" I think the answer is that you can't, but changing it slightly would make it easy.

For example, suppose currently you have

foo <- undefined_value

as the only line in "file_foo.R", and you want to be warned about the use of undefined_value. My suggestion is that you don't do that. Instead, put the whole definition of funfac_my into "file_foo.R", wrapping that one line:

funfac_my = function(){
 
  foo <- undefined_value

  newfun = function()
    return(foo)
  return(newfun)
}

Now you can source that file, and have a function funfac_my to pass to codetools::findGlobals:

codetools::findGlobals(funfac_my)
#> [1] "{"               "<-"              "="               "return"         
#> [5] "undefined_value"
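If the sys.source call has to stay, one further possibility is to check the closure at run time rather than the factory statically. This is a rough sketch and not part of either answer. The helper unresolved_names is hypothetical, and it assumes the codetools package is installed. It lists the names the returned function uses that are bound neither in the factory's frame nor in base R.

```r
# Hypothetical helper: names used by fun that are defined neither in its
# enclosing (factory) environment nor in the base package.
unresolved_names <- function(fun) {
  env <- environment(fun)
  used <- codetools::findGlobals(fun)
  is_local <- vapply(used, exists, logical(1), envir = env, inherits = FALSE)
  is_base <- vapply(used, exists, logical(1), envir = baseenv(), inherits = FALSE)
  used[!(is_local | is_base)]
}

funfac_my <- function(path) {
  sys.source(path, envir = environment())
  newfun <- function() return(foo)
  newfun
}

# Temporary stand-ins for a file that defines foo and one that does not
with_foo <- tempfile(fileext = ".R")
without_foo <- tempfile(fileext = ".R")
writeLines("foo <- 4711", with_foo)
writeLines("bar <- 1", without_foo)

safe_result <- unresolved_names(funfac_my(with_foo))
unsafe_result <- unresolved_names(funfac_my(without_foo))
```

Here safe_result is empty and unsafe_result contains "foo". The check is deliberately strict: names from attached packages such as stats are also flagged, and it does not inspect whether the sourced code itself reads global variables.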
