
How to manage parameters and arguments in R?

This is a pretty simple and general question, but I haven't seen it already discussed. I hope I haven't missed anything.

I am starting to design big programs with several layers of functions, and while there are clear strategies in other programming languages, I can't find a canonical solution in R for handling the "parameters" of a function that also has "arguments". I make a conceptual distinction between "parameters" and "arguments", even though to the function they are the same thing: inputs. The former are set at a higher level and do not change often, while the latter are the actual data that the function processes.

Let's consider this simple example: [Figure: call diagram for the simple example]

The subfunction of interest, SF(), is called many times by the "WORKER" with different arguments but with the same parameters, which are set "above". Of course the same question applies to more complicated cases with several layers.

I see two ways of dealing with that:

1. Passing everything down, but:
   a. You'll end up with a myriad of arguments in your function call, or a structure enclosing all these arguments.
   b. Because R makes copies of arguments to call functions, it may not be very efficient.
2. Dynamically evaluating the functions each time you change the parameters, and "hardwiring" them into the function definition. But I am not sure how to do that, especially in a clean way.

Neither of these seems very appealing, so I was wondering if you guys had an opinion on the matter? Maybe we could use some of R's environment features? :-)

Thanks!

EDIT: Because for some, code is better than graphs, here is a dummy example in which I used method 1, passing all the arguments down. If I have many layers and subfunctions, passing all the parameters through the intermediate layers (here, WORKER()) does not seem great, from either a code or a performance perspective.

F <- function(){
  param <- getParam()
  result <- WORKER(param)
  return(result)
}

getParam <- function(){
  return('O')
}

WORKER <- function(param) {
  X <- LETTERS[1:20]
  interm.result <- sapply(X,SF,param) # The use of sapply here negates maybe the performance issue?
  return(which(interm.result=='SO'))
}

SF <- function(x,param) {
  paste0(x,param)
}

EDIT 2: The simplicity of the example above misled some of the kind people looking at my problem, so here is a more concrete illustration, using a discrete gradient descent. Again, I kept it simple, so everything could be written in one big function, but that's not what I want to do for my real problem.

gradientDescent <- function(initialPoint= 0.5, type = 'sin', iter_max = 100){ 
  point <- initialPoint
  iter <- 1
  E <- 3
  deltaError <- 1
  eprev <- 0
  # Stop when the error has converged OR the iteration cap is reached
  while (abs(deltaError) > 10^(-2) && iter < iter_max) {
    v_points <- point + -100:100 / 1000
    E <- sapply(v_points, computeError, type)
    point <- v_points[which.min(E)]
    ef <- min(E)
    deltaError <- ef - eprev
    eprev <- ef
    iter <- iter+1
  }
  print(point)
  return(point)
}

computeError <- function(point, type) {
  # Return the error explicitly; fail loudly on an unknown type
  if (type == 'sin') {
    sin(point)
  } else if (type == 'cos') {
    cos(point)
  } else {
    stop("unknown type: ", type)
  }
}

I find it suboptimal to pass the "type" parameter to the subfunction each time it is evaluated. It seems that the reference to closures brought by @hadley and the explanation by @Greg are good leads toward the solution I need.

I think you may be looking for lexical scoping. R uses lexical scoping which means that if you define the functions WORKER and SF inside of F, then they will be able to access the current value of param without it being passed down.

If you cannot take advantage of lexical scoping (because SF must be defined outside of F), another option is to create a new environment to store your parameters in. As long as all the functions that need them have access to this environment, either by having it passed explicitly or by inheritance (making it the enclosing environment of the functions), F can assign param into this environment and the other functions can read the value.
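A minimal sketch of the environment option, assuming a single shared environment is acceptable. The name params_env is hypothetical; F, WORKER and SF mirror the question's dummy example.

```r
# Hypothetical shared environment that holds the parameters
params_env <- new.env()

SF <- function(x) {
  # Reads param from the shared environment instead of taking it as an argument
  paste0(x, get("param", envir = params_env))
}

WORKER <- function() {
  X <- LETTERS[1:20]
  interm.result <- sapply(X, SF)
  which(interm.result == "SO")
}

F <- function(param) {
  # Set the parameter once, "above"; the lower layers never see it as an argument
  assign("param", param, envir = params_env)
  WORKER()
}

result <- F("O")
```

The trade-off is that SF now depends on state that is not visible in its signature, so it is worth keeping the environment's contents small and documented.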

At the risk of speaking for others, I think the reason your question is getting both interest and a dearth of answers is that you seem to be making this overcomplicated.

Certainly given the task shown in your example, I'd do something more like this:

SF <- function(x, par) {
    paste0(x, par)
}

F <- function(param) {
    which(sapply(LETTERS[1:20], FUN = SF, par = param) == "SO")
}

F(param="O")
#  S 
# 19 

Or, using the lexical scoping that Greg Snow referred to:

F <- function(param) {
    SF <- function(x) {
         paste0(x, param)
    }
    which(sapply(LETTERS[1:20], FUN = SF) == "SO")
}
F(param="O")

Or, in reality and taking advantage of the fact that paste0() is vectorized:

F <- function(param) {
    which(paste0(LETTERS[1:20], param) == "SO")
}
F("O")
# [1] 19

I understand my answer may appear overly simplistic: you clearly have something more complicated in mind, but I think you need to better show us what that is. To get more help I suggest you follow the suggestions in @baptiste's second comment, giving us a less abstract example and explaining why you call F() and getParam() without any arguments (and also perhaps demonstrating why you need a getParam() function at all).

Even though this question is greying, I thought it might be useful to touch on a couple other ways I've seen this kind of problem solved and provide an answer in the answer slot. Note that having seen and reported a pattern is not the same as endorsing it!

Closures

As mentioned in the comments and the amended answer, closures certainly are a good answer here. That is, you can define a function inside a function (a generator function) and carry information from the generator function into the generated function.

generator_function <- function(param) {
  function() {
    param   
  }
}

generated_function <- generator_function(0)
generated_function()

In the context of the question, this might recommend defining computeError inside of gradientDescent, so that computeError can carry type in its environment.
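Applied to the question's gradient descent, a closure could look roughly like this. It is a sketch, not the asker's code: make_compute_error and compute_error are hypothetical names, and the choice of sin or cos is made once, when the closure is created, rather than on every evaluation.

```r
make_compute_error <- function(type) {
  # Resolve the parameter once; an unknown type fails here, at creation time
  f <- switch(type,
              sin = sin,
              cos = cos,
              stop("unknown type: ", type))
  # The returned function carries f in its enclosing environment
  function(point) f(point)
}

compute_error <- make_compute_error("sin")

# One step of the question's discrete search, with no type argument passed down
v_points <- 0.5 + -100:100 / 1000
E <- sapply(v_points, compute_error)
best <- v_points[which.min(E)]
```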

Once you grok closures, I think you'll find they are pretty powerful. However, they are a little challenging to think about at first. Moreover, if you aren't used to them and the generated function ends up decoupled from the generator function's inputs, they can be challenging to debug, because confusion can arise as to what the value of type is and where it came from. To help with the first problem, I heartily recommend pryr::unenclose. For the second, I'll let wiser minds than mine chime in if the need arises.
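If you would rather not install anything, base R can also show what a generated function has captured. This is a sketch of one way to look, reusing the generator_function example from above; the force() call is my addition, so that the value is fixed when the closure is created.

```r
generator_function <- function(param) {
  force(param)  # evaluate the argument now rather than lazily on first use
  function() {
    param
  }
}

generated_function <- generator_function(0)

# The enclosing environment holds whatever the closure carries with it
env <- environment(generated_function)
captured_names <- ls(env)                     # names captured by the closure
captured_value <- get("param", envir = env)   # the value that was captured
```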

Set an option

Frequently, raw parameters are set as an option (cf. ?options), either directly or through getter/setter functions (e.g. knitr). I've also seen functions set as options now and again. Personally, I dislike this pattern because it is applied rather inconsistently across packages, and the options are usually buried in the documentation of specific functions: your actual call might be to a higher-level function, while the option you need to make it do what you want is documented only under a lower-level function.
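For illustration, the option pattern might look like this with the question's error function. The option name mypkg.error_type is hypothetical, and the fallback to "sin" is an assumption made for the sketch.

```r
compute_error <- function(point) {
  # Read the parameter from a (hypothetical) global option, with a default
  type <- getOption("mypkg.error_type", default = "sin")
  switch(type,
         sin = sin(point),
         cos = cos(point),
         stop("unknown type: ", type))
}

old <- options(mypkg.error_type = "cos")  # set the option, keep the old value
e_cos <- compute_error(0)                 # uses cos
options(old)                              # restore the previous state
e_sin <- compute_error(0)                 # option unset again, falls back to sin
```

Saving the return value of options() and restoring it afterwards keeps the global state from leaking into unrelated code.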

...

Some authors avoid parameter spaghetti via liberal use of dots. It is dots all of the way down. This approach is pretty darn powerful. It just works 9 times out of 10. The disadvantage is that, at least for me, dots can be challenging to debug and document. On the debugging side, for example, because none of your functional inputs are strict, a mistyped parameter name seems hard to catch.
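A minimal sketch of the dots approach, reusing the SF/WORKER/F names from the question. The sep argument and the deliberately mistyped parm are hypothetical additions for illustration.

```r
SF <- function(x, param = "", sep = "") {
  paste(x, param, sep = sep)
}

WORKER <- function(X, ...) {
  # WORKER does not need to know which parameters SF accepts
  sapply(X, SF, ...)
}

F <- function(...) {
  which(WORKER(LETTERS[1:20], ...) == "SO")
}

hit <- F(param = "O")

# A mistyped name travels through every layer and only fails once it reaches
# a function without dots; here the error message is captured as a string
typo <- tryCatch(F(parm = "O"), error = function(e) conditionMessage(e))
```

If SF itself accepted dots, the mistyped parm would be swallowed silently and the default param would be used, which is the debugging hazard described above.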

Other

There certainly are other patterns! Tons of them. People pass around environments and build lists, etc, etc. What answer is 'right' for you is probably a mix of your personal style, what works, and what will be clear to you when you go back and look at it months from now.
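As one example of the "build lists" style, the parameters can travel as a single list argument. This is only a sketch: default_params, best_neighbour, step and half_width are hypothetical names, loosely modelled on the question's gradient descent.

```r
default_params <- list(type = "sin", step = 0.001, half_width = 100)

compute_error <- function(point, params) {
  switch(params$type,
         sin = sin(point),
         cos = cos(point),
         stop("unknown type: ", params$type))
}

best_neighbour <- function(point, params) {
  # Candidate points around the current one, as in the question's inner loop
  offsets <- (-params$half_width):params$half_width
  v_points <- point + offsets * params$step
  E <- sapply(v_points, compute_error, params = params)
  v_points[which.min(E)]
}

# Override only what differs from the defaults
params <- modifyList(default_params, list(type = "cos"))
best <- best_neighbour(0.5, params)
```

Each layer then has one extra argument regardless of how many parameters exist, at the cost of signatures that no longer say which parameters a function actually uses.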
