How to submit a "vector of distributions" to a function in R?

I want to write an R function, say f, which takes inputs x and n, where x is some kind of "list of distributions" and f is supposed to draw n samples from each distribution in x.

What is a good way to implement this in R?

My current idea is:

f = function(x,n){
  
  out = list()
  
  for(i in 1:length(x)){
    
    name = sub("\\(.*", "",x[i])
    size = ifelse(name=="sample",paste("size=",n),paste0("n=",n))
    args = paste(size,gsub("[\\(\\)]", "", regmatches(x[i], gregexpr("\\(.*?\\)", x[i]))[[1]]),sep=",")
    out[[i]] = eval(parse(text=paste0(name,"(",args,")")))
    
  }
  
  return(out)
  
}

f(x = c("rnorm(mean=1,sd=2)","sample(0:1,replace=TRUE)","rbinom(size=10,prob=0.1)"), n = 10)

I don't like this implementation, because

  1. n is not always the argument name for the sample size (e.g. in sample it is size),
  2. the code will crash if the inputs for a distribution are not all properly defined.

Can I improve the implementation, for example with x of class alist?

You could change your input and create a list of function names and arguments. For each distribution we set the n / size value to 1, so that each call produces a single draw.

ls_func <- list("rnorm" = list(mean = 0, sd = 1, n = 1),
                "sample" = list(x = 0:1, replace = TRUE, size = 1),
                "rbinom" = list(size = 10, prob = 0.1, n = 1))

Your function takes those distributions and replicates each call n times:

g <- function(ls_func, n) {
  out = list()
  
  for(i in seq_along(ls_func)){
    out[[i]] <- replicate(do.call(names(ls_func)[i], ls_func[[i]]), n = n)
  }
  
  return(out)
}

so

set.seed(4096)
g(ls_func, 10)

returns

[[1]]
 [1]  0.1894398 -0.1622468  0.5327100 -1.5747229 -0.6884024 -0.3092226 -0.0879258 -0.4320240 -0.7799596  0.4525895

[[2]]
 [1] 0 1 0 0 0 1 1 1 0 0

[[3]]
 [1] 0 0 1 1 0 1 1 1 1 0

Basically, it's not a good approach to use eval(parse(text=...)) to execute functions. Use do.call instead.
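As a variation on the approach above, here is a minimal sketch that passes the sample size straight through do.call instead of replicating single draws. The helper names size_arg and draw_n are hypothetical. The rule "use n if the function has an argument called n, otherwise size" is an assumption: it holds for rnorm, rbinom and sample, but not necessarily for other functions.

```r
# Hypothetical helper: guess the name of the sample-size argument.
# Assumption: it is "n" when the function has a formal named n, else "size".
size_arg <- function(fun_name) {
  if ("n" %in% names(formals(match.fun(fun_name)))) "n" else "size"
}

# Hypothetical helper: draw n values in a single vectorized call.
draw_n <- function(fun_name, args, n) {
  args[[size_arg(fun_name)]] <- n   # fill in the sample size
  do.call(fun_name, args)           # call the function by name
}

# Distribution specs no longer carry a sample-size entry.
specs <- list(
  rnorm  = list(mean = 1, sd = 2),
  sample = list(x = 0:1, replace = TRUE),
  rbinom = list(size = 10, prob = 0.1)   # here size is the number of trials
)

set.seed(1)
out <- Map(function(nm, args) draw_n(nm, args, n = 10), names(specs), specs)
lengths(out)   # one entry per distribution, each of length 10
```

Compared with parsing strings, the arguments stay ordinary R objects, so nothing has to be pasted together or re-parsed.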


You can remove the for loop:

g <- function(ls_func, n) {
  # Each do.call() yields one draw; replicate() repeats it n times
  lapply(seq_along(ls_func), function(i) {
    replicate(n, do.call(names(ls_func)[i], ls_func[[i]]))
  })
}

Note: This code also crashes if your distributions aren't defined properly. To avoid this, you need some error handling. Look into the try and stop functions.
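One way the error handling mentioned above could look is sketched below. The names safe_draw and g_safe are hypothetical, and the sketch uses tryCatch (a more general relative of try) so that a badly specified distribution yields NULL plus a warning instead of aborting the whole run. Returning NULL is a design assumption; stopping with an informative message would be equally reasonable.

```r
# Hypothetical helper: draw n values from one spec; on failure, warn and
# return NULL instead of stopping.
safe_draw <- function(fun_name, args, n) {
  tryCatch(
    replicate(n, do.call(fun_name, args)),
    error = function(e) {
      warning("Skipping ", fun_name, ": ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Hypothetical wrapper: validate the input with stop(), then draw from each spec.
g_safe <- function(ls_func, n) {
  if (!is.list(ls_func) || is.null(names(ls_func))) {
    stop("ls_func must be a named list of argument lists")
  }
  Map(function(nm, args) safe_draw(nm, args, n), names(ls_func), ls_func)
}

bad_specs <- list(
  rnorm  = list(mean = 0, sd = 1, n = 1),
  rbinom = list(prob = 0.1, n = 1)   # "size" is missing, so this call fails
)

res <- g_safe(bad_specs, n = 5)   # warns about rbinom, still returns rnorm draws
```

Here res$rnorm holds five draws, while res$rbinom is NULL.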

I've been putting together an R package, distionary, that can help with this.

First make a list of input distributions:

library(distionary)
x <- list(
  dst_norm(1, 2^2),
  dst_empirical(0:1),
  dst_binom(10, 0.1)
)

The function for drawing from a distribution is realize(), which fits nicely with lapply() (or purrr's map()):

set.seed(123)
lapply(x, realize, n = 10)
#> [[1]]
#>  [1] -0.1209513  0.5396450  4.1174166  1.1410168  1.2585755  4.4301300
#>  [7]  1.9218324 -1.5301225 -0.3737057  0.1086761
#> 
#> [[2]]
#>  [1] 0 0 0 0 0 0 0 0 1 1
#> 
#> [[3]]
#>  [1] 3 2 1 2 0 1 2 0 0 0

Putting this code in a function is then straightforward:

f <- function(x, n) {
  lapply(x, realize, n = n)
}
set.seed(123)
f(x, n = 10)
#> [[1]]
#>  [1] -0.1209513  0.5396450  4.1174166  1.1410168  1.2585755  4.4301300
#>  [7]  1.9218324 -1.5301225 -0.3737057  0.1086761
#> 
#> [[2]]
#>  [1] 0 0 0 0 0 0 0 0 1 1
#> 
#> [[3]]
#>  [1] 3 2 1 2 0 1 2 0 0 0
