
Guidelines for testing a statistical function in R?

Question: I am testing functions in a package that I am developing and would like to know if you can suggest some general guidelines for how to do this. The functions cover a wide range of statistical modeling, transformations, subsetting, and plotting. Is there a 'standard' or otherwise sufficient test?

An example: the test that prompted me to ask this question.

The function dtheta:

dtheta <- function(x) {
  ## find the quantile of the mean
  q.mean <- mean(mean(x) >= x)
  ## find the quantiles of ucl and lcl (q.mean +/- 0.15)
  q.ucl  <- q.mean + 0.15
  q.lcl  <- q.mean - 0.15
  qs <- c(q.lcl, q.mean, q.ucl)
  ## find the lcl, mean, and ucl of the vector
  c(quantile(x,qs), var(x), sqrt(var(x))/mean(x))
}

Step 1: make test data:

set.seed(100) # per Dirk's recommendation
test <- rnorm(100000,10,1)

Step 2: compare the expected output from the function with the actual output from the function:

 expected <- quantile(test, c(0.35, 0.65, 0.5))
 actual   <- dtheta(test)[1:3]
 signif(expected,2) %in% signif(actual,2)

Step 3: maybe do another test

test2 <- runif(100000, 0, 100)
expected <- c(35, 50, 65)
actual   <- dtheta(test2)
expected %in% signif(actual,2)

Step 4: if true, consider the function 'functional'

It depends on what exactly you want to test. In addition to Dirk's recommendations and the svUnit and RUnit packages that VitoshKa mentioned, I'd like to add a few things:

  • Indeed, set the seed, but make sure you try the function with different seeds as well. Some functions fail only once every ten times you try. This becomes crucial especially when optimization is involved. replicate() is a nice function to use in this context.
  • Think carefully about the input you want to test. You should test a number of "odd" cases that don't really resemble the "perfect" dataset. I always test at least 10 (simulated) datasets of different sizes.
  • Fool-proof the function: I also throw in some data types that are not the ones the function is meant for. Wrong-type input is likely to happen at some point, and the last thing you want is a function returning a bogus result without a warning. If you use that function later on in some other code, debugging that code can (and will!) be hell. Been there, done that, bought the t-shirt...
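As a minimal sketch of the replicate() idea, the check from the question can be wrapped in a helper and repeated on freshly simulated data. The helper name check_once, the sample size, and the tolerance of 0.1 are illustrative assumptions, not part of the original post.

```r
# dtheta as defined in the question
dtheta <- function(x) {
  q.mean <- mean(mean(x) >= x)
  qs <- c(q.mean - 0.15, q.mean, q.mean + 0.15)
  c(quantile(x, qs), var(x), sqrt(var(x)) / mean(x))
}

# Hypothetical helper: simulate one dataset and check that the first three
# values of dtheta() are close to the 35%, 50% and 65% sample quantiles.
# The tolerance is an arbitrary choice for illustration.
check_once <- function(n = 10000, tol = 0.1) {
  x <- rnorm(n, mean = 10, sd = 1)
  expected <- quantile(x, c(0.35, 0.50, 0.65))
  actual <- dtheta(x)[1:3]
  all(abs(unname(actual) - unname(expected)) < tol)
}

set.seed(100)                        # reproducible, yet every replicate draws new data
passed <- replicate(20, check_once())
mean(passed)                         # proportion of simulated datasets that passed
```

Changing n in check_once() is one way to cover datasets of different sizes; a proportion below 1 points to the kind of intermittent failure a single seed can hide.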

An example on extended testing of datasets: what would you like to see as output in these cases? Is this the result you'd expect? Not according to the test you did.

> test3 <- rep(12,100000) # data with only 1 value
> expected <- c(12, 12, 12)
> actual   <- dtheta(test3) 
Error in quantile.default(x, qs) : 'probs' outside [0,1]

>  test4 <- rbinom(100000,30,0.5) # large dataset with a limited amount of values
>  expected <- quantile(test4,c(0.35, 0.50, 0.65))
>  actual   <- dtheta(test4)
>  expected %in% signif(actual,2)
[1] FALSE  TRUE  TRUE

> test5 <- runif(100,0,100) # small dataset. 
> expected <- c(35, 50, 65)
> actual   <- dtheta(test5)
> expected %in% signif(actual,2)
[1] FALSE FALSE FALSE

Edit: corrected the code so the tests make a bit more sense.

You need to write

  1. tests that show you get the right answer when you input sensible values

  2. tests that show your function fails correctly when you input nonsense

  3. tests for all boundary cases
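The three kinds of test could look roughly like this in base R. The wrapper dtheta_checked and the helper fails are hypothetical names introduced here for illustration; the input rules they enforce are assumptions about what "nonsense" should mean for this function, not something the question specifies.

```r
# dtheta as defined in the question
dtheta <- function(x) {
  q.mean <- mean(mean(x) >= x)
  qs <- c(q.mean - 0.15, q.mean, q.mean + 0.15)
  c(quantile(x, qs), var(x), sqrt(var(x)) / mean(x))
}

# Hypothetical wrapper (not in the original post): validate input, then call dtheta
dtheta_checked <- function(x) {
  if (!is.numeric(x)) stop("'x' must be numeric")
  if (length(x) < 2 || anyNA(x)) stop("'x' needs at least two non-missing values")
  q.mean <- mean(mean(x) >= x)
  if (q.mean - 0.15 < 0 || q.mean + 0.15 > 1) {
    stop("the quantile of the mean is too close to 0 or 1")
  }
  dtheta(x)
}

# Hypothetical helper: TRUE if evaluating expr raises an error
fails <- function(expr) inherits(tryCatch(expr, error = function(e) e), "error")

# 1. Sensible input gives the right answer (values worked out by hand for 1:5)
stopifnot(isTRUE(all.equal(unname(dtheta_checked(1:5)),
                           c(2.8, 3.4, 4, 2.5, sqrt(2.5) / 3))))

# 2. Nonsense input fails, and fails loudly
stopifnot(fails(dtheta_checked("a")), fails(dtheta_checked(list(1, 2))))

# 3. Boundary cases: constant data, a single value, a missing value
stopifnot(fails(dtheta_checked(rep(12, 100))),
          fails(dtheta_checked(5)),
          fails(dtheta_checked(c(1, NA, 3))))
```

Small hand-computable inputs such as 1:5 make the "right answer" test independent of the function's own logic, which comparing against another call to quantile() on random data does not fully achieve.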

There is a huge amount of literature on different strategies for testing software; Wikipedia's software testing page is as good a place as any to start.

Looking at your example:

What happens when you input a string/dataframe/list?
What about negative x or imaginary x?
How about vector/array x?
If only positive x is allowed, then what happens at x = 0?

Note that subfunctions (that are only called by your functions and never by the user) need less input checking because you have more control over what goes into the function.
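One way to explore the input questions above is to feed each kind of input to the function and record whether it returns, warns, or stops. This is only a sketch: probe is a hypothetical helper, and the list of inputs is an illustrative selection.

```r
# dtheta as defined in the question
dtheta <- function(x) {
  q.mean <- mean(mean(x) >= x)
  qs <- c(q.mean - 0.15, q.mean, q.mean + 0.15)
  c(quantile(x, qs), var(x), sqrt(var(x)) / mean(x))
}

# Hypothetical helper: call f(x) and describe the outcome instead of stopping.
# Note that the first warning ends the evaluation, so later errors are not seen.
probe <- function(x, f = dtheta) {
  tryCatch({
    f(x)
    "ok"
  },
  warning = function(w) paste("warning:", conditionMessage(w)),
  error   = function(e) paste("error:", conditionMessage(e)))
}

# Illustrative inputs covering the questions above
inputs <- list(
  string     = "a",
  data_frame = data.frame(a = 1:3),
  list       = list(1, 2, 3),
  negative   = -(1:10),
  complex    = complex(real = 1:3, imaginary = 1),
  zero       = 0
)

outcomes <- vapply(inputs, probe, character(1))
print(outcomes)
```

Any input reported as "ok" that should not be accepted, or any message that would confuse a user, is a candidate for an explicit check in the user-facing function.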

Nice question.

Besides generalities such as setting a seed, I would recommend that you look at some of the tests in the R sources. The directory tests/ in the source has a wealth of these; some of the base R packages (such as tools) also have a tests/ subdirectory.

It's already appeared as a comment, but I'll add it as a bona fide answer. R does have a few automated testing packages to help with this kind of thing, the main two being RUnit and testthat. I've briefly used RUnit, and recently started using testthat in more depth (though I can't really give any good advantages or disadvantages of one over the other!).

Automated testing lets you set up these test cases, as well as others suggested above, such as:

  • Boundary tests
  • Stress tests (less need to test for accuracy, just throw data at it and see if it falls over)
  • Dealing with different input
  • Dealing with different underlying platforms / locales
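A rough sketch of what a few of these cases might look like with testthat, using the dtheta function from the question. The test descriptions and the choice of inputs are assumptions made for illustration, and the requireNamespace() guard is there only so the snippet also runs where testthat is not installed.

```r
# dtheta as defined in the question
dtheta <- function(x) {
  q.mean <- mean(mean(x) >= x)
  qs <- c(q.mean - 0.15, q.mean, q.mean + 0.15)
  c(quantile(x, qs), var(x), sqrt(var(x)) / mean(x))
}

has_testthat <- requireNamespace("testthat", quietly = TRUE)

if (has_testthat) {
  library(testthat)

  # Known answer: values worked out by hand for the input 1:5
  test_that("dtheta matches hand-computed values", {
    expect_equal(unname(dtheta(1:5)), c(2.8, 3.4, 4, 2.5, sqrt(2.5) / 3))
  })

  # Boundary test: constant data pushes a quantile probability above 1.
  # No message pattern is given, since messages can differ between locales.
  test_that("dtheta stops on constant data", {
    expect_error(dtheta(rep(12, 100)))
  })

  # Stress test: accuracy is not checked, only that five values come back
  test_that("dtheta copes with a larger input", {
    set.seed(1)
    expect_length(dtheta(rnorm(1e5)), 5)
  })
}
```

In a package these tests would normally live in files under tests/testthat/ rather than being run inline like this.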

