How to write unit tests for a function that returns a data frame

Question

I am writing a script that ultimately returns a data frame. My question is whether there are any good practices for using a unit testing package to make sure that the returned data frame is correct. (I'm a beginning R programmer and also new to the concept of unit testing.)

My script effectively looks like the following:

# initialize data frame
df.out <- data.frame(...)

# function set
function1 <- function(x) {...}
function2 <- function(x) {...}

# do something to this data frame
df.out$new.column <- function1(df.out)

# do something else
df.out$other.new.column <- function2(df.out)

# etc ....

... and I ultimately end up with a data frame with many new columns. However, what is the best approach, using unit tests, to check that the data frame produced is what is anticipated?

So far I have created unit tests that check the results of each function, but I want to make sure that running all of them together produces what is intended. I've looked at Hadley Wickham's page on testing but can't see anything obvious about what to do when returning data frames.

My thoughts to date are:

Create an expected data frame by hand
Check that the output equals this data frame, using expect_that or similar

Any thoughts or pointers on where to look for guidance? My Google-fu has let me down considerably on this one so far.

Answer 1

Your intuition seems correct. Construct a data.frame manually based on the expected output of the function and then compare that against the function's output.

library(testthat)

# manually created data
dat <- iris[1:5, c("Species", "Sepal.Length")]

# function
myfun <- function(row, col, data) {
    data[row, col]
}

# result of applying function
outdat <- myfun(1:5, c("Species", "Sepal.Length"), iris)

# two versions of the same test
expect_true(identical(dat, outdat))
expect_identical(dat, outdat)
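Applied to the script in the question, the same idea might look like the sketch below. It assumes the script's steps are wrapped in a single function so the whole pipeline can be called from a test. The names `build_output`, `add_double` and `add_label` are hypothetical stand-ins for the question's `function1` and `function2`, whose bodies are not shown.

```r
library(testthat)

# Hypothetical stand-ins for function1 and function2 from the question.
add_double <- function(df) df$x * 2
add_label  <- function(df) ifelse(df$x > 1, "high", "low")

# Wrap the script's steps in one function so the whole pipeline can be tested.
build_output <- function(df) {
  df$doubled <- add_double(df)
  df$label   <- add_label(df)
  df
}

test_that("pipeline returns the expected data frame", {
  input <- data.frame(x = c(1, 2, 3))
  # Expected result, written out by hand for a small input.
  expected <- data.frame(
    x       = c(1, 2, 3),
    doubled = c(2, 4, 6),
    label   = c("low", "high", "high"),
    stringsAsFactors = FALSE
  )
  expect_equal(build_output(input), expected)
})
```

Keeping the input tiny makes the hand-built expected frame easy to verify by eye.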

If your data.frame may not be identical, you could also run tests on parts of the data.frame, including:

dim(outdat), to check that the size is correct
attributes(outdat) or attributes of columns
sapply(outdat, class), to check variable classes
summary statistics for variables, if applicable
and so forth
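The partial checks listed above could be written with testthat roughly as follows. This is only a sketch: `select_cells` is a hypothetical name for the function under test (it mirrors `myfun` from the example), and the expected mean was computed by hand from the first five rows of `iris`.

```r
library(testthat)

# Hypothetical function under test: selects rows and columns from a data frame.
select_cells <- function(rows, cols, data) {
  data[rows, cols]
}

outdat <- select_cells(1:5, c("Species", "Sepal.Length"), iris)

test_that("output has the expected shape and column types", {
  expect_s3_class(outdat, "data.frame")
  expect_equal(dim(outdat), c(5L, 2L))
  expect_named(outdat, c("Species", "Sepal.Length"))
  expect_equal(
    sapply(outdat, class),
    c(Species = "factor", Sepal.Length = "numeric")
  )
})

test_that("summary statistics match hand-computed values", {
  expect_false(anyNA(outdat$Sepal.Length))
  # (5.1 + 4.9 + 4.7 + 4.6 + 5.0) / 5
  expect_equal(mean(outdat$Sepal.Length), 4.86)
})
```

Checks like these are looser than a full comparison, so they tolerate harmless differences such as row names while still catching a wrong shape or type.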

Answer 2

If you would like to test this at runtime, you should check out the excellent ensurer package (see here). At the bottom of that page you can see how to construct a template to test your data frame against; you can make it as detailed and specific as you like.
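To illustrate what a runtime template check means, here is a minimal base-R sketch. It does not use ensurer's API; `check_against_template` and `template` are hypothetical names invented for this example, and the template only covers column names and classes.

```r
# Hypothetical template: the columns and classes the result should have.
template <- c(Species = "factor", Sepal.Length = "numeric")

# Hypothetical helper: stops with an error if df does not match the template,
# otherwise returns df invisibly so it can be used inside a pipeline.
check_against_template <- function(df, template) {
  stopifnot(is.data.frame(df))
  missing_cols <- setdiff(names(template), names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Compare the first class of each templated column with the expected class.
  actual <- vapply(df[names(template)], function(col) class(col)[1], character(1))
  bad <- names(template)[actual != template]
  if (length(bad) > 0) {
    stop("unexpected class in columns: ", paste(bad, collapse = ", "))
  }
  invisible(df)
}

result <- check_against_template(iris[1:5, c("Species", "Sepal.Length")], template)
```

Unlike a unit test, a check like this runs every time the script runs, so it guards against unexpected input data as well as code changes.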

Answer 3

I'm just using something like this:

d1 <- iris
d2 <- iris 
expect_that(d1, equals(d2)) # passes
d3 <- iris
d3[141,3] <- 5
expect_that(d1, equals(d3)) # fails
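In newer versions of testthat, the `expect_that(x, equals(y))` form is, as far as I know, superseded by `expect_equal(x, y)`. The sketch below shows that form and how it differs from a strict `identical()` comparison: `expect_equal` allows a small numeric tolerance, which is often what you want for computed columns. The perturbation of 1e-12 is an arbitrary value chosen for illustration.

```r
library(testthat)

d1 <- iris
d2 <- iris
# Introduce a tiny floating-point difference in one cell.
d2$Sepal.Length[1] <- d2$Sepal.Length[1] + 1e-12

expect_equal(d1, d2)              # passes: difference is within tolerance
expect_false(identical(d1, d2))   # but the frames are not exactly identical

d3 <- iris
d3[141, 3] <- 5
# A real change in a value is still detected by a tolerance-based comparison.
expect_false(isTRUE(all.equal(d1, d3)))
```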

Answer 1: score 11, accepted, 2015-03-26 15:52:41
Answer 2: score 1, 2015-04-23 07:02:58
Answer 3: score 0, 2015-05-13 19:35:03
