How to create a function that adds different categories to a table with tidyverse in R
R: create a function in tidyverse
I have some fake data:
library(tidyverse)
df <- data.frame(id = 1:20,
var1 = sample(c(0,1), size = 20, replace = TRUE),
var2 = round(runif(20, min = 0, max = 100),0),
var3 = round(runif(20, min = 0, max = 100),0),
var4 = round(rnorm(20, mean = 50, sd = 20)),
var5 = sample(c(1:19, NA), size=20))
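Every column except id is drawn at random, so the rows flagged by the checks change from run to run, and the outputs shown further down will not match what you see locally. A minimal sketch, assuming a reproducible example is wanted: fix the seed before building the data. The helper name make_df and the seed value 42 are arbitrary choices for illustration, not part of the original post.

```r
# Hypothetical helper: builds the same fake data, but with a fixed seed
# so that repeated calls return identical values.
make_df <- function(seed = 42) {
  set.seed(seed)  # arbitrary seed, chosen only for illustration
  data.frame(id = 1:20,
             var1 = sample(c(0, 1), size = 20, replace = TRUE),
             var2 = round(runif(20, min = 0, max = 100), 0),
             var3 = round(runif(20, min = 0, max = 100), 0),
             var4 = round(rnorm(20, mean = 50, sd = 20)),
             var5 = sample(c(1:19, NA), size = 20))
}

df <- make_df()
```

Since var5 is a permutation of 1:19 plus NA, it always contains exactly one missing value, whatever the seed.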
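Then I want to run some checks on this data. The IDs of the rows that fail a check, together with an error message, should be collected in a data.frame called errors. I want to call the function with the pipe operator %>%: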
### Different checks
# There should be no missing values in var5
df %>% filter(is.na(var5)) %>% add_errors("There are NAs in var5")
# var3 should be greater than var4
df %>% filter(var3 < var4) %>% add_errors("var3 is smaller than var4")
# ... etc.
So I have to define the function add_errors():
### Define function
errors <- data.frame(id = numeric(), errormessage = character())
add_errors <- function(dat, error){
errors <<- add_case(errors, id = dat[['id']], errormessage = error)
}
We can print the error message to the console:
add_errors <- function(dat, error) {
glue::glue("{error} at id: {toString(dat[['id']])}")
}
Testing:
df %>%
filter(is.na(var5)) %>%
add_errors("There are NAs in var5")
#There are NAs in var5 at id: 6
df %>%
filter(var3 < var4) %>%
add_errors("var3 is smaller than var4")
#var3 is smaller than var4 at id: 1, 2, 3, 4, 6, 7, 8, 11, 15, 16, 17, 20
Or return a tibble/data.frame with the error message as output:
add_errors <- function(dat, error) {
tibble(id = dat[['id']], errormessage = error)
}
df %>%
filter(is.na(var5)) %>%
add_errors("There are NAs in var5")
# A tibble: 1 x 2
# id errormessage
# <int> <chr>
#1 6 There are NAs in var5
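Because this version returns its result instead of modifying a global variable, the results of several checks can be stacked into one table. A minimal sketch, assuming the tibble-returning add_errors() from this answer and dplyr's bind_rows(); the small data frame checks_df is made-up illustration data, not taken from the question.

```r
library(dplyr)
library(tibble)

# Same helper as in the answer: one row per failing id, tagged with the message.
add_errors <- function(dat, error) {
  tibble(id = dat[["id"]], errormessage = error)
}

# Made-up data: id 2 has an NA in var5; ids 1 and 3 have var3 < var4.
checks_df <- data.frame(id   = 1:4,
                        var3 = c(10, 50, 20, 90),
                        var4 = c(30, 40, 60, 10),
                        var5 = c(1, NA, 3, 4))

# Run each check separately and stack the results into one errors table.
errors <- bind_rows(
  checks_df %>% filter(is.na(var5)) %>% add_errors("There are NAs in var5"),
  checks_df %>% filter(var3 < var4) %>% add_errors("var3 is smaller than var4")
)
errors
```

Keeping each check as its own pipeline means a new check is one more argument to bind_rows(), and no state is shared between checks.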
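Another option is to use logger, which is more flexible: it can record errors, warnings, info messages and so on, together with timestamps.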
#remotes::install_github('daroczig/logger')
library(logger)
log_layout(layout_glue_colors)
t <- tempfile()
log_appender(appender_file(t))
log_info('Script starting up...')
df %>%
filter(is.na(var5)) %>%
{log_error('There are NAs in var5')}
df %>%
filter(var3 < var4) %>%
{log_error("var3 is smaller than var4")}
cat(readLines(t), sep="\n")
#INFO [2021-02-28 14:28:42] Script starting up...
#ERROR [2021-02-28 14:28:42] There are NAs in var5
#ERROR [2021-02-28 14:28:43] var3 is smaller than var4
unlink(t)
Here t is a temporary file; the log can also be written to a file in a custom destination folder.
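One thing to be aware of with the braces in the pipeline above: the expression inside { } never looks at the filtered data, so the message is logged whether or not any row fails the check. A possible variation, assuming logger's default glue-based message formatter: a small wrapper that logs only when rows were found and includes their ids. The helper name log_failed_rows, the data frame checks_df, and the second check are hypothetical and serve only as illustration.

```r
library(dplyr)
library(logger)

# Hypothetical helper: log an error only if at least one row failed the check,
# and include the offending ids in the message.
log_failed_rows <- function(dat, error) {
  if (nrow(dat) > 0) {
    ids <- toString(dat[["id"]])
    log_error("{error} at id: {ids}")
  }
  invisible(dat)
}

log_file <- tempfile()
log_appender(appender_file(log_file))

# Made-up data: only id 2 has an NA in var5; no var3 value is negative.
checks_df <- data.frame(id   = 1:4,
                        var3 = c(10, 50, 20, 90),
                        var5 = c(1, NA, 3, 4))

checks_df %>% filter(is.na(var5)) %>% log_failed_rows("There are NAs in var5")
# No row fails this check, so nothing is written to the log.
checks_df %>% filter(var3 < 0) %>% log_failed_rows("var3 is negative")

log_lines <- readLines(log_file)
cat(log_lines, sep = "\n")
unlink(log_file)
```

Returning the data invisibly keeps the helper usable in the middle of a longer pipeline.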
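The following code does something similar to what you asked for. I tried to do it without passing the errors data frame as an argument, but an assignment inside the function does not change the errors variable outside it, so the function returns the updated data frame and the caller reassigns it.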
errors=data.frame(id=numeric(), errormessage=character())
add_errors=function(df, errormessage) {
return(bind_rows(errors, data.frame(id=df$id, errormessage=errormessage)))
}
errors=df %>% filter(is.na(var5)) %>% add_errors("There are NAs in var5")
errors=df %>% filter(var3 < var4) %>% add_errors("var3 is smaller than var4")
Output:
> print(errors)
id errormessage
1 3 There are NAs in var5
2 2 var3 is smaller than var4
3 3 var3 is smaller than var4
4 7 var3 is smaller than var4
5 8 var3 is smaller than var4
6 9 var3 is smaller than var4
7 12 var3 is smaller than var4
8 16 var3 is smaller than var4
9 18 var3 is smaller than var4
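The remark about the outer variable comes down to R's scoping rules: a plain assignment inside a function creates a local object, so errors outside stays untouched unless the result is returned and reassigned (as in this answer) or <<- is used (as in the question). One possible way to keep the pipe-friendly call without a global variable is a closure that owns the errors table. This is only a sketch; make_error_collector, collector and checks_df are hypothetical names and made-up data.

```r
library(dplyr)

# Hypothetical function factory: the errors table lives in the collector's own
# environment instead of the global environment.
make_error_collector <- function() {
  errors <- data.frame(id = numeric(), errormessage = character())
  list(
    # Append the ids of the rows in `dat`; here `<<-` updates the `errors`
    # object in the enclosing function environment, not a global variable.
    add = function(dat, error) {
      if (nrow(dat) > 0) {
        errors <<- bind_rows(errors,
                             data.frame(id = dat[["id"]], errormessage = error))
      }
      invisible(dat)
    },
    # Return the errors collected so far.
    get = function() errors
  )
}

# Made-up data: id 2 has an NA in var5; ids 1 and 3 have var3 < var4.
checks_df <- data.frame(id   = 1:4,
                        var3 = c(10, 50, 20, 90),
                        var4 = c(30, 40, 60, 10),
                        var5 = c(1, NA, 3, 4))

collector <- make_error_collector()
checks_df %>% filter(is.na(var5)) %>% collector$add("There are NAs in var5")
checks_df %>% filter(var3 < var4) %>% collector$add("var3 is smaller than var4")
collector$get()
```

Each call to make_error_collector() starts with an empty table, so separate data sets can be checked without their errors mixing.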
I know this question is about creating a custom function to check for errors. But there is a great package called {pointblank} that is designed for exactly this kind of task.
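Instead of setting up a data.frame called errors, we can set up a so-called "agent" and "interrogate" it to get a nice report. The package website describes several alternative workflows for checking errors. Below is one possible way to use the package on your problem.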
library(dplyr)
library(pointblank)
df <- data.frame(id = 1:20,
var1 = sample(c(0,1), size = 20, replace = TRUE),
var2 = round(runif(20, min = 0, max = 100),0),
var3 = round(runif(20, min = 0, max = 100),0),
var4 = round(rnorm(20, mean = 50, sd = 20)),
var5 = sample(c(1:19, NA), size=20))
agent <- df %>%
create_agent(
label = "My error checks",
actions = action_levels(stop_at = 1)
) %>%
col_vals_not_null(var5) %>%
col_vals_not_in_set(
vars(var3_gt_var4),
preconditions = ~ . %>% dplyr::mutate(var3_gt_var4 = var3 > var4),
set = FALSE) %>%
interrogate()
agent
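Printing the agent gives the report, but the question asks for the ids of the failing rows. A sketch of how those might be pulled out, assuming pointblank's all_passed() and get_data_extracts() and using col_vals_gt() to compare the two columns directly. The object names small_agent, failed_na and failed_gt and the data frame checks_df are made up for illustration.

```r
library(dplyr)
library(pointblank)

# Made-up data: id 2 has an NA in var5; ids 1 and 3 have var3 < var4.
checks_df <- data.frame(id   = 1:4,
                        var3 = c(10, 50, 20, 90),
                        var4 = c(30, 40, 60, 10),
                        var5 = c(1, NA, 3, 4))

small_agent <- checks_df %>%
  create_agent(label = "My error checks") %>%
  col_vals_not_null(columns = vars(var5)) %>%                # step 1
  col_vals_gt(columns = vars(var3), value = vars(var4)) %>%  # step 2
  interrogate()

# FALSE as soon as any validation step has failing rows.
all_passed(small_agent)

# Rows that failed each step, extracted by step number.
failed_na <- get_data_extracts(small_agent, i = 1)
failed_gt <- get_data_extracts(small_agent, i = 2)
failed_na$id
failed_gt$id
```

The extracts are ordinary data frames, so they can be combined into an errors table in the same way as in the other answers.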