R: create a function in tidyverse

I have some fake data:

library(tidyverse)
df <- data.frame(id = 1:20,
                 var1 = sample(c(0,1), size = 20, replace = TRUE),
                 var2 = round(runif(20, min = 0, max = 100),0),
                 var3 = round(runif(20, min = 0, max = 100),0),
                 var4 = round(rnorm(20, mean = 50, sd = 20)),
                 var5 = sample(c(1:19, NA), size=20))

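Then I want to run some checks on this data. The IDs of the rows that fail a check, together with the error message, should be collected in a data.frame called errors. I want to call the function using the pipe operator %>%.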

### Different checks

# There should be no missing values in var5
df %>% filter(is.na(var5)) %>% add_errors("There are NAs in var5")

# var3 should be greater than var4
df %>% filter(var3 < var4) %>% add_errors("var3 is smaller than var4")

# ... etc.

So I need to define the function add_errors():

### Define function

errors <- data.frame(id = numeric(), errormessage = character())

add_errors <- function(dat, error){
    errors <<- add_case(errors, id = dat[['id']], errormessage = error)
}

We can print the error message to the console:

add_errors <- function(dat, error) {
    glue::glue("{error} at id: {toString(dat[['id']])}")
   }

Testing:

df %>%
    filter(is.na(var5)) %>% 
    add_errors("There are NAs in var5")
#There are NAs in var5 at id: 6

df %>%
   filter(var3 < var4) %>%
   add_errors("var3 is smaller than var4")
#var3 is smaller than var4 at id: 1, 2, 3, 4, 6, 7, 8, 11, 15, 16, 17, 20

Or return a tibble/data.frame with the error messages as output:

add_errors <- function(dat, error) {
     tibble(id = dat[['id']], errormessage = error)
    }
    
df %>%
     filter(is.na(var5)) %>% 
     add_errors("There are NAs in var5")
# A tibble: 1 x 2
#     id errormessage         
#  <int> <chr>                
#1     6 There are NAs in var5
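Because this version returns a tibble instead of modifying a global object, the results of several checks can be stacked into the errors table asked for in the question. A minimal sketch, assuming a small hand-made data set (the values are illustrative, not the random data above) and using dplyr::bind_rows() to combine the results:

```r
library(dplyr)
library(tibble)

# Illustrative data (an assumption; not the random data from the question)
df <- data.frame(id   = 1:5,
                 var3 = c(10, 60, 30, 80, 50),
                 var4 = c(20, 50, 40, 70, 45),
                 var5 = c(1, NA, 3, 4, NA))

# Return one row per failing id; a check with no failures gives 0 rows
add_errors <- function(dat, error) {
  tibble(id = dat[["id"]], errormessage = error)
}

# Stack the results of all checks into a single table
errors <- bind_rows(
  df %>% filter(is.na(var5)) %>% add_errors("There are NAs in var5"),
  df %>% filter(var3 < var4) %>% add_errors("var3 is smaller than var4")
)

errors
```

Keeping add_errors() free of side effects means each check can be run and tested on its own, and the caller decides how to combine the results.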

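Another option is to use the logger package, which is more flexible: it can record errors, warnings, info messages, etc., along with a timestamp.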

#remotes::install_github('daroczig/logger')
library(logger)
log_layout(layout_glue_colors)
t <- tempfile()
log_appender(appender_file(t))
log_info('Script starting up...')

df %>%
     filter(is.na(var5)) %>%
    {log_error('There are NAs in var5')}
    
df %>%
   filter(var3 < var4) %>%
   {log_error("var3 is smaller than var4")}
cat(readLines(t), sep="\n")
#INFO [2021-02-28 14:28:42] Script starting up...
#ERROR [2021-02-28 14:28:42] There are NAs in var5
#ERROR [2021-02-28 14:28:43] var3 is smaller than var4

unlink(t)

Here t is a temporary file; the log can also be written to a file in a custom destination folder.
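Note that a call such as {log_error('There are NAs in var5')} writes the message whether or not the filter returned any rows. A possible refinement, sketched here with a hypothetical helper named log_check() (not part of logger) and illustrative data, logs only when at least one row fails and includes the ids in the message. It assumes logger's default glue formatter is active:

```r
library(dplyr)
library(logger)

# Illustrative data (an assumption; not the random data from the question)
df <- data.frame(id   = 1:5,
                 var3 = c(10, 60, 30, 80, 50),
                 var4 = c(20, 50, 40, 70, 45),
                 var5 = c(1, NA, 3, 4, NA))

log_file <- tempfile()
log_appender(appender_file(log_file))

# Hypothetical helper: log only when rows fail; return the input invisibly
log_check <- function(dat, error) {
  ids <- dat[["id"]]
  if (length(ids) > 0) {
    # The glue formatter fills in {error} and {toString(ids)}
    log_error("{error} at id: {toString(ids)}")
  }
  invisible(dat)
}

df %>% filter(is.na(var5)) %>% log_check("There are NAs in var5")
# No row satisfies this filter, so nothing is written to the log
df %>% filter(var3 > 100) %>% log_check("var3 is greater than 100")

log_lines <- readLines(log_file)
cat(log_lines, sep = "\n")
unlink(log_file)
```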

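The following code does something similar to what you asked for. I tried to do it without passing the errors data frame as an argument, but then the errors variable outside the function is not changed, so the result has to be reassigned to errors after each check.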

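errors=data.frame(id=numeric(), errormessage=character())
# Append the ids of the rows in df to errors and return the combined data frame
add_errors=function(df, errormessage) {
    # rep() avoids an error from data.frame() when no rows fail the check
    return(bind_rows(errors, data.frame(id=df$id, errormessage=rep(errormessage, nrow(df)))))
}
errors=df %>% filter(is.na(var5)) %>% add_errors("There are NAs in var5") 
errors=df %>% filter(var3 < var4) %>% add_errors("var3 is smaller than var4")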

Output:

> print(errors)
  id              errormessage
1  3     There are NAs in var5
2  2 var3 is smaller than var4
3  3 var3 is smaller than var4
4  7 var3 is smaller than var4
5  8 var3 is smaller than var4
6  9 var3 is smaller than var4
7 12 var3 is smaller than var4
8 16 var3 is smaller than var4
9 18 var3 is smaller than var4
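The outer errors variable is not changed because an ordinary assignment inside a function only creates or modifies a local variable; the result has to be returned and reassigned by the caller (or written with <<-, as in the question). A sketch of the variant that passes the errors data frame explicitly, assuming illustrative data and a hypothetical argument order (dat, errors, errormessage):

```r
library(dplyr)

# Illustrative data (an assumption; not the random data from the question)
df <- data.frame(id   = 1:5,
                 var3 = c(10, 60, 30, 80, 50),
                 var4 = c(20, 50, 40, 70, 45),
                 var5 = c(1, NA, 3, 4, NA))

errors <- data.frame(id = numeric(), errormessage = character())

# Takes the current errors table as an argument and returns the extended one
add_errors <- function(dat, errors, errormessage) {
  # rep() keeps data.frame() valid even when dat has zero rows
  new_rows <- data.frame(id = dat$id,
                         errormessage = rep(errormessage, nrow(dat)))
  bind_rows(errors, new_rows)
}

errors <- df %>% filter(is.na(var5)) %>% add_errors(errors, "There are NAs in var5")
errors <- df %>% filter(var3 < var4) %>% add_errors(errors, "var3 is smaller than var4")
# A check with no failing rows leaves errors unchanged
errors <- df %>% filter(var3 > 100) %>% add_errors(errors, "var3 is greater than 100")

print(errors)
```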

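I know this question is about creating a custom function to check for errors. However, there is a great package called {pointblank} that is designed for exactly this kind of task.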

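Instead of setting up a data.frame called errors, we can set up a so-called "agent" and "interrogate" it to get a nice report. Several alternative workflows for checking for errors are described on the package website. Below is one possible way of applying the package to your problem.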

library(dplyr)
library(pointblank)

df <- data.frame(id = 1:20,
                 var1 = sample(c(0,1), size = 20, replace = TRUE),
                 var2 = round(runif(20, min = 0, max = 100),0),
                 var3 = round(runif(20, min = 0, max = 100),0),
                 var4 = round(rnorm(20, mean = 50, sd = 20)),
                 var5 = sample(c(1:19, NA), size=20))
agent <- df %>%
  create_agent(
    label = "My error checks",
    actions = action_levels(stop_at = 1)
  ) %>%
  col_vals_not_null(var5) %>% 
  # var3 should be greater than var4: rows where the comparison is FALSE fail
  col_vals_not_in_set(
    vars(var3_gt_var4),
    preconditions = ~ . %>% dplyr::mutate(var3_gt_var4 = var3 > var4),
    set = FALSE) %>% 
  interrogate()
  
agent
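For a plain column-to-column comparison, the precondition and helper column may not be needed. A hedged sketch, assuming that col_vals_gt() accepts a comparison column given in vars() and that all_passed() and get_sundered_data() behave as described in the pointblank documentation; the data is illustrative rather than the random data above:

```r
library(dplyr)
library(pointblank)

# Illustrative data (an assumption; not the random data from the question)
df <- data.frame(id   = 1:5,
                 var3 = c(10, 60, 30, 80, 50),
                 var4 = c(20, 50, 40, 70, 45),
                 var5 = c(1, NA, 3, 4, NA))

agent <- df %>%
  create_agent(label = "My error checks") %>%
  # There should be no missing values in var5
  col_vals_not_null(columns = var5) %>%
  # var3 should be greater than var4, compared row by row
  col_vals_gt(columns = var3, value = vars(var4)) %>%
  interrogate()

# TRUE only if every validation step passed for every row
all_passed(agent)

# Rows that failed at least one validation step
failing_rows <- get_sundered_data(agent, type = "fail")
failing_rows
```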
