
R: create a function in tidyverse

I have some fake data:

library(tidyverse)
df <- data.frame(id = 1:20,
                 var1 = sample(c(0,1), size = 20, replace = T),
                 var2 = round(runif(20, min = 0, max = 100),0),
                 var3 = round(runif(20, min = 0, max = 100),0),
                 var4 = round(rnorm(20, mean = 50, sd = 20)),
                 var5 = sample(c(1:19, NA), size=20))

Then, I would like to run some checks on these data. The IDs of the rows that have errors, together with an error message, should be put in a data.frame errors. I would like to call the function using the pipe operator %>%.

### Different checks

# There should be no missing values in var5
df %>% filter(is.na(var5)) %>% add_errors("There are NAs in var5")

# var3 should be greater than var4
df %>% filter(var3 < var4) %>% add_errors("var3 is smaller than var4")

# ... etc.

Then I have to define the function add_errors():

### Define function

errors <- data.frame(id = numeric(), errormessage = character())

add_errors <- function(dat, error){
    errors <<- add_case(errors, id = dat[['id']], errormessage = error)
}

We could either print the error message on the console:

add_errors <- function(dat, error) {
    glue::glue("{error} at id: {toString(dat[['id']])}")
}

Testing:

df %>%
    filter(is.na(var5)) %>% 
    add_errors("There are NAs in var5")
#There are NAs in var5 at id: 6

df %>%
   filter(var3 < var4) %>%
   add_errors("var3 is smaller than var4")
#var3 is smaller than var4 at id: 1, 2, 3, 4, 6, 7, 8, 11, 15, 16, 17, 20

Or return a tibble/data.frame with the error message as output:

add_errors <- function(dat, error) {
    tibble(id = dat[['id']], errormessage = error)
}
    
df %>%
     filter(is.na(var5)) %>% 
     add_errors("There are NAs in var5")
# A tibble: 1 x 2
#     id errormessage         
#  <int> <chr>                
#1     6 There are NAs in var5
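
Because this version returns a tibble instead of modifying a global variable, the results of several checks can be stacked with bind_rows(). The following is a minimal sketch, assuming a small hand-made data frame (the values are made up for illustration and are not the random data from the question):

```r
library(dplyr)

# Hypothetical toy data with known problems (not the random data from the question)
df <- data.frame(id   = 1:5,
                 var3 = c(10, 60, 30, 80, 50),
                 var4 = c(20, 40, 30, 90, 10),
                 var5 = c(1, NA, 3, NA, 5))

# Same idea as above: one row per offending id, no side effects
add_errors <- function(dat, error) {
    tibble(id = dat[['id']], errormessage = error)
}

# Run each check and stack the results into a single data frame
errors <- bind_rows(
    df %>% filter(is.na(var5)) %>% add_errors("There are NAs in var5"),
    df %>% filter(var3 < var4) %>% add_errors("var3 is smaller than var4")
)

errors
```

A check that finds no offending rows contributes zero rows, so the checks can be listed unconditionally.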

Another option is to use logger, which makes it more flexible to add error, warning, info etc. messages along with a timestamp:

#remotes::install_github('daroczig/logger')
library(logger)
log_layout(layout_glue_colors)
t <- tempfile()
log_appender(appender_file(t))
log_info('Script starting up...')

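# Log an error only when the filter actually finds offending rows
df %>%
    filter(is.na(var5)) %>%
    {if (nrow(.) > 0) log_error('There are NAs in var5')}

df %>%
    filter(var3 < var4) %>%
    {if (nrow(.) > 0) log_error("var3 is smaller than var4")}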
cat(readLines(t), sep="\n")
#INFO [2021-02-28 14:28:42] Script starting up...
#ERROR [2021-02-28 14:28:42] There are NAs in var5
#ERROR [2021-02-28 14:28:43] var3 is smaller than var4

unlink(t)

Here t is a temporary file; the log can also be written to a file in a custom destination folder.
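
The logging idea can also be wrapped in a small helper so that the offending ids end up in the log message. This is only a sketch: log_errors() is a hypothetical name, the toy data are made up, and it assumes logger's default glue-style message formatting and default layout:

```r
library(dplyr)
library(logger)

# Hypothetical toy data with known problems
df <- data.frame(id   = 1:5,
                 var3 = c(10, 60, 30, 80, 50),
                 var4 = c(20, 40, 30, 90, 10),
                 var5 = c(1, NA, 3, NA, 5))

log_file <- tempfile()
log_appender(appender_file(log_file))

# Hypothetical helper: log the offending ids (if any), then pass the data on unchanged
log_errors <- function(dat, error) {
    if (nrow(dat) > 0) {
        ids <- toString(dat[['id']])
        log_error('{error} at id: {ids}')
    }
    invisible(dat)
}

df %>% filter(is.na(var5)) %>% log_errors("There are NAs in var5")
df %>% filter(var3 < var4) %>% log_errors("var3 is smaller than var4")

log_lines <- readLines(log_file)
cat(log_lines, sep = "\n")
unlink(log_file)
```

Returning the data invisibly keeps the helper usable in the middle of a longer pipe.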

The following code does something similar to what you are asking. I tried doing it without passing the errors data frame as an argument, but then the function does not change the errors variable outside of itself, so it returns the combined data frame and the result is assigned back to errors.

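errors <- data.frame(id = numeric(), errormessage = character())

# Append the ids of the offending rows to the global `errors` and return the result
add_errors <- function(df, errormessage) {
    # rep() keeps this working when no row fails the check (zero ids)
    new_rows <- data.frame(id = df$id, errormessage = rep(errormessage, nrow(df)))
    bind_rows(errors, new_rows)
}
errors <- df %>% filter(is.na(var5)) %>% add_errors("There are NAs in var5")
errors <- df %>% filter(var3 < var4) %>% add_errors("var3 is smaller than var4")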

Output:

> print(errors)
  id              errormessage
1  3     There are NAs in var5
2  2 var3 is smaller than var4
3  3 var3 is smaller than var4
4  7 var3 is smaller than var4
5  8 var3 is smaller than var4
6  9 var3 is smaller than var4
7 12 var3 is smaller than var4
8 16 var3 is smaller than var4
9 18 var3 is smaller than var4

I know that this question is about creating a custom function to check for errors. But there is a nice package called {pointblank} which is made for exactly this kind of task.

Instead of setting up a data.frame called errors, we can set up a so-called "agent" and "interrogate" it to get a nice report. There are several alternative workflows to check for errors, which are described on the package's website. Below is one possible way to use the package on your problem.

library(dplyr)
library(pointblank)

df <- data.frame(id = 1:20,
                 var1 = sample(c(0,1), size = 20, replace = T),
                 var2 = round(runif(20, min = 0, max = 100),0),
                 var3 = round(runif(20, min = 0, max = 100),0),
                 var4 = round(rnorm(20, mean = 50, sd = 20)),
                 var5 = sample(c(1:19, NA), size=20))
agent <- df %>%
  create_agent(
    label = "My error checks",
    actions = action_levels(stop_at = 1)
  ) %>%
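  # var5 must not contain missing values
  col_vals_not_null(var5) %>% 
  # var3 must be greater than var4: the helper column must never be FALSE
  col_vals_not_in_set(
    vars(var3_gt_var4),
    preconditions = ~ . %>% dplyr::mutate(var3_gt_var4 = var3 > var4),
    set = FALSE) %>% 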
  interrogate()
  
agent
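
To get back to the original goal of listing the offending ids, the failing rows can be pulled out of the interrogated agent. The sketch below is one possible variant, not the only way: it assumes made-up toy data, swaps the helper-column check for pointblank's col_vals_gt() with a column given as the comparison value, and uses get_sundered_data() to extract the rows that failed at least one step:

```r
library(dplyr)
library(pointblank)

# Hypothetical toy data with known problems
df <- data.frame(id   = 1:5,
                 var3 = c(10, 60, 30, 80, 50),
                 var4 = c(20, 40, 30, 90, 10),
                 var5 = c(1, NA, 3, NA, 5))

agent <- df %>%
    create_agent(label = "My error checks") %>%
    # var5 must not contain missing values
    col_vals_not_null(vars(var5)) %>%
    # var3 must be strictly greater than var4, compared row by row
    col_vals_gt(vars(var3), value = vars(var4)) %>%
    interrogate()

# Rows that failed at least one validation step
failed <- get_sundered_data(agent, type = "fail")
failed$id
```

Unlike the add_errors() approach, this gives the failing rows but not a per-row error message; the agent report shows which step each failure belongs to.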


 