Function in R to validate the existence of specific columns in a data.frame

Question

I'd like to validate that a data.frame contains columns with specific names. Ideally this would be a utility function to which I can just pass the data.frame and the expected column names, and the function will raise an error if the data.frame does not contain the expected columns. I have written my own function below; however, this seems like something that would already exist in the R ecosystem.

My questions are:

Does such a function (or one-liner) already exist, either in base R or in a common package?
If not, any suggestions for my function (below)?

Example of the function I have written to do this:

validate_df_columns <- function(df, columns) {
    # Capture the caller's expression for df so error messages can name it
    chr_df <- deparse(substitute(df))
    chr_columns <- paste(columns, collapse = ", ")
    if (!('data.frame' %in% class(df))) {
        stop(paste("Argument", chr_df, "must be a data.frame."))
    }
    # Every expected column must appear among the data.frame's column names
    if (!all(columns %in% colnames(df))) {
        stop(paste(chr_df, "must contain the columns", chr_columns))
    }
}

validate_df_columns(data.frame(a=1:3, b=4:6), c("a", "b", "c'"))
## Error in validate_df_columns(data.frame(a = 1:3, b = 4:6), c("a", "b",  : 
##   data.frame(a = 1:3, b = 4:6) must contain the columns a, b, c'

Answer 1

The packages tibble and rlang, part of the tidyverse, have a function to check this:

library(tibble) # or library(rlang) or library(tidyverse)
has_name(iris, c("Species","potatoe"))
# [1]  TRUE FALSE

Technically it lives in rlang, and its code is just:

function (x, name) 
{
    name %in% names2(x)
}

where rlang::names2 is an enhanced version of base::names that returns a vector of empty strings rather than NULL when the object doesn't have names.
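To make the names2 point concrete without depending on rlang, here is a minimal base-R sketch. names2_sketch() and has_name_sketch() are hypothetical helpers written only for illustration: they mimic the documented behaviour (empty strings instead of NULL, missing names standardised to "") but are not rlang's actual source.

```r
# Hypothetical base-R stand-in for rlang::names2(), for illustration only:
# always returns a character vector the same length as x.
names2_sketch <- function(x) {
  nms <- names(x)
  if (is.null(nms)) {
    return(rep("", length(x)))  # no names attribute: one "" per element
  }
  nms[is.na(nms)] <- ""         # standardise missing (NA) names to ""
  nms
}

# Hypothetical stand-in for has_name(), mirroring the code quoted above
has_name_sketch <- function(x, name) {
  name %in% names2_sketch(x)
}

names(1:3)                                      # NULL
names2_sketch(1:3)                              # "" "" ""
has_name_sketch(iris, c("Species", "potatoe"))  # TRUE FALSE
```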

Here's a way to rewrite your function:

validate_df_columns <- function(df, columns) {
  if (!is.data.frame(df)) {
    stop(paste("Argument", deparse(substitute(df)), "must be a data.frame."))
  }
  # i is a logical vector: TRUE for each expected column that is present
  if (!all(i <- rlang::has_name(df, columns))) {
    stop(sprintf(
      "%s doesn't contain: %s",
      deparse(substitute(df)),
      paste(columns[!i], collapse = ", ")))
  }
}

validate_df_columns(iris, c("Species","potatoe","banana"))
# Error in validate_df_columns(iris, c("Species", "potatoe", "banana")) : 
# iris doesn't contain: potatoe, banana

Using deparse(substitute(...)) here makes little sense to me, though: the function isn't meant to be used interactively, so in my opinion it's clearer to just say "df".
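Following that suggestion, a base-R-only variant might look like the sketch below. It assumes you are happy to drop the rlang dependency and to refer to the argument literally as `df` in messages; validate_df_columns_base is a hypothetical name. setdiff() returns the expected columns that are absent, and call. = FALSE keeps the long call out of the error message. Returning df invisibly is a design choice not taken from either answer; it lets the check be chained or assigned.

```r
# Hypothetical base-R-only validator (no rlang dependency)
validate_df_columns_base <- function(df, columns) {
  if (!is.data.frame(df)) {
    stop("`df` must be a data.frame.", call. = FALSE)
  }
  # Expected columns that do not appear among the data.frame's names
  missing_cols <- setdiff(columns, names(df))
  if (length(missing_cols) > 0) {
    stop("`df` doesn't contain: ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  invisible(df)  # return the input unchanged so the call can be chained
}

validate_df_columns_base(iris, "Species")  # passes silently
try(validate_df_columns_base(iris, c("Species", "potatoe", "banana")))
# Error : `df` doesn't contain: potatoe, banana
```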

Answer 2

The %in% operator works element-wise on a pair of vectors, so there is already a one-liner we can use here. Consider:

df <- data.frame(a=c(1:3), b=c(4:6), c=c(7:9))
names <- c("a", "c", "blah", "doh")
names[names %in% names(df)]

[1] "a" "c"

If you want to assert that the data frame contains all the input names, then just use:

all(names %in% names(df))   # to check all inputs are present
setequal(names, names(df))  # to check that input matches df
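Because %in% returns one logical per input name, wrapping it in all() gives a one-line check, and base R's stopifnot() turns that check into an assertion that raises an error. A minimal sketch, reusing the df and names objects from the example above (recreated here so the block runs on its own); setdiff() is added to list which names are missing.

```r
df <- data.frame(a = c(1:3), b = c(4:6), c = c(7:9))
names <- c("a", "c", "blah", "doh")

# Which requested names are missing from df?
setdiff(names, names(df))  # "blah" "doh"

# One-line assertion: passes silently when every name is present
stopifnot(all(c("a", "c") %in% names(df)))

# ...and raises an error when any name is missing
try(stopifnot(all(names %in% names(df))))
```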

2 answers

Answer 1 (3 votes), 2018-11-05 15:44:41
Answer 2 (0 votes), 2018-11-05 15:36:46