
Adding non-existing columns to a data frame using mutate with across in dplyr R

I have a list of column names as follows:

cols <- c('grade', 'score', 'status')

If the data frame is missing any of the columns in the cols vector, I want to add that column (with values NA) to the data frame using mutate() and across(). How can I do that?

A base R solution:

df[setdiff(cols, names(df))] <- NA

This command can be adapted for a pipeline:

library(dplyr)  # provides the %>% pipe

df %>%
  `[<-`(, setdiff(cols, names(.)), NA)

#   id score grade status
# 1  1    94    NA     NA
# 2  2    98    NA     NA
# 3  3    93    NA     NA
# 4  4    82    NA     NA
# 5  5    89    NA     NA

Data:
set.seed(123)
df <- data.frame(id = 1:5, score = sample(80:100, 5))
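The pipeline form works because `[<-` is an ordinary function: calling it as `[<-`(x, i, j, value) returns a modified copy of x instead of assigning in place, and the empty second argument leaves the row index i missing. A minimal check without magrittr's . placeholder (the score values are copied from the printed output above rather than regenerated with sample()):

```r
cols <- c("grade", "score", "status")
df <- data.frame(id = 1:5, score = c(94, 98, 93, 82, 89))

# Functional form of df[, j] <- NA: the empty second argument is the row
# index i, so every row of the new columns j is set to NA.
out <- `[<-`(df, , setdiff(cols, names(df)), NA)

names(out)  # the new columns "grade" and "status" are appended
names(df)   # df itself is unchanged
```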

A solution using dplyr::mutate():

Suppose that your data frame is diamonds (from ggplot2). Then add a tibble that has one column per name in cols (i.e. three columns in this MWE) to the original data frame (i.e. diamonds here).

To create a tibble that contains NA automatically

(Thanks to the comment by Darren Tsai.)

To create a tibble that has one column per column name, first create a matrix with that many columns using matrix(ncol = length(cols)), then convert it to a tibble with as_tibble(), setting the column names with .name_repair = ~ cols inside as_tibble().
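A minimal sketch of those two steps on their own, to show what the intermediate objects look like. It relies only on matrix() defaulting to data = NA (so with just ncol given, R makes a single row) and on as_tibble() accepting a purrr-style lambda for .name_repair:

```r
library(tibble)  # as_tibble()

cols <- c("grade", "score", "status")

# matrix() defaults to data = NA; with only ncol supplied, nrow becomes 1,
# so this is a 1 x 3 logical matrix filled with NA.
m <- matrix(ncol = length(cols))

# The lambda ignores the matrix's (missing) names and returns `cols` instead.
na_tbl <- as_tibble(m, .name_repair = ~ cols)

dim(m)         # 1 3
names(na_tbl)  # same as cols
```

Because the tibble has exactly one row, mutate() can recycle it to every row of the original data frame.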

When the matrix is created, every column of the tibble is a logical NA. If you plan to fill these new columns later with integer, numeric, complex (e.g. 1 + 5i), or character values, you may prefer NA_integer_, NA_real_, NA_complex_, or NA_character_, respectively, over NA. In that case, you can mutate() the tibble to change the column types.
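A sketch of that type conversion, assuming (for illustration only) that grade will hold character values, score numeric values, and status integer values. Each as.*() call turns the logical NA into the typed NA of that class before the columns are spliced into a small stand-in data frame:

```r
library(dplyr)  # mutate(); also re-exports as_tibble() from tibble

cols <- c("grade", "score", "status")

# Start from the all-logical NA tibble, then convert each placeholder
# column to the type it is assumed to hold later.
na_typed <- matrix(ncol = length(cols)) |>
  as_tibble(.name_repair = ~ cols) |>
  mutate(
    grade  = as.character(grade),  # NA_character_
    score  = as.numeric(score),    # NA_real_
    status = as.integer(status)    # NA_integer_
  )

df <- data.frame(id = 1:3)
# An unnamed data frame argument to mutate() has its columns spliced in,
# and its single row is recycled to nrow(df).
out <- df |> mutate(na_typed)
```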

You can create such a tibble inside mutate():

library(dplyr)    # mutate(), as_tibble()
library(ggplot2)  # provides the diamonds data set

cols <- c('grade', 'score', 'status')

diamonds |>
  mutate(
    matrix(
      ncol = length(cols)
    ) |>
      as_tibble(
        .name_repair = ~ cols
      ) |>
      ## if you want to interpret the grade as `factor` type...
      mutate(
        grade = as.factor(grade)
      )
  )

## # A tibble: 53,940 × 13
##    carat cut       color clarity depth table price     x     y     z grade score
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <fct> <lgl>
##  1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43 NA    NA
##  2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31 NA    NA   
##  3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31 NA    NA
##  4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63 NA    NA
##  5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75 NA    NA
##  6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48 NA    NA
##  7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47 NA    NA   
##  8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53 NA    NA
##  9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49 NA    NA
## 10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39 NA    NA
## # … with 53,930 more rows, and 1 more variable: status <lgl>

To create an NA tibble without any columns that already exist in the original data frame

(Thanks to the comment by Julian.)

To ensure that a column is added to the original data frame only if the original data frame doesn't already have it, select only the columns of the NA tibble that are not present in the original data frame. You can do that with select(!matches(colnames(diamonds))).

## "price" already exists in diamonds, so only grade and status should be added
cols <- c("grade", "price", "status")

matrix(ncol = length(cols)) |>
  as_tibble(
    .name_repair = ~ cols
  ) |>
  mutate(
    grade = as.factor(grade)
  )

diamonds |>
  mutate(
    matrix(
      ncol = length(cols)
    ) |>
      as_tibble(
        .name_repair = ~cols
      ) |>
      ## if you want to interpret the grade as `factor` type...
      mutate(
        grade = as.factor(grade)
      ) |>
      ## select columns that are not present in the original data frame 
      dplyr::select(
        !matches(colnames(diamonds))
      )
  )

## # A tibble: 53,940 × 12
##    carat cut      color clarity depth table price     x     y     z grade status
##    <dbl> <ord>    <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <fct> <lgl> 
##  1  0.23 Ideal    E     SI2      61.5    55   326  3.95  3.98  2.43 NA    NA
##  2  0.21 Premium  E     SI1      59.8    61   326  3.89  3.84  2.31 NA    NA
##  3  0.23 Good     E     VS1      56.9    65   327  4.05  4.07  2.31 NA    NA    
##  4  0.29 Premium  I     VS2      62.4    58   334  4.2   4.23  2.63 NA    NA
##  5  0.31 Good     J     SI2      63.3    58   335  4.34  4.35  2.75 NA    NA
##  6  0.24 Very Go… J     VVS2     62.8    57   336  3.94  3.96  2.48 NA    NA
##  7  0.24 Very Go… I     VVS1     62.3    57   336  3.95  3.98  2.47 NA    NA
##  8  0.26 Very Go… H     SI1      61.9    55   337  4.07  4.11  2.53 NA    NA    
##  9  0.22 Fair     E     VS2      65.1    61   337  3.87  3.78  2.49 NA    NA
## 10  0.23 Very Go… H     VS1      59.4    61   338  4     4.05  2.39 NA    NA
## # … with 53,930 more rows
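Note that matches() treats each existing column name as a regular expression and matches it anywhere in a name, so a short existing name such as x (diamonds has one) would also drop a genuinely new column like max_x. A sketch of the same selection step using tidyselect's any_of(), which compares whole names exactly; any_of() is swapped in here and is not part of the original answer, and the small df and the max_x column are hypothetical stand-ins:

```r
library(dplyr)

# Stand-in for diamonds: it also has a column literally named "x".
df <- data.frame(x = 1:3, price = c(10, 20, 30))
cols <- c("grade", "price", "max_x")  # "max_x" is a hypothetical new column

na_full <- matrix(ncol = length(cols)) |>
  as_tibble(.name_repair = ~ cols)

# matches() uses "x" as a regex, so it also catches "max_x"
via_matches <- select(na_full, !matches(colnames(df)))
# any_of() compares whole names, so only the existing "price" is dropped
via_any_of <- select(na_full, !any_of(colnames(df)))

out <- df |> mutate(via_any_of)
```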
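
A base R solution using a for loop:

df <- data.frame(grade = c("A", "B", "C"),
                 score = c(1, 2, 3))

cols <- c('grade', 'score', 'status')

# add each name in cols that is not already a column, filled with NA
for (i in cols){
    if (!(i %in% colnames(df))){
        df[i] <- NA
    }
}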

> df
  grade score status
1     A     1     NA
2     B     2     NA
3     C     3     NA
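The loop above does the same job as the vectorized setdiff() assignment in the first answer. A sketch that wraps that assignment in a reusable function; the name add_missing_cols() and its fill argument are hypothetical, not part of base R or dplyr:

```r
# Hypothetical helper: append every name in `cols` that is not already a
# column of `df`, filled with `fill`; existing columns are left untouched.
add_missing_cols <- function(df, cols, fill = NA) {
  missing <- setdiff(cols, names(df))
  if (length(missing) > 0) {
    df[missing] <- fill
  }
  df
}

df <- data.frame(grade = c("A", "B", "C"),
                 score = c(1, 2, 3))
cols <- c('grade', 'score', 'status')

out <- add_missing_cols(df, cols)
again <- add_missing_cols(out, cols)  # nothing is missing now, so no change
```

Passing, for example, fill = NA_character_ gives the new columns character type instead of logical.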

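Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.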

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM