[英]Adding non existing columns to a data frame using mutate with across in dplyr R
I have a list of column names as follow,我有一个列名列表如下,
cols <- c('grade', 'score', 'status')
If the data frame doesn't have any of the columns in the cols vector, I want to add that column (values as NA) to the data frame using mutate and across.如果数据框在 cols 向量中没有任何列,我想使用 mutate 和 cross 将该列(值作为 NA)添加到数据框中。 How to do that?
怎么做?
A base
solution: base
解决方案:
df[setdiff(cols, names(df))] <- NA
This command can be adapted for pipeline:此命令可适用于管道:
df %>%
`[<-`(, setdiff(cols, names(.)), NA)
# id score grade status
# 1 1 94 NA NA
# 2 2 98 NA NA
# 3 3 93 NA NA
# 4 4 82 NA NA
# 5 5 89 NA NA
set.seed(123)
df <- data.frame(id = 1:5, score = sample(80:100, 5))
dplyr::mutate()
dplyr::mutate()
的解决方案Suppose that your data frame is diamonds
.假设您的数据框是
diamonds
。 Then add a tibble data frame that has the same number of columns as the column names (ie three columns, in this MWE) to the original data frame (ie diamond
here).然后在原始数据框(即此处的
diamond
)中添加一个列数与列名相同的小标题数据框(即在此 MWE 中为三列)。
NA
automaticallyNA
的小标题(Thanks to the comment by Darren Tsai ) (感谢Darren Tsai 的评论)
To create a tibble that has the same number of columns as the column names, you can first create a matrix
that has the same number of columns as the column names by matrix(ncol = length(cols))
, and second, transform it to a tibble data frame by as_tibble()
and set the column names using .name_repair = ~ cols
inside of as_tibble()
.要创建与列名具有相同列数的小标题,您可以首先通过
matrix
matrix(ncol = length(cols))
创建一个列数与列名相同的矩阵,然后将其转换为as_tibble() 的 tibble 数据框,并在as_tibble()
as_tibble()
使用.name_repair = ~ cols
设置列名。
The value of each column of the tibble is logical NA
, when the matrix is created.创建矩阵时,tibble 的每一列的值都是逻辑
NA
。 Note but you may prefer one of NA_integer_
, NA_real_
, NA_complex_
, or NA_character_
over NA
, if you want mutate these newly added columns later on as integer
columns, numeric columns, complex columns (eg 1 + 5i), and character columns, respectively.请注意,但您可能更喜欢
NA_integer_
、 NA_real_
、 NA_complex_
或NA_character_
而不是NA
,如果您希望稍后将这些新添加的列分别变异为integer
列、数字列、复杂列(例如 1 + 5i)和字符列。 In such a case, you can mutate
the tibble so that you can change the type of column.在这种情况下,您可以
mutate
以便您可以更改列的类型。
You can create such a tibble inside of mutate
.您可以在
mutate
中创建这样的 tibble。
cols <- c('grade', 'score', 'status')
diamonds |>
mutate(
matrix(
ncol = length(cols)
) |>
as_tibble(
.name_repair = ~ cols
) |>
## if you want to interpret the grade as `factor` type...
mutate(
grade = as.factor(grade)
)
)
## # A tibble: 53,940 × 13
## carat cut color clarity depth table price x y z grade score
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <fct> <lgl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 NA NA
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 NA NA
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 NA NA
## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63 NA NA
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 NA NA
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 NA NA
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47 NA NA
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53 NA NA
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 NA NA
## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39 NA NA
## # … with 53,930 more rows, and 1 more variable: status <lgl>
NA
tibble that does not have any column that matches to the original data frameNA
tibble (Thanks to the comment by Julian ) (感谢朱利安的评论)
To ensure that the columns are added to the original data frame only if the original data frame doesn't have any of the columns in the cols
vector, you have to select the columns of the NA
tibble data frame that are not present in the original data frame.为确保仅当原始数据帧在
cols
向量中没有任何列时才将列添加到原始数据帧,您必须 select 原始数据帧中不存在的NA
tibble 数据帧的列数据框。 You can do that by using !select(matches(colnames(diamonds)))
.您可以使用
!select(matches(colnames(diamonds)))
来做到这一点。
cols <- c("grade", "price", "status")
matrix(ncol = length(cols)) |>
as_tibble(
.name_repair = ~ cols
) |>
mutate(
grade = as.factor(grade)
)
diamonds |>
mutate(
matrix(
ncol = length(cols)
) |>
as_tibble(
.name_repair = ~cols
) |>
## if you want to interpret the grade as `factor` type...
mutate(
grade = as.factor(grade)
) |>
## select columns that are not present in the original data frame
dplyr::select(
!matches(colnames(diamonds))
)
)
## # A tibble: 53,940 × 12
## carat cut color clarity depth table price x y z grade status
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <fct> <lgl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 NA NA
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 NA NA
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 NA NA
## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63 NA NA
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 NA NA
## 6 0.24 Very Go… J VVS2 62.8 57 336 3.94 3.96 2.48 NA NA
## 7 0.24 Very Go… I VVS1 62.3 57 336 3.95 3.98 2.47 NA NA
## 8 0.26 Very Go… H SI1 61.9 55 337 4.07 4.11 2.53 NA NA
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 NA NA
## 10 0.23 Very Go… H VS1 59.4 61 338 4 4.05 2.39 NA NA
## # … with 53,930 more rows
df <- data.frame(grade = c("A", "B", "C"),
score = c(1, 2, 3))
cols <- c('grade', 'score', 'status')
for (i in cols){
if (!(i %in% colnames(df))){
df[i] <- NA
}
}
> df
grade score status
1 A 1 NA
2 B 2 NA
3 C 3 NA
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.