R lapply: check if data frame contains a column. If not, create this column

I have a list of data frames.

I would like to check the column names of every data frame. If a column is missing, I want to add it to the data frame and fill it with NA values.

Dummy data:

d1 <- data.frame(a=1:2, b=2:3, c=4:5)
d2 <- data.frame(a=1:2, b=2:3)

l<-list(d1, d2)

# Check the columns names of the dataframes 
# If column is missing, add new column, add NA as values 
lapply(l, function(x) if(!("c" %in% colnames(x))) 
             {
              c<-rep(NA, nrow(x))
              cbind(x, c) # does not work!
              })

What I get:

[[1]]
NULL

[[2]]
  a b  c
1 1 2 NA
2 2 3 NA

What I want instead:

[[1]]
  a b c
1 1 2 4
2 2 3 5

[[2]]
  a b c
1 1 2 NA
2 2 3 NA

Thanks for your help!

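You could use dplyr::mutate with a plain if / else. Do not use ifelse here: the test "c" %in% names(x) has length one, so ifelse would return only the first value of c and recycle it down the whole column.

library(dplyr)
lapply(l, function(x) mutate(x, c = if ("c" %in% names(x)) c else NA))

[[1]]
  a b c
1 1 2 4
2 2 3 5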

[[2]]
  a b  c
1 1 2 NA
2 2 3 NA
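
When the test is a single TRUE or FALSE, as "c" %in% names(x) is, ifelse() and if / else do not behave the same way: ifelse() returns a result with the same length as its test. A minimal base R sketch of the difference (the variable names here are illustrative only, not from the thread):

```r
existing <- c(4L, 5L)   # stands in for a column that already exists
has_col  <- TRUE        # a length-one test, like "c" %in% names(x)

# ifelse() is shaped like its test, so only the first element survives
from_ifelse <- ifelse(has_col, existing, NA)

# if / else returns whichever branch is chosen, whole
from_if <- if (has_col) existing else NA

print(from_ifelse)
print(from_if)
```

Inside mutate, a length-one result is recycled to every row, which is why the choice between the two matters for an existing column.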

One way is to use dplyr::bind_rows to bind the data.frames in the list, filling entries from missing columns with NA, and then split the resulting data.frame again to produce a list of data.frames:

df <- dplyr::bind_rows(l, .id = "id")
lapply(split(df, df$id), function(x) x[, -1])
#$`1`
#  a b c
#1 1 2 4
#2 2 3 5
#
#$`2`
#  a b  c
#3 1 2 NA
#4 2 3 NA

Or the same as a tidyverse / magrittr chain:

library(dplyr)
bind_rows(l, .id = "id") %>% split(., .$id) %>% lapply(function(x) x[, -1])

You have some good answers, but if you want to stick to base R:

lapply(l, function(x) {
  if (!("c" %in% colnames(x))) {
    # column is missing: append it, filled with NA
    c <- rep(NA, nrow(x))
    cbind(x, c)
  } else {
    # column already exists: return the data frame unchanged
    x
  }
})

Your code was returning NULL for the first data frame because it had no else branch to handle the case where c already exists (i.e. the if condition is FALSE).
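
The NULL comes from how R evaluates if as an expression: with no else branch, a FALSE condition yields NULL. A small sketch, with made-up values purely for illustration:

```r
# An `if` without `else` evaluates to NULL when its condition is FALSE
no_else <- if (FALSE) "created"

# With an `else` branch, the expression always has a value
with_else <- if (FALSE) "created" else "unchanged"

print(is.null(no_else))
print(with_else)
```

Because the function passed to lapply returns the value of its last expression, that NULL is what ends up in the result list.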

library(purrr)

map(l, ~{if(!length(.x$c)) .x$c <- NA; .x})
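
If more than one column may be missing, the same idea can be wrapped in a small base R helper. This is only a sketch: the function name add_missing_cols and its cols argument are hypothetical and do not come from any package.

```r
# Hypothetical helper: add each column named in `cols` that `df` lacks,
# filled with NA. Existing columns are left untouched.
add_missing_cols <- function(df, cols) {
  for (nm in setdiff(cols, names(df))) {
    df[[nm]] <- rep(NA, nrow(df))
  }
  df
}

d1 <- data.frame(a = 1:2, b = 2:3, c = 4:5)
d2 <- data.frame(a = 1:2, b = 2:3)

result <- lapply(list(d1, d2), add_missing_cols, cols = c("a", "b", "c"))
print(result)
```

Matching on names(df) with setdiff compares full column names, so a column such as "cat" would not be mistaken for "c".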
