
Removing columns from a data frame with repeated values

I have the following data frame containing characters, numbers, and NA:

df <- data.frame(a=c("notfound","NOT FOUND","NOT FOUND"), b=c(NA,"NOT FOUND","NOT FOUND"), c=c("not found",2,3), d=c("not   found","NOT FOUND","NOT FOUND"), e=c("234","NOT FOUND",NA))
          a         b         c           d         e
1  notfound      <NA> not found not   found       234
2 NOT FOUND NOT FOUND         2   NOT FOUND NOT FOUND
3 NOT FOUND NOT FOUND         3   NOT FOUND      <NA>

I would like to remove all the columns where every entry is "not found", "NOT found", "NOT FOUND", or "notfound"; basically, if tolower(gsub(" ", "", df)) == "notfound". It seems like this operation does not work on data frames. Are there any alternatives?

The desired output would be:

          c         e
1 not found       234
2         2 NOT FOUND
3         3      <NA>

You can use grepl with a regular expression to flag the strings that match, then keep only the columns where at least one element does not match (grepl returns FALSE for it), so that the number of matches in that column is less than nrow(df). This pattern matches strings that start with "not" and end with "found", and grepl is set to be case-insensitive. Note that grepl returns FALSE for NA, so a column containing NA is kept at this stage.

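# logical matrix: TRUE where a cell starts with "not" and ends with "found"
is_nf <-
  sapply(df, grepl, pattern = '(?=^not).*found$',
         perl = TRUE, ignore.case = TRUE)

# keep the columns that have at least one non-matching cell
df[colSums(is_nf) < nrow(df)]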
#           b         c         e
# 1      <NA> not found       234
# 2 NOT FOUND         2 NOT FOUND
# 3 NOT FOUND         3      <NA>
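
To see what the pattern does and does not accept, here is a small check on a few strings. The vector x is made up for illustration and is not part of the original data. One thing worth noticing is that the pattern only anchors the start and the end, so anything that begins with "not" and ends with "found" counts as a match.

```r
# Hypothetical test strings, chosen only to probe the pattern
x <- c("notfound", "NOT FOUND", "not   found",
       "found", "not found yet", "nothing was found", NA)

hits <- grepl('(?=^not).*found$', x, perl = TRUE, ignore.case = TRUE)
hits
# expected: TRUE TRUE TRUE FALSE FALSE TRUE FALSE
```

If strings like "nothing was found" could appear in real data, a stricter pattern or an exact comparison after normalizing the text may be a safer choice.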

I'm guessing you'd also want to remove columns where the only values other than "not found" are NA.

is_na <- is.na(df)

df[colSums(is_nf | is_na) < nrow(df)]
#           c         e
# 1 not found       234
# 2         2 NOT FOUND
# 3         3      <NA>
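
The normalization idea from the question, tolower plus gsub, can also be made to work by applying it one column at a time instead of to the whole data frame. The following is only a sketch under the assumption that a cell counts as "not found" when it equals "notfound" exactly after whitespace is stripped and case is lowered, and that NA should be treated the same way. The helper name all_not_found is hypothetical and does not come from the answer above.

```r
df <- data.frame(
  a = c("notfound", "NOT FOUND", "NOT FOUND"),
  b = c(NA, "NOT FOUND", "NOT FOUND"),
  c = c("not found", 2, 3),
  d = c("not   found", "NOT FOUND", "NOT FOUND"),
  e = c("234", "NOT FOUND", NA)
)

# Hypothetical helper: TRUE when every cell in the column is NA
# or normalizes to exactly "notfound"
all_not_found <- function(col) {
  norm <- tolower(gsub("\\s+", "", as.character(col)))
  all(is.na(col) | norm == "notfound")
}

# vapply returns one logical per column; drop the flagged columns
drop <- vapply(df, all_not_found, logical(1))
result <- df[!drop]
names(result)
```

On the example data this should leave columns c and e. Compared with the regular expression, the exact comparison would not treat a string such as "nothing was found" as a match.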


 