Removing columns from a data frame with repeated values
I have the following data frame containing characters, numbers, and NAs:
df <- data.frame(a=c("notfound","NOT FOUND","NOT FOUND"), b=c(NA,"NOT FOUND","NOT FOUND"), c=c("not found",2,3), d=c("not found","NOT FOUND","NOT FOUND"), e=c("234","NOT FOUND",NA))
          a         b         c         d         e
1  notfound      <NA> not found not found       234
2 NOT FOUND NOT FOUND         2 NOT FOUND NOT FOUND
3 NOT FOUND NOT FOUND         3 NOT FOUND      <NA>
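I want to remove every column whose entries are all some spelling of "not found" ("notfound", "NOT FOUND", "not found", and so on). Basically, a column should go if tolower(gsub(" ", "", df)) == "notfound" holds for all of its entries. This doesn't seem to work on a data frame, though. Is there an alternative?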
The desired output is:
          c         e
1 not found       234
2         2 NOT FOUND
3         3      <NA>
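The normalization proposed in the question can work if it is applied to each column separately instead of to the whole data frame. A likely reason it fails on df directly is that gsub() coerces a data frame with as.character(), which yields one deparsed string per column rather than one string per cell. The sketch below assumes that a column should be dropped when all of its non-NA entries normalize to "notfound"; the helper name is_not_found is hypothetical.

```r
# Example data from the question
df <- data.frame(a = c("notfound", "NOT FOUND", "NOT FOUND"),
                 b = c(NA, "NOT FOUND", "NOT FOUND"),
                 c = c("not found", 2, 3),
                 d = c("not found", "NOT FOUND", "NOT FOUND"),
                 e = c("234", "NOT FOUND", NA))

# Hypothetical helper: TRUE where an entry becomes "notfound" after removing
# spaces and lowercasing; NA entries stay NA
is_not_found <- function(x) tolower(gsub(" ", "", x)) == "notfound"

# One value per column: are all non-NA entries "not found" variants?
all_nf <- vapply(df, function(col) all(is_not_found(col), na.rm = TRUE),
                 logical(1))

# Keep only the columns that contain something else
result <- df[!all_nf]
result
```

Because of na.rm = TRUE, column b (NA plus "NOT FOUND") is dropped as in the desired output. Note that a column consisting only of NA would be dropped as well, since all() over zero values is TRUE.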
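You can use grepl with a regular expression to find the strings that match it, and keep only those columns where at least one element does not match (indicated by FALSE in the grepl output), i.e. where the column's number of matches is less than nrow(df). This pattern matches strings that start with "not" and end with "found", and grepl is set to be case-insensitive.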
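To see what the pattern accepts, it can be tried on a few individual strings. The strings below are made up for illustration. The comparison also suggests that the lookahead is not essential here: '^not.*found$' appears to give the same result, and it does not need perl = TRUE. One caveat is that the pattern is looser than an exact "not found" check, since anything starting with "not" and ending with "found" (such as "nothing found") matches too.

```r
# Made-up test strings, including one ("nothing found") that shows the
# pattern is looser than an exact "not found" comparison
x <- c("notfound", "NOT FOUND", "not found", "Not Found",
       "nothing found", "found", "not here", NA)

# The answer's pattern: anchored lookahead for "not", then "found" at the end
m_lookahead <- grepl('(?=^not).*found$', x, perl = TRUE, ignore.case = TRUE)

# The same test written without the lookahead
m_plain <- grepl('^not.*found$', x, ignore.case = TRUE)

m_lookahead
identical(m_lookahead, m_plain)
```

grepl() returns FALSE, not NA, for the NA element, which matters for how NA cells are counted in the column totals.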
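# TRUE where an entry starts with "not" and ends with "found", ignoring case
is_nf <- sapply(df, grepl, pattern = '(?=^not).*found$',
                perl = TRUE, ignore.case = TRUE)
# Keep the columns in which at least one entry did not match
df[colSums(is_nf) < nrow(df)]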
#           b         c         e
# 1      <NA> not found       234
# 2 NOT FOUND         2 NOT FOUND
# 3 NOT FOUND         3      <NA>
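The filter works on per-column match counts, so printing those counts shows why each column is kept or dropped. This sketch rebuilds df and is_nf exactly as above so that it runs on its own; counts is just an illustrative name.

```r
df <- data.frame(a = c("notfound", "NOT FOUND", "NOT FOUND"),
                 b = c(NA, "NOT FOUND", "NOT FOUND"),
                 c = c("not found", 2, 3),
                 d = c("not found", "NOT FOUND", "NOT FOUND"),
                 e = c("234", "NOT FOUND", NA))

# Logical matrix: one row per entry, one column per data frame column
is_nf <- sapply(df, grepl, pattern = '(?=^not).*found$',
                perl = TRUE, ignore.case = TRUE)

# Matches per column; a column is dropped only when this equals nrow(df)
counts <- colSums(is_nf)
counts

# Column b survives because grepl() returns FALSE for its NA entry
is_nf[, "b"]
```

Columns a and d reach 3 (= nrow(df)) and are removed; b reaches only 2 because its NA is counted as a non-match.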
I'm guessing you also want to remove columns whose only entries other than "not found" are NA:
# Count NA entries as "not found" too
is_na <- is.na(df)
df[colSums(is_nf | is_na) < nrow(df)]
#           c         e
# 1 not found       234
# 2         2 NOT FOUND
# 3         3      <NA>
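If this filter is needed more than once, both variants can be folded into one function. The sketch below is an assumption-laden convenience wrapper, not part of the answer: the name drop_not_found_cols and its argument na_counts_as_not_found are hypothetical, and it uses '^not.*found$', which behaves like the answer's lookahead pattern. It tests each column separately with vapply() instead of building a matrix, because with a one-row data frame sapply() would simplify to a plain vector and colSums() would then fail.

```r
# Hypothetical wrapper around the two filtering steps
drop_not_found_cols <- function(df, na_counts_as_not_found = TRUE) {
  keep <- vapply(df, function(col) {
    # Starts with "not" and ends with "found", ignoring case
    hit <- grepl("^not.*found$", col, ignore.case = TRUE)
    # Optionally treat NA entries as "not found" as well
    if (na_counts_as_not_found) hit <- hit | is.na(col)
    # Keep the column if at least one entry is something else
    !all(hit)
  }, logical(1))
  df[keep]
}

df <- data.frame(a = c("notfound", "NOT FOUND", "NOT FOUND"),
                 b = c(NA, "NOT FOUND", "NOT FOUND"),
                 c = c("not found", 2, 3),
                 d = c("not found", "NOT FOUND", "NOT FOUND"),
                 e = c("234", "NOT FOUND", NA))

drop_not_found_cols(df, na_counts_as_not_found = FALSE)  # columns b, c, e
drop_not_found_cols(df)                                  # columns c, e
```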