grepl across multiple columns in R

I have the following data, which contains "n.a." values that R does not recognise as missing.

I am trying to remove these values using grepl:

x <- x[!grepl("n.a.", x$Fixed.assets.EUR.Last.avail..yr),]

This works for a single column, but I want to apply it across all columns without naming each column on its own line.

What I currently have is:

x <- sapply(x[, c(1:4)], !grepl("n.a."))

which does not work and produces the following error:

Error in match.fun(FUN) : 
  '!grepl("n.a.", x[, 1:4])' is not a function, character or symbol

Data

dput(x)[1:6, ]
  Fixed.assets.EUR.Last.avail..yr Fixed.assets.EUR.Year...1 Fixed.assets.EUR.Year...2
1                      34,827,809                38,549,311                29,035,369
2                         755,256                   658,200                   573,888
3                       2,639,824                 2,739,205                 3,230,890
4                       2,543,367                 2,317,132                 2,994,769
5                       1,608,004                 1,702,838                 1,763,244
6                         661,875                   661,082                   584,166
  Fixed.assets.EUR.Year...3
1                30,416,099
2                      n.a.
3                 2,841,046
4                   693,370
5                 2,024,666
6                   565,007

Let me start by saying that the best practice here would be to specify na.strings = "n.a." when you read in your data, so that R treats these entries as NA from the start. That said, here is a way to use grepl() to remove any row that contains "n.a." as a string:

# Logical negation is used rather than -which(), which would return zero rows if nothing matched.
x[!apply(x[, 1:4], 1, function(y) any(grepl("n.a.", y, fixed = TRUE))), ]
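As a minimal sketch of the read-time approach, the snippet below parses a small inline CSV with na.strings = "n.a.". The inline text and the column names (id, assets_y0, assets_y3) are hypothetical stand-ins for the real file; only the "n.a." marker and the comma-formatted numbers come from the question.

```r
# Hypothetical inline CSV standing in for the real file.
csv_text <- 'id,assets_y0,assets_y3
1,"34,827,809","30,416,099"
2,"755,256",n.a.
3,"2,639,824","2,841,046"'

# na.strings tells the reader which strings should become NA on import.
x <- read.csv(text = csv_text, na.strings = "n.a.", stringsAsFactors = FALSE)

# The "n.a." entry is now a real missing value.
is.na(x$assets_y3)

# Rows with any missing value can then be dropped in one step.
x_complete <- x[complete.cases(x), ]
```

With the markers converted on import, no string matching is needed afterwards; is.na() and complete.cases() work directly.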

If you want R to recognize "n.a." as NA without removing the entire row (and so losing the real values in a row that has "n.a." in only one column), you can use this:

df[df=="n.a."] <- NA

Otherwise, you are better off using @Mako212's solution.
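After the replacement, the columns still hold text with thousands separators, so they may need converting before any arithmetic. The following is a sketch under that assumption; the data frame and its column names (y0, y3) are hypothetical miniatures of the question's data.

```r
# Hypothetical miniature of the question's data: character columns with
# thousands separators and an "n.a." marker.
df <- data.frame(
  y0 = c("34,827,809", "755,256", "2,639,824"),
  y3 = c("30,416,099", "n.a.", "2,841,046"),
  stringsAsFactors = FALSE
)

# Replace the marker with a real missing value; other cells are untouched.
df[df == "n.a."] <- NA

# Strip the commas and convert every column to numeric; NA stays NA.
df[] <- lapply(df, function(col) as.numeric(gsub(",", "", col, fixed = TRUE)))

# Summaries can now skip the missing entries instead of losing whole rows.
colMeans(df, na.rm = TRUE)
```

Assigning to df[] keeps the result a data frame, whereas assigning to df would replace it with a plain list.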

Here are 2 alternative options.

Example Data

set.seed(1)
df <- as.data.frame(matrix(sample(c("n.a.", "good"), 20, replace=TRUE), ncol=2, byrow=TRUE))
head(df)

#     V1   V2
# 1 n.a. n.a.
# 2 good good
# 3 n.a. good
# 4 good good
# 5 good n.a.
# 6 n.a. n.a.

Convert "n.a." to NA, then use complete.cases

data <- replace(df, df == "n.a.", NA)
data[complete.cases(data),]

#     V1   V2
# 2 good good
# 4 good good
# 9 good good

Use rowSums

df[rowSums(df == "n.a.") == 0,]

#     V1   V2
# 2 good good
# 4 good good
# 9 good good
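The sapply() call in the question fails because its second argument, !grepl("n.a."), is an evaluated expression rather than a function. If pattern matching is preferred over exact equality, the rowSums idea can be combined with grepl() roughly as follows. This is a sketch, and the data frame and its column names (a, b) are hypothetical.

```r
# Hypothetical example data; only the "n.a." marker comes from the question.
x <- data.frame(
  a = c("34,827,809", "755,256", "2,639,824"),
  b = c("30,416,099", "n.a.", "2,841,046"),
  stringsAsFactors = FALSE
)

# sapply() needs a function as its second argument, so grepl() is wrapped in
# an anonymous function. fixed = TRUE treats "n.a." as a literal string
# rather than a regex in which "." matches any character.
has_marker <- sapply(x, function(col) grepl("n.a.", col, fixed = TRUE))

# has_marker is a logical matrix (rows by columns); keep rows with no hit.
x_clean <- x[rowSums(has_marker) == 0, , drop = FALSE]
```

Note that sapply() only returns a matrix when the data frame has more than one row; for a single-row input, vapply() or an explicit matrix() call would be the safer choice.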
