grepl across multiple columns in R
I have the following data, which contains "n.a." values that R does not recognise as missing.
I am trying to remove the affected rows using grepl:
x <- x[!grepl("n.a.", x$Fixed.assets.EUR.Last.avail..yr),]
but I would like to apply it across all columns, instead of naming each column on its own line.
What I currently have is
x <- sapply(x[, c(1:4)], !grepl("n.a."))
which produces an error and does not work:
Error in match.fun(FUN) :
'!grepl("n.a.", x[, 1:4])' is not a function, character or symbol
Data
dput(x)[1:6, ]
Fixed.assets.EUR.Last.avail..yr Fixed.assets.EUR.Year...1 Fixed.assets.EUR.Year...2
1 34,827,809 38,549,311 29,035,369
2 755,256 658,200 573,888
3 2,639,824 2,739,205 3,230,890
4 2,543,367 2,317,132 2,994,769
5 1,608,004 1,702,838 1,763,244
6 661,875 661,082 584,166
Fixed.assets.EUR.Year...3
1 30,416,099
2 n.a.
3 2,841,046
4 693,370
5 2,024,666
6 565,007
Let me start by saying that the best practice here would be to specify a na.strings = c("n.a.") argument when you read in your data. That said, this is a way to use grepl() to remove any row that contains "n.a." as a string:
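# Logical indexing is used instead of -which(...), which would drop every row when nothing matches.
x[!apply(x[, 1:4], 1, function(y) any(grepl("n.a.", y, fixed = TRUE))), ]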
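The na.strings suggestion above can be sketched as follows. This is only an illustration under assumptions: the inline CSV text and its column names (id, assets_y1, assets_y2) are made up to stand in for the real file, which is not shown in the question.

```r
# Hypothetical inline CSV standing in for the real file; column names are invented.
csv_text <- "id,assets_y1,assets_y2
1,\"34,827,809\",\"30,416,099\"
2,\"755,256\",n.a.
3,\"2,639,824\",\"2,841,046\""

# na.strings tells read.csv() to turn the literal string n.a. into NA on import.
x <- read.csv(text = csv_text, na.strings = "n.a.", stringsAsFactors = FALSE)

# Rows containing any NA can then be dropped with complete.cases().
x_clean <- x[complete.cases(x), ]
```

Handling the placeholder at import time means no pattern matching is needed afterwards, and is.na() works on the affected cells straight away.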
If you want R to recognize "n.a." as NA without removing the entire row (and hence losing the real values in a row that has "n.a." in only one column), you can use this:
df[df=="n.a."] <- NA
Otherwise, you are better off using @Mako212's solution.
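As a rough sketch of where the replacement leads: once "n.a." has become NA, the columns are still text because of the comma separators. The example below assumes the columns are character (not factor) and that the commas are thousands separators; the column names y1 and y2 are hypothetical.

```r
# Toy data frame; column names are hypothetical.
df <- data.frame(
  y1 = c("755,256", "2,639,824", "661,875"),
  y2 = c("n.a.", "2,841,046", "565,007"),
  stringsAsFactors = FALSE
)

# Mark the placeholder as missing while keeping every row.
df[df == "n.a."] <- NA

# Assumption: commas are thousands separators, so strip them before converting.
df[] <- lapply(df, function(col) as.numeric(gsub(",", "", col, fixed = TRUE)))

# Number of missing values per column.
colSums(is.na(df))
```

All three rows survive, and functions such as mean() can then skip the missing cell with na.rm = TRUE.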
Here are 2 alternative options
Example Data
set.seed(1)
df <- as.data.frame(matrix(sample(c("n.a.", "good"), 20, replace=TRUE), ncol=2, byrow=TRUE))
head(df)
# V1 V2
# 1 n.a. n.a.
# 2 good good
# 3 n.a. good
# 4 good good
# 5 good n.a.
# 6 n.a. n.a.
Convert "n.a." to NA, then use complete.cases
data <- replace(df, df == "n.a.", NA)
data[complete.cases(data),]
# V1 V2
# 2 good good
# 4 good good
# 9 good good
Use rowSums
df[rowSums(df == "n.a.") == 0,]
# V1 V2
# 2 good good
# 4 good good
# 9 good good
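One caveat with the rowSums option, sketched on made-up data: if the data frame already contains genuine NA values, df == "n.a." is NA in those cells, rowSums() returns NA for the row, and indexing with NA produces a row of NAs. Passing na.rm = TRUE is one way around that, assuming real NAs should not cause a row to be dropped.

```r
# Toy data with one genuine NA alongside the "n.a." placeholder.
df <- data.frame(
  V1 = c("n.a.", "good", NA, "good"),
  V2 = c("good", "good", "good", "n.a."),
  stringsAsFactors = FALSE
)

# df == "n.a." is NA wherever df is NA; na.rm = TRUE counts those cells as non-matches.
keep <- rowSums(df == "n.a.", na.rm = TRUE) == 0
df[keep, ]
```

Rows 2 and 3 are kept here; if rows with a real NA should also go, the complete.cases option is the simpler choice.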