Negation of gsub | Replace everything except strings in a certain vector
I have a vector of strings:
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")
I want to keep only three possible values in this vector: N, A, and NA.
Therefore, I want to replace any element that is NOT N or A with NA.
How can I achieve this?
I have tried the following:
gsub(ve, pattern = '[^NA]+', replacement = 'NA')
gsub(ve, pattern = '[^N|^A]+', replacement = 'NA')
But these don't work well, because they substitute inside each string rather than replacing the whole element. So in some cases I end up with something like NANANANANANA instead of simply NA.
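To see what the first attempt actually does, it may help to run it on a few elements. This is only a small sketch on a subset of the vector (the name sample_ve is made up for illustration): the negated class [^NA] matches runs of characters other than N and A, so mixed strings are only partly rewritten and strings made purely of N and A pass through untouched.

```r
# Subset of the original vector, chosen for illustration
sample_ve <- c("N", "ANN", "NFNFNAA", "23", "NN")

# Each run of characters other than N/A becomes "NA";
# N and A characters themselves are left in place
attempt <- gsub("[^NA]+", "NA", sample_ve)
attempt
# [1] "N"         "ANN"       "NNANNANAA" "NA"        "NN"
```

Only "23" ends up as the desired NA here; "ANN" and "NN" are unchanged because they contain nothing for the pattern to match.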
Use a negative lookahead assertion.
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")
sub("^(?![NA]$).*", "NA", ve, perl = TRUE)
# [1] "N" "A" "A" "A" "N" "NA" "NA" "NA" "NA" "N" "A" "NA" "NA" "NA" "NA"
^(?![NA]$) is a negative lookahead anchored at the start ^. It asserts that the start of the string is NOT followed by exactly one letter from [NA] (either N or A) and then the line end $.
.* then matches all the characters of the string.
So the regex above matches any string except one that is exactly N or A, and sub replaces the whole match with NA.
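One way to check this reading of the pattern is to ask grepl which elements it matches, without replacing anything. A minimal sketch on a hand-picked vector (x and needs_replacing are illustrative names, not part of the answer):

```r
x <- c("N", "A", "NN", "NA", "23")

# TRUE where the lookahead lets the pattern match, i.e. where the
# element is anything other than a lone "N" or a lone "A"
needs_replacing <- grepl("^(?![NA]$).*", x, perl = TRUE)
needs_replacing
# [1] FALSE FALSE  TRUE  TRUE  TRUE
```

Note that "NA" is matched too; sub then replaces it with "NA", so it looks unchanged in the output.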
If we are looking for fixed matches, then use %in% with negation ! and assign 'NA' to the elements that do not match:
ve[!ve %in% c("A", "N", "NA")] <- 'NA'
Note that in R, a missing value is the unquoted NA, not the quoted string 'NA'. I assume your 'NA' is meant to be a separate category; if so, I would advise renaming it to avoid confusion later when parsing.
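The difference between the string "NA" and a real missing value can be seen with is.na. The sketch below uses small made-up vectors (x, out) and, as an assumption about what may be wanted, also shows the %in% approach assigning a true missing value via NA_character_ instead of the string:

```r
x <- c("N", "NA", NA)
is.na(x)     # FALSE FALSE  TRUE   (only the unquoted NA is missing)
x == "NA"    # FALSE  TRUE    NA   (comparison with a missing value gives NA)

# Variant of the %in% approach that stores real missing values
out <- c("N", "A", "ANN", "NA", "23")
out[!out %in% c("N", "A")] <- NA_character_
is.na(out)   # FALSE FALSE  TRUE  TRUE  TRUE
```

In this variant the string "NA" is also turned into a missing value, which may or may not be what you want.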
Here is an alternative regex solution, slightly simpler and much faster than Avinash's:
ve[!grepl("^[NA]$", ve)] <- NA_character_
You should probably still go with Akrun's solution, which is "simple and straight-forward" and still faster.
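No timings are given in this thread, so the speed claims are best checked on your own data. Below is a rough sketch using only base R; the function names and the test vector big are made up for illustration, and the grepl variant assigns the string "NA" rather than NA_character_ so that all three results are comparable.

```r
set.seed(1)
big <- sample(c("N", "A", "NN", "NA", "23", "parnot"), 1e5, replace = TRUE)

# The three approaches from the answers, wrapped as functions
by_lookahead <- function(x) sub("^(?![NA]$).*", "NA", x, perl = TRUE)
by_in        <- function(x) { x[!x %in% c("A", "N", "NA")] <- "NA"; x }
by_grepl     <- function(x) { x[!grepl("^[NA]$", x)] <- "NA"; x }

# All three should agree before timing them
stopifnot(identical(by_lookahead(big), by_in(big)),
          identical(by_in(big), by_grepl(big)))

# Timings vary by machine and R version
system.time(by_lookahead(big))
system.time(by_in(big))
system.time(by_grepl(big))
```

For more reliable numbers, each call could be repeated several times or run through a benchmarking package.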