I have a vector of strings:
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")
I want to keep only three possible values in this vector: N, A, and NA.
Therefore, I want to replace any element that is NOT N or A with NA.
How can I achieve this?
I have tried the following:
gsub(ve, pattern = '[^NA]+', replacement = 'NA')
gsub(ve, pattern = '[^N|^A]+', replacement = 'NA')
But these don't work well, because they replace only the runs of characters that are not N or A and leave every existing N and A in place. So in some cases I end up with a mix such as NNANNANAA, instead of simply NA.
Use a negative lookahead assertion.
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")
sub("^(?![NA]$).*", "NA", ve, perl = TRUE)
# [1] "N" "A" "A" "A" "N" "NA" "NA" "NA" "NA" "N" "A" "NA" "NA" "NA" "NA"
^(?![NA]$) asserts that the start of the string ^ is NOT followed by exactly one letter [NA] (either N or A) and then the end of the string $.
.* then matches all the remaining characters.
So the regex above matches any string except the strings N and A, and sub() replaces each whole matched string with NA.
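To see which strings the lookahead pattern accepts, here is a small sketch using grepl() with the same pattern. The vector x is a made-up toy input, not part of the original data.

```r
# Toy input chosen for illustration only (an assumption, not from the question)
x <- c("N", "A", "NA", "NN", "xyz")

# TRUE where the pattern matches, i.e. where sub() would replace the string
matched <- grepl("^(?![NA]$).*", x, perl = TRUE)
matched
# FALSE FALSE TRUE TRUE TRUE

# The same pattern used for replacement
replaced <- sub("^(?![NA]$).*", "NA", x, perl = TRUE)
replaced
# "N" "A" "NA" "NA" "NA"
```

Note that perl = TRUE is required, because lookahead assertions are not supported by R's default regex engine.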
If we are looking for fixed matches, then use %in% with negation ! to select the other elements, and assign 'NA' to them:
ve[!ve %in% c("A", "N", "NA")] <- 'NA'
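The one-liner above can be split into steps to make the logic visible. This is only a sketch on a shortened copy of the vector; the names short and keep are introduced here for illustration.

```r
# Shortened copy of the vector from the question (illustrative subset)
short <- c("N", "A", "NA", "NFNFNAA", "23")

# %in% compares whole strings, so "NFNFNAA" is not treated as a match
keep <- short %in% c("A", "N", "NA")
keep
# TRUE TRUE TRUE FALSE FALSE

# Overwrite everything that is not in the allowed set
short[!keep] <- "NA"
short
# "N" "A" "NA" "NA" "NA"
```

In the original one-liner, !ve %in% c(...) is parsed as !(ve %in% c(...)), because %in% binds more tightly than the unary ! operator.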
Note that in R a missing value is the unquoted NA, not the quoted string "NA". I assume "NA" here is an ordinary category of its own; if so, I would advise renaming it to something else to avoid confusion with missing values when parsing later.
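A short sketch of the difference between the string "NA" and a real missing value; the vector x is a hypothetical example made up for this illustration.

```r
# Second element is the two-character string "NA"; third is a missing value
x <- c("N", "NA", NA)

missing_flags <- is.na(x)
missing_flags
# FALSE FALSE TRUE

string_flags <- x == "NA"
string_flags
# FALSE TRUE NA   (comparing a missing value yields NA, not FALSE)
```

Functions such as is.na() only see the third element as missing, so code that later relies on them would silently ignore the quoted "NA" entries.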
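Here is an alternative regex solution, slightly simpler and much faster than Avinash's. Inside a character class | is a literal character rather than alternation, so the class is written as [NA]:
ve[!grepl("^[NA]$", ve)] <- NA_character_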
You should probably still go with Akrun's solution, which is "simple and straight-forward" and faster still.
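As a sanity check, the three approaches in this thread can be compared on the original vector. This is a sketch only: the variable names by_lookahead, by_in, and by_grepl are introduced here, and no timing claims are made.

```r
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN",
        "parnot", "important", "notall")

# 1. Negative lookahead with sub()
by_lookahead <- sub("^(?![NA]$).*", "NA", ve, perl = TRUE)

# 2. Fixed matching with %in%
by_in <- ve
by_in[!by_in %in% c("A", "N", "NA")] <- "NA"

# 3. grepl() with real missing values instead of the string "NA"
by_grepl <- ve
by_grepl[!grepl("^[NA]$", by_grepl)] <- NA_character_

# The first two produce the same character vector
same_strings <- identical(by_lookahead, by_in)

# The third marks the same positions, but as missing values
same_positions <- identical(is.na(by_grepl), by_in == "NA")

c(same_strings, same_positions)
# TRUE TRUE
```

The grepl() variant differs from the other two only in what it stores: a true missing value rather than the string "NA", which is worth deciding on deliberately.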