
Negation of gsub | Replace everything except strings in a certain vector

I have a vector of strings:

ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")

I want to keep only three possible values in this vector: N, A, and NA.

Therefore, I want to replace any element that is NOT N or A with NA.

How can I achieve this?

I have tried the following:

gsub(ve, pattern = '[^NA]+', replacement = 'NA')
gsub(ve, pattern = '[^N|^A]+', replacement = 'NA')

But these don't work well, because gsub replaces each matching piece inside a string rather than the string as a whole. So in some cases I end up with something like NANANANANANA instead of simply NA.

Use a negative lookahead assertion.

ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")
sub("^(?![NA]$).*", "NA", ve, perl = TRUE)
# [1] "N"  "A"  "A"  "A"  "N"  "NA" "NA" "NA" "NA" "N"  "A"  "NA" "NA" "NA" "NA"

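The pattern works as follows:

^(?![NA]$) is a negative lookahead anchored at the start ^. It succeeds only if what follows is NOT a single letter N or A ([NA]) immediately followed by the end of the string $.

.* then matches all the characters of the string.

So the regex matches any whole string except the strings N and A, and sub replaces that match with NA.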
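To see the lookahead in isolation, here is a minimal sketch on a small made-up vector (x is illustrative and not from the question). It assumes PCRE matching via perl = TRUE, which lookaheads in sub require.

```r
# Small illustrative vector (hypothetical, not the question's data)
x <- c("N", "A", "NA", "NN", "abc")

# (?![NA]$) fails only when the whole string is a single "N" or "A",
# so those two strings are left untouched; every other string is
# matched in full by .* and replaced with the string "NA".
res <- sub("^(?![NA]$).*", "NA", x, perl = TRUE)
print(res)
# [1] "N"  "A"  "NA" "NA" "NA"
```

Note that the element "NA" is also matched and replaced, but since the replacement is the same string, it looks unchanged.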

If we are looking for fixed matches, use %in% with negation ! to select the elements that are not one of the allowed values, and assign 'NA' to them:

ve[!ve %in% c("A", "N", "NA")] <- 'NA'

Note that in R, a missing value is the unquoted NA, not the quoted string 'NA'. If 'NA' here is meant to be a genuine category rather than a missing value, I would advise giving that category a different name to avoid confusion later when parsing.
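The difference between the string "NA" and a missing value can be checked with is.na. The following is only a sketch on a shortened vector; the names as_string and as_missing are made up for illustration.

```r
ve <- c("N", "A", "ANN", "NA", "23")

# Option 1: replace with the literal string "NA" (what the question asks for)
as_string <- ve
as_string[!as_string %in% c("A", "N", "NA")] <- "NA"

# Option 2: replace with a true missing value instead
as_missing <- ve
as_missing[!as_missing %in% c("A", "N")] <- NA_character_

is.na(as_string)   # all FALSE: "NA" is an ordinary string
is.na(as_missing)  # TRUE wherever the value was replaced
```

Which option is appropriate depends on whether NA is meant as a real category or as "no valid value"; functions such as table() and mean() treat the two very differently.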

Here is an alternative regex solution, slightly simpler and much faster than Avinash's. Note that it assigns a true missing value (NA_character_) rather than the string "NA":

ve[!grepl("^[NA]$", ve)] <- NA_character_

You should probably still go with Akrun's solution, though, which is "simple and straightforward" and faster still.
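The speed comparisons in this thread are not backed by measurements here, so the following is just a rough sketch for checking that the three approaches agree and for timing them yourself. The names by_lookahead, by_in, by_grepl and big are made up for illustration, and system.time gives only a coarse estimate that will vary by machine.

```r
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN",
        "parnot", "important", "notall")

# Lookahead version: yields the string "NA"
by_lookahead <- sub("^(?![NA]$).*", "NA", ve, perl = TRUE)

# %in% version: also yields the string "NA"
by_in <- ve
by_in[!by_in %in% c("A", "N", "NA")] <- "NA"

# grepl version: yields a missing value, so convert it for comparison
by_grepl <- ve
by_grepl[!grepl("^[NA]$", by_grepl)] <- NA_character_
by_grepl[is.na(by_grepl)] <- "NA"

identical(by_lookahead, by_in)     # TRUE
identical(by_lookahead, by_grepl)  # TRUE

# Rough timing on a larger, repeated copy of the data
big <- rep(ve, 1e4)
system.time(sub("^(?![NA]$).*", "NA", big, perl = TRUE))
system.time(!big %in% c("A", "N", "NA"))
system.time(!grepl("^[NA]$", big))
```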
