Negation of gsub | Replace everything except strings in a certain vector

I have a vector of strings:

ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")

I want to keep only three possible values in this vector: N, A, and NA.

Therefore, I want to replace any element that is NOT N or A with NA.

How can I achieve this?

I have tried the following:

gsub(ve, pattern = '[^NA]+', replacement = 'NA')
gsub(ve, pattern = '[^N|^A]+', replacement = 'NA')

But these don't work well, because gsub() works on substrings rather than whole elements: each run of characters other than N and A inside a string is replaced with NA, while the N and A characters themselves are kept. So in some cases I end up with something like NANANANANANA instead of simply NA.
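To see the substring behaviour concretely, here is what the first attempt returns for a few elements of the same vector. The variable names attempt1 and attempt2 are only for illustration:

```r
ve <- c("N", "A", "A", "A", "N", "ANN", "NA", "NFNFNAA", "23", "N", "A", "NN",
        "parnot", "important", "notall")

# Each run of characters that are not N or A becomes "NA";
# the N and A characters around it are left in place.
attempt1 <- gsub(ve, pattern = '[^NA]+', replacement = 'NA')
attempt2 <- gsub(ve, pattern = '[^N|^A]+', replacement = 'NA')

attempt1[6]  # "ANN" is untouched: it contains only N and A
attempt1[8]  # "NNANNANAA": only the two F's were replaced
attempt1[9]  # "NA": the whole of "23" was one run, so it collapsed to "NA"

# On this vector both attempts agree, because it contains no "|" or "^"
identical(attempt1, attempt2)
```

Anchoring the pattern to the whole element (^ ... $), as the answers below do, is what turns a substring edit into a whole-element replacement.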

Use a negative lookahead assertion:

ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")
sub("^(?![NA]$).*", "NA", ve, perl = TRUE)
# [1] "N"  "A"  "A"  "A"  "N"  "NA" "NA" "NA" "NA" "N"  "A"  "NA" "NA" "NA" "NA"

^(?![NA]$) asserts that the element is not exactly one letter from [NA]:

-> right after the start ^, it must NOT be the case that a single N or A is followed directly by the end of the string $.

.* then matches all the characters, so the whole element is replaced.

So the regex above matches any string except the strings N and A.
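One way to see which elements the lookahead selects is to use it on its own with grepl(). It is a zero-width pattern, so it matches the empty string at the start of every element that is not exactly N or A. The name to_replace is only illustrative:

```r
ve <- c("N", "A", "A", "A", "N", "ANN", "NA", "NFNFNAA", "23", "N", "A", "NN",
        "parnot", "important", "notall")

# TRUE where sub() would rewrite the element; the lookahead needs perl = TRUE
to_replace <- grepl("^(?![NA]$)", ve, perl = TRUE)

ve[!to_replace]  # elements that sub() leaves untouched
# [1] "N" "A" "A" "A" "N" "N" "A"
```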

If you are looking for fixed (exact) matches, use %in% with the negation operator ! and assign 'NA' to the elements that are not in the allowed set:

ve[!ve %in% c("A", "N", "NA")] <- 'NA'
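As a quick sanity check, the exact-match approach and the lookahead regex give the same result on the example vector. The names via_regex and via_in are only for illustration:

```r
ve <- c("N", "A", "A", "A", "N", "ANN", "NA", "NFNFNAA", "23", "N", "A", "NN",
        "parnot", "important", "notall")

# Regex route: anchored negative lookahead (requires perl = TRUE)
via_regex <- sub("^(?![NA]$).*", "NA", ve, perl = TRUE)

# Exact-match route: no regular expression involved
via_in <- ve
via_in[!via_in %in% c("A", "N", "NA")] <- "NA"

identical(via_regex, via_in)
# [1] TRUE
```

The %in% version only compares whole strings, so there are no metacharacters to escape and no anchoring to get wrong.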

Note that in R, a missing value is the unquoted NA, not the quoted string "NA". If "NA" is meant as a separate category here, I would advise renaming it to something else to avoid confusion later when parsing.
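If the goal is a genuine missing value rather than a category, a minimal sketch (assuming only N and A are valid) is to use NA_character_ so that is.na() recognises it. Note that with this approach the input string "NA" also becomes a real missing value. The name cleaned is only illustrative:

```r
ve <- c("N", "A", "A", "A", "N", "ANN", "NA", "NFNFNAA", "23", "N", "A", "NN",
        "parnot", "important", "notall")

is.na("NA")            # the quoted string is an ordinary value, not missing
# [1] FALSE

# Keep N and A; everything else becomes a real (character) missing value
cleaned <- ifelse(ve %in% c("A", "N"), ve, NA_character_)
sum(is.na(cleaned))
# [1] 8
```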

Here is an alternative regex solution, slightly simpler and much faster than Avinash's. It assigns a true missing value (NA_character_) rather than the string "NA":

ve[!grepl("^[NA]$", ve)] <- NA_character_
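A note on bracket expressions, which also explains the question's second attempt [^N|^A]: inside [...], | is an ordinary character rather than alternation (and ^ is only special as the first character), so a class written as [N|A] also accepts a lone "|". [NA] is the class that means just N or A. A quick check, with illustrative variable names:

```r
# Inside [...] the | is a literal character, not "or"
loose  <- grepl("^[N|A]$", c("N", "A", "|"))
strict <- grepl("^[NA]$",  c("N", "A", "|"))

loose
# [1] TRUE TRUE TRUE
strict
# [1]  TRUE  TRUE FALSE
```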

You should probably still go with Akrun's solution, though: it is simple and straightforward, and faster still.
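The speed comparisons above are easy to check on your own machine. Below is a rough sketch that uses base R's system.time() on a synthetic vector built by resampling ve. The size (100,000 elements), the seed, and all variable names are arbitrary choices for illustration. Timings depend on the machine and R version, so no numbers are shown here:

```r
ve <- c("N", "A", "A", "A", "N", "ANN", "NA", "NFNFNAA", "23", "N", "A", "NN",
        "parnot", "important", "notall")

set.seed(42)
big <- sample(ve, 1e5, replace = TRUE)  # hypothetical larger input

# 1. Negative lookahead with sub()
time_sub <- system.time(r_sub <- sub("^(?![NA]$).*", "NA", big, perl = TRUE))

# 2. Exact matching with %in%
time_in <- system.time({
  r_in <- big
  r_in[!r_in %in% c("A", "N", "NA")] <- "NA"
})

# 3. Anchored grepl(), assigning real missing values
time_grepl <- system.time({
  r_grepl <- big
  r_grepl[!grepl("^[NA]$", r_grepl)] <- NA_character_
})

# Elapsed seconds for each approach
timings <- c(sub = time_sub[["elapsed"]],
             in_operator = time_in[["elapsed"]],
             grepl = time_grepl[["elapsed"]])
print(timings)
```

For more reliable numbers, repeat each timing several times or use a dedicated benchmarking package.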
