Use regex in R to delete all characters in a string except specified words

Using the R programming language, I want to use the gsub function to remove all characters from a string except two or three specified words.

I've tried a number of methods using look-behind, \\bMyWord\\b, and the caret symbol ^. For example, this attempt deletes everything from the first word onward and returns an empty string:

gsub("fbnmobile.*", "" , "fbnmobile akinremi temitope akinfemi gotvnspectran fbn akinremi temitope a and akinsanya arinola o ")

Desired output:

"fbnmobile gotvnspectran fbn"

I want a template that lets me add or drop the whole words to be kept whenever I delete all of the other characters. In this case, I would specify that everything should be deleted except the words "fbnmobile", "gotvnspectran", and "fbn".

Also, I'll gladly accept a recommendation for a definitive guide on regular expressions for R.

It may be easier to extract the words than to delete everything else. Specify the words to extract as alternatives joined with OR ( | ) in str_extract_all from stringr, and then paste the extracted words into a single string:

library(stringr)
paste(str_extract_all(str1, "\\b(fbnmobile|gotvnspectran|fbn)\\b")[[1]], collapse=" ")
#[1] "fbnmobile gotvnspectran fbn"
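
If stringr is not available, the same extraction idea can be sketched in base R with gregexpr and regmatches. The helper name keep_words and its arguments are hypothetical, not part of the original answer, and the sketch assumes the words contain no regex metacharacters. Adding or dropping a word only means editing the keep vector.

```r
# Hypothetical helper: keep only the whole words listed in `keep`.
# Assumes the words contain no regex metacharacters.
keep_words <- function(x, keep) {
  # Build "\\b(word1|word2|...)\\b" from the vector of words.
  pattern <- paste0("\\b(", paste(keep, collapse = "|"), ")\\b")
  # gregexpr() locates every match; regmatches() extracts them per input string.
  matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  # Join each string's matches back together with single spaces.
  vapply(matches, paste, character(1), collapse = " ")
}

str1 <- "fbnmobile akinremi temitope akinfemi gotvnspectran fbn akinremi temitope a and akinsanya arinola o "
keep_words(str1, c("fbnmobile", "gotvnspectran", "fbn"))
#[1] "fbnmobile gotvnspectran fbn"
```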

Or using gsub:

gsub("\\s{2,}", " ", trimws(gsub("\\b(fbnmobile|gotvnspectran|fbn)\\b(*SKIP)(*F)|\\w+", "", str1, perl = TRUE)))
#[1] "fbnmobile gotvnspectran fbn"
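
The gsub approach can also be turned into a template by building the pattern from a character vector. In the pattern, the PCRE verbs (*SKIP)(*F) make a match on a listed word fail and be skipped over, so only the remaining \\w+ words are replaced with an empty string. This is a minimal sketch: the helper name drop_other_words is hypothetical, and it assumes the words contain no regex metacharacters.

```r
# Hypothetical helper: delete every word except those listed in `keep`.
# Assumes the words contain no regex metacharacters.
drop_other_words <- function(x, keep) {
  alternation <- paste(keep, collapse = "|")
  # Listed words hit (*SKIP)(*F) and are left alone; any other word matches \\w+.
  pattern <- paste0("\\b(", alternation, ")\\b(*SKIP)(*F)|\\w+")
  stripped <- gsub(pattern, "", x, perl = TRUE)
  # Trim the ends and squeeze the leftover runs of spaces.
  gsub("\\s{2,}", " ", trimws(stripped))
}

str1 <- "fbnmobile akinremi temitope akinfemi gotvnspectran fbn akinremi temitope a and akinsanya arinola o "
drop_other_words(str1, c("fbnmobile", "gotvnspectran", "fbn"))
#[1] "fbnmobile gotvnspectran fbn"
```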

Data

str1 <- "fbnmobile akinremi temitope akinfemi gotvnspectran fbn akinremi temitope a and akinsanya arinola o "
