Use regex in R to delete all characters in a string except specified words
Using the R programming language, I want to use the gsub function to remove all characters from a string except two or three specified words.
I've tried a number of methods using look-behind, \\bMyWord\\b, and the caret symbol ^.
gsub("fbnmobile.*", "" , "fbnmobile akinremi temitope akinfemi gotvnspectran fbn akinremi temitope a and akinsanya arinola o ")
Desired output:
"fbnmobile gotvnspectran fbn"
I want a template in which I can add or drop the whole words that should be kept whenever I delete all of the other characters. In this case, I would specify to delete all characters except for the words "fbnmobile", "gotvnspectran", and "fbn".
Also, I'll gladly accept a recommendation for a definitive guide on regular expressions for R.
It may be easier to extract the words you want to keep than to delete everything else. Specify the words to extract, separated by the OR operator (|), in str_extract_all from stringr, and then paste the extracted words into a single string.
library(stringr)
paste(str_extract_all(str1, "\\b(fbnmobile|gotvnspectran|fbn)\\b")[[1]], collapse=" ")
#[1] "fbnmobile gotvnspectran fbn"
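Since the question asks for a template where words can be added or dropped, the alternation can be built from a character vector instead of being typed by hand. The following is only a minimal sketch of that idea: it uses base R's gregexpr and regmatches in place of stringr so it has no package dependency, the function name keep_words is hypothetical, and it assumes the words contain no regex metacharacters (those would need escaping first). str1 is repeated here so the block runs on its own.

```r
# Hypothetical helper (not part of stringr or base R):
# keep only the whole words listed in `words`, in their original order.
keep_words <- function(x, words) {
  # Build "\\b(word1|word2|...)\\b" from the vector of words to keep
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  # Extract all matches for each input string
  matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  # Join the matches of each string with single spaces
  vapply(matches, paste, character(1), collapse = " ")
}

str1 <- "fbnmobile akinremi temitope akinfemi gotvnspectran fbn akinremi temitope a and akinsanya arinola o "

keep_words(str1, c("fbnmobile", "gotvnspectran", "fbn"))
#[1] "fbnmobile gotvnspectran fbn"

# Dropping a word from the vector changes what is kept
keep_words(str1, c("fbnmobile", "fbn"))
#[1] "fbnmobile fbn"
```

Because the word list is an ordinary vector, adding or removing a word to keep is a one-line change to the call rather than an edit to the regex.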
Or using gsub
gsub("\\s{2,}", " ", trimws(gsub("\\b(fbnmobile|gotvnspectran|fbn)\\b(*SKIP)(*F)|\\w+", "", str1, perl = TRUE)))
#[1] "fbnmobile gotvnspectran fbn"
Data used in both examples:
str1 <- "fbnmobile akinremi temitope akinfemi gotvnspectran fbn akinremi temitope a and akinsanya arinola o "
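The gsub version relies on the PCRE verbs (*SKIP)(*F), which is why perl = TRUE is required. The left-hand alternative matches a word to keep and then forces a failure while skipping past it, so only the remaining words reach the \\w+ alternative and are replaced with an empty string. As a rough sketch, the same pattern can be assembled from a vector of words; the function name drop_other_words is hypothetical, and the words are again assumed to contain no regex metacharacters.

```r
# Hypothetical helper: delete every word except those listed in `words`.
drop_other_words <- function(x, words) {
  # Whole-word alternation of the words to keep
  keep <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  # Kept words are matched, then skipped via (*SKIP)(*F);
  # any other run of word characters is matched by \\w+ and removed
  pattern <- paste0(keep, "(*SKIP)(*F)|\\w+")
  stripped <- gsub(pattern, "", x, perl = TRUE)
  # Trim the ends and collapse the leftover runs of whitespace
  gsub("\\s{2,}", " ", trimws(stripped))
}

str1 <- "fbnmobile akinremi temitope akinfemi gotvnspectran fbn akinremi temitope a and akinsanya arinola o "

drop_other_words(str1, c("fbnmobile", "gotvnspectran", "fbn"))
#[1] "fbnmobile gotvnspectran fbn"
```

Note that \\w+ only removes word characters, so punctuation in the input would survive this version, whereas the extraction approach returns nothing but the listed words.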