Using the R programming language, I want to be able to use the gsub function to remove all characters except two or three specified words.
I've tried a number of methods using look-behind, \\bMyWord\\b, and the caret symbol ^.
gsub("fbnmobile.*", "" , "fbnmobile akinremi temitope akinfemi gotvnspectran fbn akinremi temitope a and akinsanya arinola o ")
Desired output:
"fbnmobile gotvnspectran fbn"
I want a template in which I can add or drop the whole words to keep whenever I delete all of the other characters. In this case, I would specify deleting all characters except for the words "fbnmobile", "gotvnspectran", and "fbn".
Also, I'll gladly accept a recommendation for a definitive guide on regular expressions for R.
It may be easier to extract. Specify the pattern of words to extract with OR (|) in str_extract_all from stringr, and then paste the extracted words into a single string:
library(stringr)
paste(str_extract_all(str1, "\\b(fbnmobile|gotvnspectran|fbn)\\b")[[1]], collapse=" ")
#[1] "fbnmobile gotvnspectran fbn"
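Or using gsub with perl = TRUE. The (*SKIP)(*F) verbs make the regex skip over the words to keep, so only the remaining \w+ words are deleted; trimws and the outer gsub then tidy up the leftover spaces: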
gsub("\\s{2,}", " ", trimws(gsub("\\b(fbnmobile|gotvnspectran|fbn)\\b(*SKIP)(*F)|\\w+", "", str1, perl = TRUE)))
#[1] "fbnmobile gotvnspectran fbn"
Here str1 is the input string:
str1 <- "fbnmobile akinremi temitope akinfemi gotvnspectran fbn akinremi temitope a and akinsanya arinola o "
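Since the question asks for a template where words can be added or dropped, the extraction approach could be wrapped in a small function that builds the pattern from a character vector. This is only a sketch in base R: keep_words is a hypothetical helper name, not part of the original answer, and it assumes the words contain no regex metacharacters (they would need escaping otherwise).

```r
# Input string from the question
str1 <- "fbnmobile akinremi temitope akinfemi gotvnspectran fbn akinremi temitope a and akinsanya arinola o "

# Hypothetical helper (an assumption for illustration):
# keeps only the listed whole words, in their original order.
keep_words <- function(x, words) {
  # Build an alternation such as \b(?:fbnmobile|gotvnspectran|fbn)\b
  pattern <- paste0("\\b(?:", paste(words, collapse = "|"), ")\\b")
  # Find every whole-word match in each element of x
  hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  # Join the matches of each element back into one string
  vapply(hits, paste, character(1), collapse = " ")
}

keep_words(str1, c("fbnmobile", "gotvnspectran", "fbn"))
#[1] "fbnmobile gotvnspectran fbn"

# Dropping a word only means editing the vector
keep_words(str1, c("fbnmobile", "fbn"))
#[1] "fbnmobile fbn"
```

Because the words live in an ordinary character vector, the list to keep can be changed without touching the regular expression itself.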