
Use regex in R to delete all characters in a string except specified words

Using the R programming language, I want to be able to use the gsub function to remove all characters except two or three specified words.

I've tried a number of methods using look-behind, \\bMyWord\\b, and the caret symbol ^.

gsub("fbnmobile.*", "" , "fbnmobile akinremi temitope akinfemi gotvnspectran fbn akinremi temitope a and akinsanya arinola o ")

Desired output:

"fbnmobile gotvnspectran fbn"

I want a template in which I can add or drop the whole words to keep whenever I delete all of the other characters. In this case, I would specify deleting everything except the words "fbnmobile", "gotvnspectran", and "fbn".

Also, I'll gladly accept a recommendation for a definitive guide on regular expressions for R.

It may be easier to extract the words you want than to delete everything else. Specify the words to keep as alternatives separated by OR ( | ) in str_extract_all from stringr, and then paste the extracted words into a single string:

library(stringr)
paste(str_extract_all(str1, "\\b(fbnmobile|gotvnspectran|fbn)\\b")[[1]], collapse=" ")
#[1] "fbnmobile gotvnspectran fbn"

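Or using gsub with perl = TRUE. The (*SKIP)(*F) verbs make the regex skip over the words to keep, so only the remaining words (\\w+) are deleted; trimws and the outer gsub then clean up the leftover whitespace: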

gsub("\\s{2,}", " ", trimws(gsub("\\b(fbnmobile|gotvnspectran|fbn)\\b(*SKIP)(*F)|\\w+", "", str1, perl = TRUE)))
#[1] "fbnmobile gotvnspectran fbn"

data

str1 <- "fbnmobile akinremi temitope akinfemi gotvnspectran fbn akinremi temitope a and akinsanya arinola o "
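Since the question asks for a template where words can be added or dropped, the extraction approach can be wrapped in a small function that builds the pattern from a character vector. This is only a sketch in base R: keep_words is a hypothetical helper name, and it assumes the words contain no regex metacharacters (otherwise they would need escaping first).

```r
# Hypothetical helper: keep only the listed whole words, in their original order.
keep_words <- function(x, words) {
  # Build "\\b(word1|word2|...)\\b" from the vector of words to keep.
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  # Extract every match from each input string ...
  matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  # ... and join the matches of each string with single spaces.
  vapply(matches, paste, character(1), collapse = " ")
}

str1 <- "fbnmobile akinremi temitope akinfemi gotvnspectran fbn akinremi temitope a and akinsanya arinola o "

keep_words(str1, c("fbnmobile", "gotvnspectran", "fbn"))
#[1] "fbnmobile gotvnspectran fbn"

# Dropping a word from the vector drops it from the result.
keep_words(str1, c("fbnmobile", "fbn"))
#[1] "fbnmobile fbn"
```

The same vector of words could be pasted into the gsub pattern above instead, if deleting is preferred over extracting.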

