
Replacing multiple patterns with manipulated pattern

I have a text string which I would like to convert from

text = "end back@drive@o correct back@drive@adjust@cats@do to tok"

to

"end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"

More generally, I want to replace

"a@b@c" with "a@b b@c"
"a@b@c@d" with "a@b b@c c@d"

and so on. My attempt below uses the stringr package.

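library(stringr)

# Extract every chain made of three or more "@"-separated tokens
patterns = unlist(str_extract_all(text, "([[:alnum:]]+@){2,}[[:alnum:]]+"))
# Split each chain into its tokens
replacements = strsplit(patterns, "@")
# Pair each token with its successor: a@b@c -> "a@b b@c"
replacements = lapply(replacements, function(y) {
  pretuples = y[-length(y)]
  posttuples = y[-1]
  paste(paste0(pretuples, "@", posttuples), collapse = " ")
})
replacements = do.call(c, replacements)
# Vectorised over patterns, so this returns one string per pattern
str_replace_all(text, pattern = patterns, replacement = replacements)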

I don't think that str_replace_all is the function I'm looking for at the end, and of course it (reasonably) returns

[1] "end back@drive drive@o correct back@drive@adjust to tok" 
[2] "end back@drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"

Could anyone help me sort this out?

Thanks very much.

EDIT: The responses so far have been incredibly helpful, but I'm parsing a large file and don't know in advance how many times this a@b@c@d... pattern will be chained. Is there a more general solution that doesn't rely on hard-coding the length of the pattern (as I've tried above)?

> gsub(x = text, pattern = '@(.*?)@', replacement = '@\\1 \\1@')
[1] "end back@drive drive@o correct back@drive drive@adjust to tok"

You need to give more examples of the cases you expect to encounter, but the solution will lie in the same direction as above.

In response to the comment: you probably need to run gsub(x = text, pattern = '@([[:alnum:]]{1,})@', replacement = '@\\1 \\1@') repeatedly on your text until it stops changing. Again, without more test cases one can't be sure.
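The "run it until it stops changing" idea can be sketched as a small fixed-point loop. This is only an illustration: the helper name expand_chains is made up here, and it assumes every token in a chain is purely alphanumeric.

```r
# Hypothetical helper (not from the original answer): repeat the
# substitution until a pass leaves the input unchanged.
# Assumes chain tokens contain only alphanumeric characters.
expand_chains <- function(x) {
  repeat {
    y <- gsub("@([[:alnum:]]+)@", "@\\1 \\1@", x)
    if (identical(y, x)) return(y)  # fixed point reached
    x <- y
  }
}

text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
expand_chains(text)
# [1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"
```

Each pass can only expand every other inner token, because a match consumes the "@" on both sides; the loop picks up the skipped ones on the next pass.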

I would have used gsub:

> text = "end back@drive@o correct back@drive@adjust to tok"
> gsub(pattern = "([[:alpha:]]+)@([[:alpha:]]+)@([[:alpha:]]+)", replacement = "\\1@\\2 \\2@\\3", x = text)
[1] "end back@drive drive@o correct back@drive drive@adjust to tok"
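The pattern above is written for exactly three segments. For chains of unknown length, one possible base R sketch reuses the split-and-pair idea from the question and writes each result back in place with regmatches<-. The helper name pair_chain is hypothetical, and tokens are assumed to be alphanumeric.

```r
text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"

# Hypothetical helper: turn one chain such as "a@b@c" into "a@b b@c".
pair_chain <- function(chain) {
  parts <- strsplit(chain, "@", fixed = TRUE)[[1]]
  paste(paste0(parts[-length(parts)], "@", parts[-1]), collapse = " ")
}

# Locate every chain with three or more tokens ...
m <- gregexpr("([[:alnum:]]+@){2,}[[:alnum:]]+", text)

# ... and replace each match with its pairwise expansion.
result <- text
regmatches(result, m) <- lapply(regmatches(text, m), function(chains) {
  vapply(chains, pair_chain, character(1), USE.NAMES = FALSE)
})
result
# [1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"
```

Because each match is replaced at its own position, this avoids the one-output-per-pattern behaviour seen with the vectorised str_replace_all call.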

Try

pat <- "(\\s|\\b)[^@]+\\s(*SKIP)(*FAIL)|(?<=@)([^@]*)(?=@)"
repl <- "\\2 \\2"
gsub(pat, repl, text, perl=TRUE)
#[1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"

For 'str1':

gsub(pat, repl, str1, perl=TRUE)
#[1] "a@b b@c"                     "a@b b@c c@d"                
#[3] "a@b b@c c@d d@e e@f f@g g@h"

data

text  <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
str1 <- c("a@b@c", "a@b@c@d", "a@b@c@d@e@f@g@h")
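The work in the pattern above is done by the lookaround pair (?<=@) and (?=@), which pick out every token that has an "@" on both sides without consuming either "@". The (*SKIP)(*FAIL) branch appears to be there so that [^@]* cannot match across whitespace. If tokens can be assumed to be alphanumeric, a simpler variant might look like the sketch below; expand_once is a made-up name.

```r
# Hypothetical simplification, assuming tokens are alphanumeric only.
# The lookbehind and lookahead leave both "@" characters unconsumed,
# so adjacent inner tokens are all duplicated in a single pass.
expand_once <- function(x) {
  gsub("(?<=@)([[:alnum:]]+)(?=@)", "\\1 \\1", x, perl = TRUE)
}

text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
str1 <- c("a@b@c", "a@b@c@d", "a@b@c@d@e@f@g@h")

expand_once(text)
# [1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"
expand_once(str1)
# [1] "a@b b@c"                     "a@b b@c c@d"
# [3] "a@b b@c c@d d@e e@f f@g g@h"
```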
