I have a text string which I would like to convert from
text = "end back@drive@o correct back@drive@adjust@cats@do to tok"
to
"end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"
More generally, I want to replace
"a@b@c" with "a@b b@c"
"a@b@c@d" with "a@b b@c c@d"
and so on. My attempt below uses the stringr package.
library(stringr)
# extract every chain with at least two "@" separators
patterns = unlist(str_extract_all(text, "([[:alnum:]]+@){2,}[[:alnum:]]+"))
replacements = strsplit(patterns, "@")
# pair each element with its successor: a@b@c -> "a@b b@c"
replacements = lapply(replacements, function(y) {
  pretuples = y[-length(y)]
  posttuples = y[-1]
  paste(paste0(pretuples, "@", posttuples), collapse = " ")
})
replacements = do.call(c, replacements)
str_replace_all(text, pattern = patterns, replacement = replacements)
I don't think str_replace_all is the function I'm looking for at the end; it (reasonably) returns one result per pattern:
[1] "end back@drive drive@o correct back@drive@adjust to tok"
[2] "end back@drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"
Could anyone help me sort this out?
Thanks very much.
EDIT: The responses so far have been incredibly helpful, but I'm parsing a large file and don't know in advance how many times the a@b@c@d... pattern will be chained. Is there a more general solution that doesn't rely on hard-coding the length of the pattern (as I've tried above)?
> gsub(x = text, pattern = '@(.*?)@', replacement = '@\\1 \\1@')
[1] "end back@drive drive@o correct back@drive drive@adjust to tok"
You need to give more examples of the sort of cases you expect to encounter, but the solution will lie in the same direction as above.
In response to the comment: you probably need to run gsub(x = text, pattern = '@([[:alnum:]]{1,})@', replacement = '@\\1 \\1@') repeatedly on your text until it stops changing. Again, without more test cases one can't be sure.
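The "repeat until it stops changing" idea can be sketched as a small loop. This is only an illustration, assuming the tokens are purely alphanumeric; the function name expand_chains and the max_iter safety cap are made up for this example.

```r
# Hypothetical helper: re-apply the gsub until the text no longer changes.
expand_chains <- function(x, max_iter = 100L) {
  pattern <- "@([[:alnum:]]+)@"
  for (i in seq_len(max_iter)) {
    # each pass duplicates the tokens it finds between two "@" signs
    updated <- gsub(pattern, "@\\1 \\1@", x)
    if (identical(updated, x)) break  # fixed point reached
    x <- updated
  }
  x
}

text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
expand_chains(text)
```

A single pass is not enough because each match consumes the "@" on both sides, so the token right after a match cannot be matched in the same pass.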
I would have used gsub:
> text = "end back@drive@o correct back@drive@adjust to tok"
> gsub(pattern = "([[:alpha:]]+)@([[:alpha:]]+)@([[:alpha:]]+)", replacement = "\\1@\\2 \\2@\\3", x = text)
[1] "end back@drive drive@o correct back@drive drive@adjust to tok"
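The pattern above has exactly three capture groups, so it only handles chains of three elements. For chains of unknown length, one option that avoids regular expressions for the pairing step is to split each token on "@" and pair neighbours. A rough sketch, assuming tokens are separated by single spaces; expand_token and expand_text are names invented for this example.

```r
# Hypothetical helper: turn one token "a@b@c" into "a@b b@c".
expand_token <- function(token) {
  parts <- strsplit(token, "@", fixed = TRUE)[[1]]
  if (length(parts) < 3) return(token)  # nothing to expand
  # pair every element with its successor
  paste(paste0(head(parts, -1), "@", tail(parts, -1)), collapse = " ")
}

# Hypothetical helper: apply expand_token to every space-separated token.
expand_text <- function(x) {
  tokens <- strsplit(x, " ", fixed = TRUE)[[1]]
  paste(vapply(tokens, expand_token, character(1)), collapse = " ")
}

text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
expand_text(text)
```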
Try
pat <- "(\\s|\\b)[^@]+\\s(*SKIP)(*FAIL)|(?<=@)([^@]*)(?=@)"
repl <- "\\2 \\2"
gsub(pat, repl, text, perl=TRUE)
#[1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"
For the vector 'str1' (defined with the data at the end):
gsub(pat, repl, str1, perl=TRUE)
#[1] "a@b b@c" "a@b b@c c@d"
#[3] "a@b b@c c@d d@e e@f f@g g@h"
Data used above:
text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
str1 <- c("a@b@c", "a@b@c@d", "a@b@c@d@e@f@g@h")
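If the tokens are known to be purely alphanumeric, the lookaround half of the pattern may be enough on its own, since a token restricted to [[:alnum:]]+ cannot span the whitespace between two chains. A sketch under that assumption; the variable names pat2 and chains are illustrative only.

```r
# Assumption: every token between "@" signs is alphanumeric.
# Lookbehind/lookahead do not consume the "@", so one pass handles any chain length.
pat2 <- "(?<=@)([[:alnum:]]+)(?=@)"

chains <- c("a@b@c", "a@b@c@d", "a@b@c@d@e@f@g@h")
result <- gsub(pat2, "\\1 \\1", chains, perl = TRUE)
result
```

With a broader class such as [^@]*, the text between two separate chains would also be matched, which is presumably what the (*SKIP)(*FAIL) branch in the pattern above guards against.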