Replacing multiple patterns with controlled patterns

Question

I have a text string that I would like to convert from

text = "end back@drive@o correct back@drive@adjust@cats@do to tok"

to

"end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"

More generally, I want to replace

"a@b@c" with "a@b b@c"
"a@b@c@d" with "a@b b@c c@d"

and so on. My attempt below uses the stringr package.

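library(stringr)

# find every chain of three or more @-joined tokens, e.g. "back@drive@o"
patterns = unlist(str_extract_all(text, "([[:alnum:]]+@){2,}[[:alnum:]]+"))
# split each chain into its tokens
replacements = strsplit(patterns, "@")
# pair each token with its successor: a@b b@c c@d ...
replacements = lapply(replacements, function(y) {
  pretuples = y[-length(y)]
  posttuples = y[-1]
  paste(paste0(pretuples, "@", posttuples), collapse = " ")
})
replacements = do.call(c, replacements)
# vectorised over pattern/replacement, so this returns one string per pattern
str_replace_all(text, pattern = patterns, replacement = replacements)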

I don't think str_replace_all is the function I'm looking for at the end, and of course it (reasonably) returns

[1] "end back@drive drive@o correct back@drive@adjust@cats@do to tok"
[2] "end back@drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"

Could anyone help me sort this out?

Thanks very much.

EDIT: The responses so far have been incredibly helpful, but the file I'm parsing is large and I don't know in advance how many times this a@b@c@d... pattern will be chained. Is there a more general solution that doesn't rely on hard-coding the length of the pattern (which is what my attempt above was aiming for)?
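The stringr attempt above can be finished without str_replace_all's pattern recycling by rewriting each matched chain in place. A minimal base-R sketch, assuming every chain consists of [[:alnum:]] tokens joined by single @ signs: `gregexpr` finds the chains and `regmatches<-` swaps each one for its expanded form, so nothing depends on the chain length. The helper `expand_chain` is a hypothetical name.

```r
# Hypothetical helper: "a@b@c@d" -> "a@b b@c c@d"
expand_chain <- function(chain) {
  parts <- strsplit(chain, "@", fixed = TRUE)[[1]]
  # pair each token with its successor
  paste(paste0(head(parts, -1), "@", parts[-1]), collapse = " ")
}

text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"

# locate every chain of three or more @-joined tokens
m <- gregexpr("([[:alnum:]]+@){2,}[[:alnum:]]+", text)

# replace each match with its expanded form; text outside the matches is untouched
regmatches(text, m) <- lapply(regmatches(text, m), function(chains) {
  vapply(chains, expand_chain, character(1), USE.NAMES = FALSE)
})

print(text)
```

Because each chain is rewritten only at its own match position, one chain's replacement cannot interfere with another, which is the problem with passing all patterns to str_replace_all at once.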

Answer 1

> text = "end back@drive@o correct back@drive@adjust to tok"
> gsub(x = text, pattern = '@(.*?)@', replacement = '@\\1 \\1@')
[1] "end back@drive drive@o correct back@drive drive@adjust to tok"

You need to give more examples of the sort of cases you expect to encounter, but the solution will lie in the same direction as above.

In response to the comment: you probably need to run gsub(x = text, pattern = '@([[:alnum:]]{1,})@', replacement = '@\\1 \\1@') on your text repeatedly until it no longer changes. Again, without more test cases one can't be sure.
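The "run it until it no longer changes" idea can be sketched as a small fixed-point loop in base R. The function name `expand_until_stable` and the `max_iter` safety cap are hypothetical additions, not part of the answer; the pattern and replacement are the ones given above.

```r
# Hypothetical helper: repeat the one-step gsub until the string stops changing.
expand_until_stable <- function(x, max_iter = 100) {
  for (i in seq_len(max_iter)) {
    # gsub's matches cannot overlap, so in a long chain each pass splits
    # only every other "@token@" link; the rest are left for the next pass
    y <- gsub(pattern = "@([[:alnum:]]{1,})@", replacement = "@\\1 \\1@", x = x)
    if (identical(y, x)) return(y)  # fixed point: nothing left to split
    x <- y
  }
  warning("max_iter reached before the string stopped changing")
  x
}

text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
res_text <- expand_until_stable(text)
res_str1 <- expand_until_stable(c("a@b@c", "a@b@c@d", "a@b@c@d@e@f@g@h"))
print(res_text)
print(res_str1)
```

The function works on character vectors as well as single strings, since gsub and identical both operate on the whole vector.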

Answer 2

I would have used gsub:

> text = "end back@drive@o correct back@drive@adjust to tok"
> gsub(pattern = "([[:alpha:]]+)@([[:alpha:]]+)@([[:alpha:]]+)", replacement = "\\1@\\2 \\2@\\3", x = text)
[1] "end back@drive drive@o correct back@drive drive@adjust to tok"
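The pattern above hard-codes exactly three tokens, so on "a@b@c@d" it returns "a@b b@c@d", leaving the last link joined. One possible generalization (a lookahead variant, not taken from any of the answers here) captures each token that follows an @ and is itself followed by another @, without consuming that trailing @, so the next match can start there. Lookaheads require perl = TRUE; the variable names are illustrative.

```r
text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"

# "@token" immediately followed by another "@": duplicate the token with a space.
# (?=@) only checks for the next "@" without consuming it, so overlapping links
# such as b and c in "a@b@c@d" are both handled in a single pass.
pat <- "@([[:alnum:]]+)(?=@)"

res_text <- gsub(pat, "@\\1 \\1", text, perl = TRUE)
res_str1 <- gsub(pat, "@\\1 \\1", c("a@b@c", "a@b@c@d", "a@b@c@d@e@f@g@h"), perl = TRUE)
print(res_text)
print(res_str1)
```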

Answer 3

Try

pat <- "(\\s|\\b)[^@]+\\s(*SKIP)(*FAIL)|(?<=@)([^@]*)(?=@)"
repl <- "\\2 \\2"
gsub(pat, repl, text, perl=TRUE)
#[1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"

For 'str1':

gsub(pat, repl, str1, perl=TRUE)
#[1] "a@b b@c"                     "a@b b@c c@d"                
#[3] "a@b b@c c@d d@e e@f f@g g@h"

data

text  <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
str1 <- c("a@b@c", "a@b@c@d", "a@b@c@d@e@f@g@h")
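To see why the first alternative, (\\s|\\b)[^@]+\\s(*SKIP)(*FAIL), is there, it can help to run the second alternative on its own: (?<=@)([^@]*)(?=@) also matches the space-containing run "o correct back", because that run sits between two @ signs. The first alternative consumes runs of non-@ text that end in whitespace, then forces a failure, and (*SKIP) tells PCRE to resume matching after the consumed run instead of retrying inside it. This is an illustrative sketch; the variable names are hypothetical.

```r
text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"

# Second alternative alone: any run of non-@ characters between two @ signs.
lookaround_only <- "(?<=@)([^@]*)(?=@)"
naive <- gsub(lookaround_only, "\\1 \\1", text, perl = TRUE)
# "o correct back" gets duplicated, which is not wanted

# Full pattern from the answer: whitespace-terminated runs are skipped first,
# so only the tokens captured by group 2 are duplicated.
pat <- "(\\s|\\b)[^@]+\\s(*SKIP)(*FAIL)|(?<=@)([^@]*)(?=@)"
guarded <- gsub(pat, "\\2 \\2", text, perl = TRUE)

print(naive)
print(guarded)
```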

Answers: 3

Answer 1: score 3, 2015-07-01 13:12:25

Answer 2: score 2, 2015-07-01 13:15:32

Answer 3: score 1, accepted, 2015-07-01 14:38:21
