
Replacing multiple patterns with manipulated pattern

I have a text string which I would like to convert from

text = "end back@drive@o correct back@drive@adjust@cats@do to tok"

to

"end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"

More generally, I want to replace

"a@b@c" with "a@b b@c"
"a@b@c@d" with "a@b b@c c@d"

and so on. My attempt below uses the stringr package.

library(stringr)

# Extract every chain of three or more "@"-joined tokens
patterns = unlist(str_extract_all(text, "([[:alnum:]]+@){2,}[[:alnum:]]+"))
# Split each chain into its tokens and rebuild it as adjacent pairs
replacements = strsplit(patterns, "@")
replacements = lapply(replacements, function(y) {
  pretuples = y[-length(y)]
  posttuples = y[-1]
  paste(paste0(pretuples, "@", posttuples), collapse = " ")
})
replacements = do.call(c, replacements)
str_replace_all(text, pattern = patterns, replacement = replacements)

I don't think that str_replace_all is the function I'm looking for at the end, and of course it (reasonably) returns one result per pattern:

[1] "end back@drive drive@o correct back@drive@adjust@cats@do to tok"
[2] "end back@drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"

Could anyone help me sort this out?

Thanks very much.

EDIT: The responses so far have been incredibly helpful, but I'm parsing a large file and don't know in advance how many times this a@b@c@d... pattern will be chained. Is there a more general solution that doesn't rely on hard-coding the length of the pattern (as I've tried above)?

> gsub(x = text, pattern = '@(.*?)@', replacement = '@\\1 \\1@')
[1] "end back@drive drive@o correct back@drive drive@adjust to tok"

You need to give more examples of the cases you expect to encounter, but the solution will lie in the same direction as above.

In response to the comment: you probably need to run gsub(x = text, pattern = '@([[:alnum:]]{1,})@', replacement = '@\\1 \\1@') repeatedly on your text until it no longer changes. Again, without more test cases one can't be sure.
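
A minimal sketch of that repeat-until-stable idea, assuming every token is purely alphanumeric. The names expand_until_stable and max_iter are illustrative only and do not come from any package:

```r
# Hypothetical helper: reapply the substitution until the text stops changing.
# max_iter is an assumed safety cap, not something the original answer specifies.
expand_until_stable <- function(x, max_iter = 100L) {
  for (i in seq_len(max_iter)) {
    # Each pass duplicates every interior token that still has "@" on both sides
    y <- gsub("@([[:alnum:]]+)@", "@\\1 \\1@", x)
    if (identical(y, x)) break  # fixed point reached
    x <- y
  }
  x
}

text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
expand_until_stable(text)
# [1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"
```

Each match consumes both of its surrounding "@" characters, so a single pass only reaches every other interior token. That is why the loop is needed.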

I would use gsub:

> text = "end back@drive@o correct back@drive@adjust to tok"
> gsub(pattern = "([[:alpha:]]+)@([[:alpha:]]+)@([[:alpha:]]+)", replacement = "\\1@\\2 \\2@\\3", x = text)
[1] "end back@drive drive@o correct back@drive drive@adjust to tok"
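
The pattern above is written for chains of exactly three tokens. One possible way to handle chains of any length in base R, sketched here under the assumption that tokens are alphanumeric, is to locate each chain with gregexpr(), rebuild it as adjacent pairs, and write the result back with `regmatches<-`. The helper name expand_chains is hypothetical:

```r
# Hypothetical helper: split each "@" chain and rejoin it as overlapping pairs.
expand_chains <- function(x) {
  # Locate every chain of three or more "@"-joined alphanumeric tokens
  m <- gregexpr("([[:alnum:]]+@){2,}[[:alnum:]]+", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(chains) {
    vapply(strsplit(chains, "@", fixed = TRUE), function(parts) {
      # Pair each token with its successor: a@b, b@c, c@d, ...
      paste(paste0(head(parts, -1), "@", tail(parts, -1)), collapse = " ")
    }, character(1))
  })
  x
}

text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
expand_chains(text)
# [1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"
```

Because each match is replaced in place, this should avoid the one-result-per-pattern behaviour seen when a vector of patterns is passed to a vectorised replace function.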

Try

pat <- "(\\s|\\b)[^@]+\\s(*SKIP)(*FAIL)|(?<=@)([^@]*)(?=@)"
repl <- "\\2 \\2"
gsub(pat, repl, text, perl=TRUE)
#[1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"

For 'str1':

gsub(pat, repl, str1, perl=TRUE)
#[1] "a@b b@c"                     "a@b b@c c@d"                
#[3] "a@b b@c c@d d@e e@f f@g g@h"

data

text  <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
str1 <- c("a@b@c", "a@b@c@d", "a@b@c@d@e@f@g@h")
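
As a simpler variation on the same lookaround idea, and only a sketch that assumes tokens are alphanumeric, a single lookahead may be enough: match "@token" only when another "@" follows, without consuming that second "@". The name expand_lookahead is illustrative:

```r
# Hypothetical helper: duplicate every interior token in one pass.
# (?=@) checks for a following "@" but leaves it available for the next match,
# so consecutive interior tokens are all reached (requires perl = TRUE).
expand_lookahead <- function(x) {
  gsub("@([[:alnum:]]+)(?=@)", "@\\1 \\1", x, perl = TRUE)
}

text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
str1 <- c("a@b@c", "a@b@c@d", "a@b@c@d@e@f@g@h")

expand_lookahead(text)
# [1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"
expand_lookahead(str1)
# [1] "a@b b@c"                     "a@b b@c c@d"
# [3] "a@b b@c c@d d@e e@f f@g g@h"
```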
