
Replace string in R with patterns and replacements both vectors

Let's say I have two vectors like so:

a <- c("this", "is", "test")
b <- c("that", "was", "boy")

I also have a string variable like so:

string <- "this is a story about a test"

I want to replace values in string so that it becomes the following:

string <- "that was a story about a boy"

I could do this using a for loop, but I want this to be vectorized. How should I do this?

If you're open to using a non-base package, stringi will work really well here:

stringi::stri_replace_all_fixed(string, a, b, vectorize_all = FALSE)
#[1] "that was a story about a boy"

Note that this also works the same way when string is a character vector of length > 1.

To be on the safe side, you can adapt this - similar to RUser's answer - to check for word boundaries before replacing:

stringi::stri_replace_all_regex(string, paste0("\\b", a, "\\b"), b, vectorize_all = FALSE)

This would ensure that you don't accidentally replace his with hwas, for example.
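The his/hwas problem can be reproduced in base R as well. The sketch below is only an illustration: replace_seq is a hypothetical helper (not from any package) that applies the replacements one after another with gsub, and the test string s is made up.

```r
a <- c("this", "is", "test")
b <- c("that", "was", "boy")
s <- "his test"  # made-up input containing "is" inside another word

# Hypothetical helper: apply each pattern/replacement pair in turn.
# Extra arguments (e.g. fixed = TRUE) are passed on to gsub.
replace_seq <- function(x, pat, rep, ...) {
  for (i in seq_along(pat)) x <- gsub(pat[i], rep[i], x, ...)
  x
}

# Literal matching: "is" inside "his" is replaced too.
plain <- replace_seq(s, a, b, fixed = TRUE)

# Word-boundary matching: only whole words are replaced.
bounded <- replace_seq(s, paste0("\\b", a, "\\b"), b)

print(plain)    # "hwas boy"
print(bounded)  # "his boy"
```

The only difference between the two calls is the pattern: wrapping each element of a in \\b restricts matches to whole words.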

Here are some solutions. Each of them also works if string is a character vector, in which case the substitutions are done on each element of it.

1) Reduce. This uses no packages.

Reduce(function(x, i) gsub(paste0("\\b", a[i], "\\b"), b[i], x), seq_along(a), string)
## [1] "that was a story about a boy"
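As a quick check of the claim about character vectors, here is the same Reduce call applied to a vector of two strings. The input strings is made up for illustration; everything else is as in (1).

```r
a <- c("this", "is", "test")
b <- c("that", "was", "boy")

# Made-up input: a character vector with two elements.
strings <- c("this is a test", "is this his test")

# gsub is vectorized over its input, so each pass of Reduce
# rewrites every element of the vector at once.
out <- Reduce(function(x, i) gsub(paste0("\\b", a[i], "\\b"), b[i], x),
              seq_along(a), strings)

print(out)
# [1] "that was a boy"   "was that his boy"
```

Note that "his" in the second element is left alone because of the word boundaries.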

2) gsubfn. gsubfn is like gsub, but the replacement argument can be a list of substitutions (or certain other objects).

library(gsubfn)

gsubfn("\\w+", setNames(as.list(b), a), string)
## [1] "that was a story about a boy"

3) loop. This isn't vectorized but has been added for comparison. No packages are used.

out <- string
for(i in seq_along(a)) out <- gsub(paste0("\\b", a[i], "\\b"), b[i], out)
out
## [1] "that was a story about a boy"

Note: There is some question of whether cycles are possible. For example, if

a <- c("a", "A")
b <- rev(a)

do we want

  • "a" to be replaced with "A" and then back to "a" again, or
  • "a" and "A" to be swapped?
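To make the two outcomes concrete, here is a base R sketch. The names from, to and s are hypothetical and chosen so as not to clash with a, b and string; the single-pass variant uses regmatches with a named lookup vector, which is one possible way to get a swap and not necessarily what any of the answers here do.

```r
from <- c("a", "A")
to   <- rev(from)
s    <- "a story about A test"  # made-up input with both "a" and "A"

# Case 1: sequential passes. Each pass sees the previous pass's output,
# so "a" becomes "A" and is then turned back into "a".
sequential <- Reduce(function(x, i) gsub(paste0("\\b", from[i], "\\b"), to[i], x),
                     seq_along(from), s)

# Case 2: single pass. Every word is looked up once in a named vector,
# so "a" and "A" trade places.
lookup  <- setNames(to, from)
swapped <- s
m <- gregexpr("\\w+", swapped)
regmatches(swapped, m) <- lapply(regmatches(swapped, m), function(w) {
  hit <- w %in% names(lookup)
  w[hit] <- unname(lookup[w[hit]])
  w
})

print(sequential)  # "a story about a test"
print(swapped)     # "A story about a test"
```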

All the solutions shown above assume the first case. If we wanted the second case, we could perform the operation twice: first map each pattern to an intermediate placeholder (here its index), then map the placeholders to the replacements. We will illustrate with (2) because it is the shortest, but the same idea applies to them all:

# swap "a" and "A"
a <- c("a", "A")
b <- rev(a)

tmp <- gsubfn("\\w+", setNames(as.list(seq_along(a)), a), string)
gsubfn("\\w+", setNames(as.list(b), seq_along(a)), tmp)
## [1] "this is A story about A test"

library(stringi)
stri_replace_all_regex(string, "\\b" %s+% a %s+% "\\b", b, vectorize_all = FALSE)
#[1] "that was a story about a boy"

Chipping in as well with a little function that relies only on base R:

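repWords <- function(string, toRep, Rep, sep = '\\s') {
  # split the (single) input string into words
  wrds <- unlist(strsplit(string, sep))
  # for each word, its position in toRep (NA if it is not to be replaced)
  ix <- match(wrds, toRep)
  hit <- !is.na(ix)
  # replace every occurrence; words missing from the string are simply skipped
  wrds[hit] <- Rep[ix[hit]]
  paste(wrds, collapse = ' ')
}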

a <- c("this", "is", "test")
b <- c("that", "was", "boy")

string <- "this is a story about a test"

repWords(string, a, b)
# [1] "that was a story about a boy"

Note:

This assumes you have a matching number of replacements. You can define the separator with sep.
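If the input can be a character vector rather than a single string, the same split-and-match idea can be wrapped in vapply. This is only a sketch: rep_words_vec is a hypothetical name, the length check simply enforces the assumption stated above, and the input strings are made up.

```r
a <- c("this", "is", "test")
b <- c("that", "was", "boy")

# Hypothetical variant of the split-and-match approach that handles
# several input strings and repeated words.
rep_words_vec <- function(strings, toRep, Rep, sep = "\\s") {
  # enforce the "matching number of replacements" assumption
  stopifnot(length(toRep) == length(Rep))
  vapply(strsplit(strings, sep), function(wrds) {
    ix  <- match(wrds, toRep)   # index into toRep, NA if no replacement
    hit <- !is.na(ix)
    wrds[hit] <- Rep[ix[hit]]
    paste(wrds, collapse = " ")
  }, character(1))
}

res <- rep_words_vec(c("this is a test", "a test is a test"), a, b)
print(res)
# [1] "that was a boy"  "a boy was a boy"
```

As with the single-string function, words are rejoined with a single space regardless of the separator that was used to split them.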

Talking of external packages, here's another one:

a <- c("this", "is", "test")
b <- c("that", "was", "boy")
x <- "this is a story about a test"


library(qdap)
mgsub(a,b,x)

which gives:

 "that was a story about a boy"
