
How to replace a string with another with interleaving characters in R

I have the following strings:

x  <- "??????????DRHRTRHLAK??????????"
x2 <- "????????????????????TRCYHIDPHH"
x3 <- "FKDHKHIDVK????????????????????TRCYHIDPHH"
x4 <- "FKDHKHIDVK????????????????????"

What I want to do is replace all the ? characters with the characters of another string:

rep <- "ndqeegillkkkkfpssyvv"

resulting in:

ndqeegillkDRHRTRHLAKkkkfpssyvv           # x
ndqeegillkkkkfpssyvvTRCYHIDPHH           # x2
FKDHKHIDVKndqeegillkkkkfpssyvvTRCYHIDPHH # x3
FKDHKHIDVKndqeegillkkkkfpssyvv           # x4

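Basically, the characters of rep should stay in their original order while being interleaved with the existing characters of x, such as DRHRTRHLAK.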

The total length of rep equals the number of ? characters: 20.
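Each input should be checked against that assumption before anything is replaced. A minimal base R sketch: delete every character that is not a ? and count what remains. The example vector below simply collects the four strings from the question.

```r
x <- c("??????????DRHRTRHLAK??????????",
       "????????????????????TRCYHIDPHH",
       "FKDHKHIDVK????????????????????TRCYHIDPHH",
       "FKDHKHIDVK????????????????????")
rep <- "ndqeegillkkkkfpssyvv"

# Inside a character class "?" is literal, so [^?] matches every non-? character
n_q <- nchar(gsub("[^?]", "", x))
n_q

# TRUE for every string whose ? count matches the length of rep
n_q == nchar(rep)
```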

Note that I don't want to split rep by hand as an extra step.

I tried this, but it failed:

>gsub(pattern = "\\?+", replacement = rep, x = x)
[1] "ndqeegillkkkkfpssyvvDRHRTRHLAKndqeegillkkkkfpssyvv"
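The attempt fails because gsub() applies the same replacement to every match, so each run of ?s receives the whole of rep. One workaround is to replace the runs one at a time with sub(), which only touches the first remaining match, and hand each run the next slice of rep. This is a sketch, not one of the original answers. fill_runs is a hypothetical helper name. The sketch assumes nchar(rep) equals the number of ?s and that rep contains no ? or backslash characters.

```r
fill_runs <- function(s, rep) {
  pos <- 1                                    # next unused character of rep
  while (grepl("?", s, fixed = TRUE)) {
    m <- regexpr("\\?+", s)                   # first remaining run of ?s
    len <- attr(m, "match.length")
    piece <- substr(rep, pos, pos + len - 1)  # the next len characters of rep
    s <- sub("\\?+", piece, s)                # sub() replaces only this first run
    pos <- pos + len
  }
  s
}

fill_runs("??????????DRHRTRHLAK??????????", "ndqeegillkkkkfpssyvv")
```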

Sample data:

x <- c(
    "??????????DRHRTRHLAK??????????",
    "????????????????????TRCYHIDPHH",
    "FKDHKHIDVK????????????????????TRCYHIDPHH"
)
rep <- "ndqeegillkkkkfpssyvv"

Use regmatches<- to do the replacement in a vectorized way:

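gr <- gregexpr("\\?+", x)                                  # every run of ?s in each string
csml <- lapply(gr, \(m) cumsum(attr(m, "match.length")))   # cumulative run lengths
# cut rep into consecutive pieces, one per run, and write them back into x
regmatches(x, gr) <- lapply(csml, \(e) substring(rep, c(1, e[-length(e)] + 1), e))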
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"          
#[2] "ndqeegillkkkkfpssyvvTRCYHIDPHH"          
#[3] "FKDHKHIDVKndqeegillkkkkfpssyvvTRCYHIDPHH"
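The number of ?s in each string equals nchar(rep), and rep must stay in order. That makes plain positional replacement another option, which is not from the original answers: split both strings into single characters and assign the characters of rep to the ? positions in sequence. fill_by_position is a hypothetical helper name. This sketch assumes the counts match exactly; if they do not, R recycles rep or warns rather than failing cleanly.

```r
x <- c("??????????DRHRTRHLAK??????????",
       "????????????????????TRCYHIDPHH",
       "FKDHKHIDVK????????????????????TRCYHIDPHH",
       "FKDHKHIDVK????????????????????")
rep <- "ndqeegillkkkkfpssyvv"
rep_chars <- strsplit(rep, "")[[1]]

fill_by_position <- function(s) {
  chars <- strsplit(s, "")[[1]]
  # the i-th "?" receives the i-th character of rep, so the order of rep is kept
  chars[chars == "?"] <- rep_chars
  paste(chars, collapse = "")
}

res <- vapply(x, fill_by_position, character(1), USE.NAMES = FALSE)
res
```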

Split the string with substr():

x <- "??????????DRHRTRHLAK??????????"
rep <- "ndqeegillkkkkfpssyvv"
x <- gsub(pattern = "^\\?+", replacement = substr(rep, 1, 10), x = x)   # leading run: characters 1-10 of rep
x <- gsub(pattern = "\\?+$", replacement = substr(rep, 11, 20), x = x)  # trailing run: characters 11-20 of rep
x
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"

In the regular expressions, ^ anchors the match at the start of the string and $ anchors it at the end.
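The answer above hardcodes the split point at 10 characters. The split point can instead be measured from the length of the leading run, using the match.length attribute that regexpr() returns (it is -1 when there is no match). This is a sketch that assumes the ?s only appear at the start and/or end of the string. A run in the middle, as in x3, is not handled.

```r
x <- "??????????DRHRTRHLAK??????????"
rep <- "ndqeegillkkkkfpssyvv"

# length of the leading run of ?s; clamp the -1 "no match" value to 0
n <- max(attr(regexpr("^\\?+", x), "match.length"), 0)

x <- sub("^\\?+", substr(rep, 1, n), x)               # fill the leading run
x <- sub("\\?+$", substr(rep, n + 1, nchar(rep)), x)  # fill the trailing run
x
```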

You can count the number of ? characters and then cut rep accordingly:

x <- "??????????DRHRTRHLAK??????????"
rep <- "ndqeegillkkkkfpssyvv"

pattern <- "(\\?+)(DRHRTRHLAK)(\\?+)"
n <- nchar(gsub(pattern, "\\1", x))

gsub(pattern, paste0(substr(rep, 1, n), "\\2", substr(rep, n+1, nchar(rep))), x)
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"

Edit: for the new examples:

A very verbose approach is an if/else chain that checks where the ?s are and places the pieces of rep accordingly:

# Assumes x is a single string of capital letters and ?s,
# and that nchar(rep) equals the number of ?s in x
if (grepl("^\\?.+\\?$", x)) {                 # ?s on both ends
  pattern <- "^(\\?+)([A-Z]+)(\\?+)$"
  n <- nchar(gsub(pattern, "\\1", x))         # length of the leading ? run
  gsub(pattern, paste0(substr(rep, 1, n), "\\2", substr(rep, n + 1, nchar(rep))), x)
} else if (grepl("^\\?", x)) {                # ?s only at the start
  pattern <- "^(\\?+)([A-Z]+)$"
  n <- nchar(gsub(pattern, "\\1", x))
  gsub(pattern, paste0(substr(rep, 1, n), "\\2"), x)
} else if (grepl("\\?$", x)) {                # ?s only at the end
  pattern <- "^([A-Z]+)(\\?+)$"
  n <- nchar(gsub(pattern, "\\2", x))
  gsub(pattern, paste0("\\1", substr(rep, 1, n)), x)
} else if (grepl("^[A-Z]+\\?+[A-Z]+$", x)) {  # ?s only in the middle
  pattern <- "^([A-Z]+)(\\?+)([A-Z]+)$"
  n <- nchar(gsub(pattern, "\\2", x))
  gsub(pattern, paste0("\\1", substr(rep, 1, n), "\\3"), x)
}


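Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.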

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM