How to replace a string with another with interleaving characters in R
I have the following strings:
x <- "??????????DRHRTRHLAK??????????"
x2 <- "????????????????????TRCYHIDPHH"
x3 <- "FKDHKHIDVK????????????????????TRCYHIDPHH"
x4 <- "FKDHKHIDVK????????????????????"
What I want to do is replace all the ? characters with another string:
rep <- "ndqeegillkkkkfpssyvv"
resulting in:
ndqeegillkDRHRTRHLAKkkkfpssyvv # x
ndqeegillkkkkfpssyvvTRCYHIDPHH # x2
FKDHKHIDVKndqeegillkkkkfpssyvvTRCYHIDPHH # x3
FKDHKHIDVKndqeegillkkkkfpssyvv # x4
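Basically, the interleaving characters in x (such as DRHRTRHLAK) stay where they are, and the characters of rep fill the ? positions while keeping their order.
The total length of rep equals the number of ? characters: 20.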
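A quick way to confirm that premise (one character of rep per ?) is to count the ? characters in each string. This is a small sketch, not part of the question; the strings are the question's x to x4 collected into one vector.

```r
# The question's four strings and the replacement
x <- c("??????????DRHRTRHLAK??????????",
       "????????????????????TRCYHIDPHH",
       "FKDHKHIDVK????????????????????TRCYHIDPHH",
       "FKDHKHIDVK????????????????????")
rep <- "ndqeegillkkkkfpssyvv"

# Delete every character that is not "?", then count what is left
n_q <- nchar(gsub("[^?]", "", x))

# Should be TRUE for every string if rep has one character per "?"
all(n_q == nchar(rep))
```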
Note that I don't want to split rep manually as an extra step.
I tried this, but it failed:
>gsub(pattern = "\\?+", replacement = rep, x = x)
[1] "ndqeegillkkkkfpssyvvDRHRTRHLAKndqeegillkkkkfpssyvv"
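The attempt fails because gsub() substitutes the same replacement string into every match, so the full rep lands in both runs of ?. A small sketch (not from the question) of the information any fix needs: gregexpr() reports where each run of ? starts and how long it is.

```r
x <- "??????????DRHRTRHLAK??????????"

# gregexpr() finds every run of "?"; the match.length attribute holds each run's length
m <- gregexpr("\\?+", x)[[1]]
as.integer(m)                   # start position of each run
#[1]  1 21
attr(m, "match.length")         # length of each run
#[1] 10 10
```

With those lengths, rep can be cut into pieces of 10 and 10 characters before it is inserted.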
Example data:
x <- c(
"??????????DRHRTRHLAK??????????",
"????????????????????TRCYHIDPHH",
"FKDHKHIDVK????????????????????TRCYHIDPHH"
)
rep <- "ndqeegillkkkkfpssyvv"
Use the replacement function regmatches<- to do this in a vectorized way:
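# note: the \(x) lambda syntax requires R >= 4.1.0
gr <- gregexpr("\\?+", x)                                   # locate every run of ?
csml <- lapply(gr, \(x) cumsum(attr(x, "match.length")) )   # end position of each piece of rep
regmatches(x, gr) <- lapply(csml, \(x) substring(rep, c(1,x[-length(x)]+1), x))
x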
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"
#[2] "ndqeegillkkkkfpssyvvTRCYHIDPHH"
#[3] "FKDHKHIDVKndqeegillkkkkfpssyvvTRCYHIDPHH"
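The cumsum() step turns run lengths into end positions, and each piece of rep starts one character after the previous piece ends. A sketch of just that arithmetic as a standalone helper; split_by_runs is a hypothetical name, not part of the answer. Here ends - lens + 1 gives the same start positions as c(1, x[-length(x)] + 1).

```r
rep <- "ndqeegillkkkkfpssyvv"

# Hypothetical helper: cut rep into consecutive pieces whose lengths are `lens`
split_by_runs <- function(rep, lens) {
  ends   <- cumsum(lens)            # last character of each piece
  starts <- ends - lens + 1         # first character of each piece
  substring(rep, starts, ends)
}

split_by_runs(rep, c(10, 10))
#[1] "ndqeegillk" "kkkfpssyvv"
split_by_runs(rep, 20)
#[1] "ndqeegillkkkkfpssyvv"
```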
Split rep with substr():
x <- "??????????DRHRTRHLAK??????????"
rep <- "ndqeegillkkkkfpssyvv"
x <- gsub(pattern = "^\\?+", replacement = substr(rep, 1, 10), x = x)   # leading ?s get characters 1-10
x <- gsub(pattern = "\\?+$", replacement = substr(rep, 11, 20), x = x)  # trailing ?s get characters 11-20
x
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"
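In the regex, ^ matches the start of the string and $ matches the end. Note that the split points 10 and 20 are hard-coded for this particular x.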
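A sketch that reads the split point from the length of the leading ? run instead of hard-coding 10, assuming the ?s appear only at the start and/or the end of the string (a run in the middle, as in x3, is left untouched).

```r
x <- "??????????DRHRTRHLAK??????????"
rep <- "ndqeegillkkkkfpssyvv"

# Length of the leading run of "?" (regexpr() reports -1 when there is no match)
n <- attr(regexpr("^\\?+", x), "match.length")
n <- max(n, 0)

x <- sub("^\\?+", substr(rep, 1, n), x)                 # fill the leading run
x <- sub("\\?+$", substr(rep, n + 1, nchar(rep)), x)    # fill the trailing run
x
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"
```

Because both patterns are anchored, each can match at most once, so sub() is enough here.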
You can count the number of ?s and cut rep accordingly:
x <- "??????????DRHRTRHLAK??????????"
rep <- "ndqeegillkkkkfpssyvv"
pattern <- "(\\?+)(DRHRTRHLAK)(\\?+)"
n <- nchar(gsub(pattern, "\\1", x))
gsub(pattern, paste0(substr(rep, 1, n), "\\2", substr(rep, n+1, nchar(rep))), x)
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"
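A variant of the counting idea, sketched here as an assumption rather than the answerer's code: regexec() combined with regmatches() returns the whole match and every capture group in one call, so the leading run's length can be read directly instead of through gsub(pattern, "\\1", x). The pattern uses [A-Z]+ in place of the literal DRHRTRHLAK, which assumes the interleaving characters are uppercase letters.

```r
x <- "??????????DRHRTRHLAK??????????"
rep <- "ndqeegillkkkkfpssyvv"
pattern <- "^(\\?+)([A-Z]+)(\\?+)$"

# regexec() gives the whole match followed by each capture group
parts <- regmatches(x, regexec(pattern, x))[[1]]
n <- nchar(parts[2])                       # length of the leading "?" run

out <- paste0(substr(rep, 1, n), parts[3], substr(rep, n + 1, nchar(rep)))
out
#[1] "ndqeegillkDRHRTRHLAKkkkfpssyvv"
```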
A very verbose approach is an if/else chain that checks where the ?s are and splits rep accordingly:
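fill_q <- function(x, rep) {
  if (grepl("^\\?.+\\?$", x)) {                  # ?'s on both ends
    pattern <- "^(\\?+)([A-Z]+)(\\?+)$"
    n <- nchar(sub(pattern, "\\1", x))           # length of the leading ? run
    sub(pattern, paste0(substr(rep, 1, n), "\\2", substr(rep, n + 1, nchar(rep))), x)
  } else if (grepl("^\\?", x)) {                 # ?'s only at the start
    pattern <- "^(\\?+)([A-Z]+)$"
    n <- nchar(sub(pattern, "\\1", x))
    sub(pattern, paste0(substr(rep, 1, n), "\\2"), x)
  } else if (grepl("\\?$", x)) {                 # ?'s only at the end
    pattern <- "^([A-Z]+)(\\?+)$"
    n <- nchar(sub(pattern, "\\2", x))
    sub(pattern, paste0("\\1", substr(rep, 1, n)), x)
  } else if (grepl("^[A-Z]+\\?+[A-Z]+$", x)) {   # ?'s only in the middle
    pattern <- "^([A-Z]+)(\\?+)([A-Z]+)$"
    n <- nchar(sub(pattern, "\\2", x))
    sub(pattern, paste0("\\1", substr(rep, 1, n), "\\3"), x)
  } else {
    x                                            # layout not handled: return unchanged
  }
}
fill_q("????????????????????TRCYHIDPHH", rep)
#[1] "ndqeegillkkkkfpssyvvTRCYHIDPHH"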
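A different technique that avoids regex branching altogether (not one of the answers above, offered as a sketch): split the string into single characters and assign the characters of rep to the ? positions in order. fill_in_order is a hypothetical helper name, and it assumes, as the question states, that rep has exactly one character per ?.

```r
x <- c("??????????DRHRTRHLAK??????????",
       "????????????????????TRCYHIDPHH",
       "FKDHKHIDVK????????????????????TRCYHIDPHH",
       "FKDHKHIDVK????????????????????")
rep <- "ndqeegillkkkkfpssyvv"

# Hypothetical helper: overwrite the "?" positions, left to right, with rep's characters
fill_in_order <- function(s, rep) {
  chars <- strsplit(s, "")[[1]]
  fill  <- strsplit(rep, "")[[1]]
  stopifnot(sum(chars == "?") == length(fill))   # assumes one rep character per "?"
  chars[chars == "?"] <- fill                    # logical indexing keeps left-to-right order
  paste(chars, collapse = "")
}

out <- vapply(x, fill_in_order, character(1), rep = rep, USE.NAMES = FALSE)
out
```

Because the replacement works on positions rather than patterns, the same function handles ?s at the start, the end, the middle, or any mix, and vapply() applies it to the whole vector.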