Using a regex pattern in gsub in R

Question

Hello, I have a df such as:

COL1         
BLOC1.1_3_10-355(+)Sp_3
BLOC2.1_10-355(-)SSp_4
BLOC3.1_10-355(+)SP_32
BLOC1_3_10-355(+)SP4_2

How can I write a regex that replaces the _ in the pattern _[Number]-[Number]( so that it becomes

:[Number]-[Number](

Here I should get:

COL1         
BLOC1.1_3:10-355(+)Sp_3
BLOC2.1:10-355(-)SSp_4
BLOC3.1:10-355(+)SP_32
BLOC1_3:10-355(+)SP4_2

I tried: gsub("_[0-9]-[0-9](",":[0-9]-[0-9](",df$COL1)

Answer 1

COL1 <- c("BLOC1.1_3_10-355(+)Sp_3",
"BLOC2.1_10-355(-)SSp_4",
"BLOC3.1_10-355(+)SP_32",
"BLOC1_3_10-355(+)SP4_2")

gsub( "(.*[0-9]+)(_)([0-9]+-.*)", "\\1:\\3", COL1)

[1] "BLOC1.1_3:10-355(+)Sp_3" "BLOC2.1:10-355(-)SSp_4"  "BLOC3.1:10-355(+)SP_32" 
[4] "BLOC1_3:10-355(+)SP4_2"
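
The pattern above relies on the greedy .* and on the hyphen that must follow the second run of digits: together they pin the match to the one underscore that sits between a digit and a digits-then-hyphen sequence. To see what each capture group picks up, a small sketch like the following may help. The names pattern, prefix, suffix and result are illustrative and not part of the original answer.

```r
COL1 <- c("BLOC1.1_3_10-355(+)Sp_3", "BLOC2.1_10-355(-)SSp_4",
          "BLOC3.1_10-355(+)SP_32",  "BLOC1_3_10-355(+)SP4_2")

# Same pattern as in the answer, stored in a variable for reuse
pattern <- "(.*[0-9]+)(_)([0-9]+-.*)"

# Group 1: everything before the underscore that gets replaced
prefix <- sub(pattern, "\\1", COL1)
# Group 3: everything after that underscore
suffix <- sub(pattern, "\\3", COL1)

# Reassembling with ":" gives the same strings as the "\\1:\\3" replacement
result <- paste0(prefix, ":", suffix)
print(data.frame(prefix, suffix, result))
```

Group 2 only holds the underscore itself, so the parentheses around it could be dropped without changing the result.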

Answer 2

You can use

_([0-9]+-[0-9]+\()

and replace it with : followed by capture group 1.

COL1 <- c("BLOC1.1_3_10-355(+)Sp_3", "BLOC2.1_10-355(-)SSp_4", "BLOC3.1_10-355(+)SP_32", "BLOC1_3_10-355(+)SP4_2")
gsub("_([0-9]+-[0-9]+\\()", ":\\1", COL1)

Output

[1] "BLOC1.1_3:10-355(+)Sp_3" "BLOC2.1:10-355(-)SSp_4" 
[3] "BLOC3.1:10-355(+)SP_32"  "BLOC1_3:10-355(+)SP4_2"
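
Compared with the attempt in the question, this pattern differs in three ways: the literal ( is escaped as \\( (an unescaped ( opens a group and makes the pattern invalid), [0-9]+ matches one or more digits where [0-9] matches exactly one, and the replacement reuses the matched text through \\1, because a replacement string is inserted literally and is not interpreted as a pattern. Since the question works on a data frame column, the same call can be applied to df$COL1 directly. The sketch below rebuilds a hypothetical df from the sample data; the real data frame may have more columns.

```r
# Hypothetical data frame rebuilt from the sample data in the question
df <- data.frame(
  COL1 = c("BLOC1.1_3_10-355(+)Sp_3", "BLOC2.1_10-355(-)SSp_4",
           "BLOC3.1_10-355(+)SP_32",  "BLOC1_3_10-355(+)SP4_2"),
  stringsAsFactors = FALSE
)

# Replace the underscore and put back the captured digits-hyphen-digits-( part
df$COL1 <- gsub("_([0-9]+-[0-9]+\\()", ":\\1", df$COL1)
print(df)
```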

Answer 3

A solution using string splitting:

output <- sapply(COL1, function(x) {
    parts <- strsplit(x, "_(?=\\d+-)", perl=TRUE)
    paste(parts[[1]][1], parts[[1]][2], sep=":")
})
names(output) <- c(1:4)
output

                        1                         2                         3 
"BLOC1.1_3:10-355(+)Sp_3"  "BLOC2.1:10-355(-)SSp_4"  "BLOC3.1:10-355(+)SP_32" 
                        4 
 "BLOC1_3:10-355(+)SP4_2"

Data:

COL1 <- c("BLOC1.1_3_10-355(+)Sp_3", "BLOC2.1_10-355(-)SSp_4",
          "BLOC3.1_10-355(+)SP_32",  "BLOC1_3_10-355(+)SP4_2")
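
The lookahead used for splitting can also be passed to sub() directly, which avoids the sapply() loop and the renaming step. This is a sketch of an alternative, not the answer's own method, and it assumes each string contains a single underscore followed by digits, a hyphen, digits and an opening parenthesis.

```r
COL1 <- c("BLOC1.1_3_10-355(+)Sp_3", "BLOC2.1_10-355(-)SSp_4",
          "BLOC3.1_10-355(+)SP_32",  "BLOC1_3_10-355(+)SP4_2")

# perl = TRUE enables the (?=...) lookahead; only the underscore is matched,
# so the text after it stays in place and no capture group is needed
output <- sub("_(?=[0-9]+-[0-9]+\\()", ":", COL1, perl = TRUE)
print(output)
```

sub() is vectorized over its input, so the result is a plain unnamed character vector.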
